
I tried to use strsplit with a regex to split a long string on the space before each date, but I couldn't work out a regex for dates in a format such as 02 November 2020 or 31 October 2020. Does anyone know how to write the regex?

abc <- "02 November 2020 Staffline - BD5 8LZ £8.72 to £13.4 per hour 02 November 2020 University of Bradford - Bradford, West Yorkshire £20,130 to £21,814 per year Fixed term 12 months 02 November 2020 Anlaby Window Cleaning Services Limited - Bradford, West Yorkshire 01 November 2020 Household - Bradford, West Yorkshire £8.72 per hour 01 November 2020 Affinity Trust - Shipley, BD10 £8.72 per hour 01 November 2020 Co-op Group - Bingley, West Yorkshire, BD13 5DD £9.00 per hour 31 October 2020 UKWC - BD1 4PS £8.72 to £10.72 per hour"

Expected output:

"02 November 2020 Staffline - BD5 8LZ £8.72 to £13.4 per hour"
"02 November 2020 University of Bradford - Bradford, West Yorkshire £20,130 to £21,814 per year Fixed term 12 months"
"02 November 2020 Anlaby Window Cleaning Services Limited - Bradford, West Yorkshire"
"01 November 2020 Household - Bradford, West Yorkshire £8.72 per hour"
"01 November 2020 Affinity Trust - Shipley, BD10 £8.72 per hour"
"01 November 2020 Co-op Group - Bingley, West Yorkshire, BD13 5DD £9.00 per hour"
"31 October 2020 UKWC - BD1 4PS £8.72 to £10.72 per hour"

codedancer
  • This is a well-known error: just use ``\\`` instead of a single ``\``. Also, your expected output is unclear: shouldn't `"01 November 2020 Household - Bradford, West Yorkshire £8.72 per hour 01 November 2020 Affinity Trust - Shipley, BD10 £8.72 per hour"` actually be split into `"01 November 2020 Household - Bradford, West Yorkshire £8.72 per hour"` and `"01 November 2020 Affinity Trust - Shipley, BD10 £8.72 per hour"`? – Wiktor Stribiżew Nov 02 '20 at 11:58
  • You need to include `[A-Z]` to obtain the correct result: `abc2 <- strsplit(abc, "\\s(?=[0-9]{2}\\s[A-Z])", perl = T)` – Chris Ruehlemann Nov 02 '20 at 12:02
  • Yes, @WiktorStribiżew, that was a typo on my part. I should have left out the `\\` error, because `[\\s](?=[0-9]{2}\\s)` would also split before "12 months". Ideally I want a regex that matches a full date such as `31 October 2020`. – codedancer Nov 02 '20 at 12:06

1 Answer

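# Split on a whitespace character that is followed by two digits, a space, and an uppercase letter
abc2 <- strsplit(abc, "\\s(?=[0-9]{2}\\s[A-Z])", perl = TRUE)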
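
The `[A-Z]` check above would also split before text such as "12 Months", where a two-digit number is followed by a capitalised word. A stricter variant, sketched here under the assumption that every date is a two-digit day, a full English month name, and a four-digit year, spells out the whole date in the lookahead. The names `months`, `date_split`, and `abc_small` are illustrative, and `abc_small` is a made-up shortened input rather than the asker's data.

```r
# Build an alternation of English month names from base R's month.name constant
months <- paste(month.name, collapse = "|")

# Match a whitespace character only when a full "DD Month YYYY" date follows it
# (assumption: two-digit day, full English month name, four-digit year)
date_split <- sprintf("\\s(?=[0-9]{2}\\s(?:%s)\\s[0-9]{4}\\b)", months)

# Hypothetical shortened input; "12 Months" is capitalised on purpose
abc_small <- "02 November 2020 University of Bradford Fixed term 12 Months 31 October 2020 UKWC - BD1 4PS"

# The lookahead needs PCRE, hence perl = TRUE
parts <- strsplit(abc_small, date_split, perl = TRUE)[[1]]
print(parts)
```

On this input the stricter pattern should give two elements, one per date, whereas the `[A-Z]` pattern would also break before "12 Months". The same `date_split` can be passed to `strsplit(abc, date_split, perl = TRUE)` for the full string.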
Chris Ruehlemann