
This is the string I want to split:

b[1]
[1] "County January 2016 February 2016 March 2016 April 2016 May 2016 June 2016 July 2016 August 2016 September 2016 October 2016 November 2016 December 2016\r"

From this post (split string with regex), I gather there is no ready-made function for this; I just want to confirm that.

Here is my code:

split.pos <- gregexpr("County|([A-Za-z]{1,} [0-9]{4,})", b[1], perl = FALSE)

split.length <- attr(split.pos[[1]], "match.length")

split.start <- split.pos[[1]][1:length(split.pos[[1]])]

substring(b[1], split.start, split.start+split.length)
 [1] "County "         "January 2016 "   "February 2016 "  "March 2016 "    
 [5] "April 2016 "     "May 2016 "       "June 2016 "      "July 2016 "     
 [9] "August 2016 "    "September 2016 " "October 2016 "   "November 2016 " 
[13] "December 2016\r"

Is there a better way of doing this? Thanks

Bhail
  • What are the splitting criteria? Is your data tab-delimited? Fixed-width? Always a non-numeric, non-spaced value optionally followed by a number? What are your criteria for "better"? – MrFlick Feb 14 '17 at 20:34
  • Thanks for urging me to think properly about this operation. – Bhail Feb 14 '17 at 20:56

1 Answer


We can use strsplit with regex lookarounds, splitting on whitespace that either follows a digit or precedes an uppercase letter:

strsplit(b, "(?<=[0-9])\\s+|\\s+(?=[A-Z])", perl = TRUE)[[1]]
#[1] "County"         "January 2016"   "February 2016"  "March 2016"     "April 2016"     "May 2016"       "June 2016"      "July 2016"      "August 2016"   
#[10] "September 2016" "October 2016"   "November 2016"  "December 2016" 
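To see what each half of that pattern contributes, here is a small sketch on a shortened input. The names x, after_digit and before_upper are made up for illustration and are not part of the original answer. It suggests that the lookahead half alone leaves the trailing "\r" attached, while the lookbehind half alone never separates "County" from the first month, so both alternatives are needed.

```r
# Shortened, hypothetical input with irregular spacing and a trailing "\r"
x <- "County   January 2016 February 2016\r"

# Only the lookbehind half: split on whitespace that follows a digit.
# "County" is not separated, but the trailing "\r" is consumed.
after_digit <- strsplit(x, "(?<=[0-9])\\s+", perl = TRUE)[[1]]
print(after_digit)
# [1] "County   January 2016" "February 2016"

# Only the lookahead half: split on whitespace that precedes an uppercase letter.
# "County" is separated, but the trailing "\r" stays on the last element.
before_upper <- strsplit(x, "\\s+(?=[A-Z])", perl = TRUE)[[1]]
print(before_upper)
# [1] "County"          "January 2016"    "February 2016\r"

# Both halves combined, as in the answer above
both <- strsplit(x, "(?<=[0-9])\\s+|\\s+(?=[A-Z])", perl = TRUE)[[1]]
print(both)
# [1] "County"        "January 2016"  "February 2016"
```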

data

b <- "County                           January 2016 February 2016         March 2016         April 2016       May 2016           June 2016         July 2016      August 2016 September 2016 October 2016             November 2016 December 2016\r"
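If you would rather extract the pieces than split around them, as in the question's gregexpr approach, base R's regmatches can pull out the matches directly, so the start and length bookkeeping is not needed. This is only a sketch under two assumptions: the input below is a shortened stand-in for b, and each month is separated from its year by exactly one space. The names pattern and pieces are illustrative.

```r
# Shortened stand-in for the header line (assumption: same layout as b)
b <- "County                           January 2016 February 2016         March 2016\r"

# "County" on its own, or a word followed by one space and a four-digit year
pattern <- "County|[A-Za-z]+ [0-9]{4}"

# gregexpr() finds every match; regmatches() extracts the matched text
pieces <- regmatches(b, gregexpr(pattern, b))[[1]]
print(pieces)
# [1] "County"        "January 2016"  "February 2016" "March 2016"
```

Because only the matched text is returned, the surrounding padding and the trailing "\r" are dropped without any extra trimming.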
akrun