R: Can't split a string correctly using lookahead or lookbehind strsplit

Question

Here's the string

15  3 23 11  0 51.0000000  0 18G 5G 7G 9G10G13G16G19G20G27G28G30R 2

I need to split it by "G" and "R" to get

[1] " 15  3 23 11  0 18.0000000  0 18" "G 5" "G 7" "G 9" "G10" "G13" .... "R 2"

I'm trying to use lookahead and lookbehind.

With a lookbehind, this works reasonably well:

ss.tl.pattern="(?<=G|R[ 0-9]{2})"
split.tl=strsplit(time.lines,ss.tl.pattern,perl=TRUE)

[[1]]
 [1] " 15  3 23 11  0 18.0000000  0 18G 5" "G 7"                                
 [3] "G 9"                                 "G10"                                
 [5] "G13"                                  "G16"                                
 [7] "G19"                                 "G20"                                
 [9] "G27"                                 "G28"                                
 [11] "G30"                                 "R 2"

Everything is split as expected except at the first separator.

If I try a lookahead with the same pattern, it goes wrong:

ss.tl.pattern="(?=G|R[ 0-9]{2})"

 [[3]]
 [1] " 15  3 23 11  0 20.0000000  0 18" "G"                               
 [3] " 5"                               "G"                               
 [5] " 7"                               "G"                               
 [7] " 9"                               "G"                               
 [9] "10"                               "G"                               
[11] "13"                               "G"                               
[13] "16"                               "G"                               
[15] "19"                               "G"                               
[17] "20"                               "G"                               
[19] "27"                               "G"                               
[21] "28"                               "G"                               
[23] "30"                               "R"
[25] "2"

I can't figure out why it splits both before and after "G" or "R".

I will probably try a workaround using `regmatches` or something like that, but I still want to know why it works like that — ephemeris, Jan 18 '16 at 12:56
Do you need `scan(text=gsub("(?<=\\d)(?=(G|R))", ",", str1, perl=TRUE), sep=",", what="")# [1] "15 3 23 11 0 51.0000000 0 18" "G 5" "G 7" "G 9" [5] "G10" "G13" "G16" "G19" [9] "G20" "G27" "G28" "G30" [13] "R 2"` or the same — akrun, Jan 18 '16 at 13:00
Or using `strsplit` i.e. `strsplit(str1, "(?<=\\d)(?=(G|R))", perl=TRUE)[[1]]` — akrun, Jan 18 '16 at 13:04
I think the explanation is [here](http://stackoverflow.com/questions/15575221/why-does-strsplit-use-positive-lookahead-and-lookbehind-assertion-matches-differ). Looks like it is ByDesign. — Wiktor Stribiżew, Jan 18 '16 at 13:15
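A minimal sketch of the behaviour discussed in these comments, using a made-up short string rather than the original data. As I understand it, strsplit repeatedly finds a match, emits the text before it, and then searches again in what remains. When a zero-length match sits at the very start of the remainder, it seems to emit a single character instead. That would explain why a bare lookahead cuts on both sides of each "G" or "R". The input `x` and the simplified `[GR]` patterns are assumptions for illustration only.

```r
# Hypothetical short input with the same G/R layout as the question's data.
x <- "18G 5G 7R 2"

# Lookahead only: after the first cut the remainder starts with "G", so the
# next match is zero-length at position 0 and strsplit emits that one
# character as its own piece.
ahead <- strsplit(x, "(?=[GR])", perl = TRUE)[[1]]
print(ahead)

# Lookbehind plus lookahead: a digit must precede the split point, so the
# pattern cannot match at the very start of the remainder.
both <- strsplit(x, "(?<=\\d)(?=[GR])", perl = TRUE)[[1]]
print(both)
```

With the lookahead alone, each letter should come out as a separate piece, as in the question. Requiring a digit before the split point should keep "G 5", "G 7" and "R 2" together.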

score 3 · Accepted Answer · answered Jan 18 '16 at 13:05

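We can use strsplit with a lookbehind and a lookahead combined, so that the split point is only between a digit and the following "G" or "R":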

strsplit(str1, "(?<=\\d)(?=(G|R))", perl=TRUE)[[1]]
#[1] "15  3 23 11  0 51.0000000  0 18" "G 5"                             "G 7"                             "G 9"                            
#[5] "G10"                             "G13"                             "G16"                             "G19"                            
#[9] "G20"                             "G27"                             "G28"                             "G30"                            
#[13] "R 2"

data

str1 <- "15  3 23 11  0 51.0000000  0 18G 5G 7G 9G10G13G16G19G20G27G28G30R 2"
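The question also mentions `regmatches` as a possible workaround. Here is a hedged sketch of that route, which extracts the pieces directly instead of splitting. It assumes every satellite field is a "G" or "R" followed by exactly two characters that are spaces or digits, and that no "G" or "R" appears in the leading part of the line. The names `sats` and `header` are illustrative, not from the original post.

```r
str1 <- "15  3 23 11  0 51.0000000  0 18G 5G 7G 9G10G13G16G19G20G27G28G30R 2"

# Pull out every G/R field: one letter plus two space-or-digit characters.
sats <- regmatches(str1, gregexpr("[GR][ 0-9]{2}", str1))[[1]]

# Everything before the first G or R is the leading part of the line.
header <- sub("[GR].*$", "", str1)

print(header)
print(sats)
```

This avoids zero-length matches entirely, at the cost of building the leading part and the fields in two separate steps.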
