I'm not new to R but I am relatively new to regular expressions.

A similar question can be found here, but it asks how to split on the first comma rather than the last one.

As an example, if I use

> lastcomma_strsplit("UK, USA, Germany", ", ")
[[1]]
[1] "UK"      "USA"     "Germany"

I want to get

[[1]]
[1] "UK, USA"     "Germany"

And if I use

> lastcomma_strsplit("London, Washington, D.C., Berlin", ", ")
[[1]]
[1] "London"     "Washington" "D.C."       "Berlin"  

I want to get

[[1]]
[1] "London, Washington, D.C."       "Berlin"  

One workable approach, I think, is to replace the last comma with some other character such as $, #, or *, and then use strsplit() to split the string on that replacement character (making sure it does not occur elsewhere in the string). However, I would prefer a solution that uses a built-in function directly.
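
For reference, a minimal sketch of the placeholder workaround described above. The marker "<SPLIT>" is a hypothetical choice and is assumed not to occur in the input; the greedy (.*) is what makes sub() hit the last ", " rather than the first.

```r
# Hypothetical placeholder; assumed never to appear in the input string.
placeholder <- "<SPLIT>"

x <- "London, Washington, D.C., Berlin"

# The greedy (.*) consumes as much as possible, so the literal ", " that
# follows it can only match the LAST comma-space pair in the string.
marked <- sub("(.*), ", paste0("\\1", placeholder), x)

# Split on the placeholder as a fixed string, not as a regex.
result <- strsplit(marked, placeholder, fixed = TRUE)
result
## [[1]]
## [1] "London, Washington, D.C." "Berlin"
```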

So how can I do that?

VLAZ
Jiqing Huang

2 Answers

Here's one approach:

strsplit("UK, USA, Germany", ",(?=[^,]+$)", perl=TRUE)

## [[1]]
## [1] "UK, USA" " Germany"

The second element keeps its leading space. To drop it, also consume any whitespace after the comma:

strsplit("UK, USA, Germany", ",\\s*(?=[^,]+$)", perl=TRUE)

## [[1]]
## [1] "UK, USA" "Germany"

This version also matches when there is no space after the comma:

strsplit(c("UK, USA, Germany", "UK, USA,Germany"), ",\\s*(?=[^,]+$)", perl=TRUE)

## [[1]]
## [1] "UK, USA" "Germany"
## 
## [[2]]
## [1] "UK, USA" "Germany"
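
To spell out the pattern: `,\\s*` matches a comma plus any whitespace after it, and the lookahead `(?=[^,]+$)` only lets that match succeed when nothing but non-comma characters remain up to the end of the string, which is true only for the last comma. As a sketch, the call can be wrapped in a small helper; the name `last_comma_split` is made up here for illustration.

```r
# Hypothetical helper name; wraps the strsplit() call shown above.
last_comma_split <- function(x) {
  # ",\\s*"       : the comma and any whitespace following it
  # "(?=[^,]+$)"  : lookahead, only non-comma characters until end of string
  strsplit(x, ",\\s*(?=[^,]+$)", perl = TRUE)
}

result <- last_comma_split(c("London, Washington, D.C., Berlin", "NoCommaHere"))
result
## [[1]]
## [1] "London, Washington, D.C." "Berlin"
##
## [[2]]
## [1] "NoCommaHere"
```

A string without any comma comes back unsplit, as the second element shows.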
Tyler Rinker

You can use the stri_split_fixed function from the stringi package:

library(stringi)

x <- "USA,UK,Poland"
stri_split_fixed(x,",") # standard split by comma
[[1]]
[1] "USA"    "UK"     "Poland"

stri_split_fixed(x,",",n = 2) # set the max number of elements
[[1]]
[1] "USA"       "UK,Poland"

Unfortunately, there is no parameter to choose whether splitting starts from the beginning or the end of the string, but we can work around this with stri_reverse:

stri_split_fixed(stri_reverse(x),",",n = 2) #reverse
[[1]]
[1] "dnaloP" "KU,ASU"

stri_reverse(stri_split_fixed(stri_reverse(x),",",n = 2)[[1]]) #reverse back
[1] "Poland" "USA,UK"
stri_reverse(stri_split_fixed(stri_reverse(x),",",n = 2)[[1]])[2:1] #and again :)
[1] "USA,UK" "Poland"
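
The reverse, split, reverse-back steps above can be collected into one helper. This is only a sketch: the name `split_last` and its `sep` argument are invented for illustration, and the separator is reversed too so that multi-character separators such as ", " still match in the reversed string.

```r
library(stringi)

# Hypothetical helper: split each string at the LAST occurrence of `sep`.
split_last <- function(x, sep = ",") {
  # Split the reversed string at its first separator (n = 2 caps the pieces).
  parts <- stri_split_fixed(stri_reverse(x), stri_reverse(sep), n = 2)
  # Reverse each piece back, then restore the original left-to-right order.
  lapply(parts, function(p) rev(stri_reverse(p)))
}

result <- split_last("USA,UK,Poland")
result
## [[1]]
## [1] "USA,UK" "Poland"
```

With `sep = ", "` the same helper handles the comma-plus-space strings from the question.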
David Arenburg
bartektartanus