
I know I've come across this problem before, but I'm having a bit of a mental block at the moment, and as I can't find it on SO, I'll post it here so I can find it next time.

I have a dataframe that contains a field representing an ID label. This label has two parts, an alpha prefix and a numeric suffix. I want to split it apart and create two new fields with these values in.

df <- structure(list(lab = c("N00", "N01", "N02", "B00", "B01", "B02", 
"Z21", "BA01", "NA03")), .Names = "lab", row.names = c(NA, -9L
), class = "data.frame")

df$pre<-strsplit(df$lab, "[0-9]+")
df$suf<-strsplit(df$lab, "[A-Z]+")

Which gives

   lab pre  suf
1  N00   N , 00
2  N01   N , 01
3  N02   N , 02
4  B00   B , 00
5  B01   B , 01
6  B02   B , 02
7  Z21   Z , 21
8 BA01  BA , 01
9 NA03  NA , 03

So, the first strsplit works fine, but the second gives a list, each having two elements, an empty string and the result I want, and stuffs them both into the dataframe column.

How can I select the second sub-element from each element of the list? (Or is there a better way to do this?)

Richard Erickson
PaulHurleyuk

3 Answers


To select the second element of each list item:

R> sapply(df$suf, "[[", 2)
[1] "00" "01" "02" "00" "01" "02" "21" "01" "03"
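
As for why passing "[[" works: in R, `[[` is itself a function, so sapply calls it on each list element and hands over 2 as the extra argument. A minimal sketch, using a small illustrative subset of the labels rather than the full data frame (the variable names below are made up for the example):

```r
# Small illustrative subset of the labels from the question
lab <- c("N00", "BA01")
suf <- strsplit(lab, "[A-Z]+")   # each element is c("", "<digits>")

# "[[" is an ordinary function: `[[`(x, 2) is the same as x[[2]]
by_name <- sapply(suf, "[[", 2)
by_closure <- sapply(suf, function(x) x[[2]])   # the explicit equivalent

# vapply does the same, but also checks that every result is a single string
by_vapply <- vapply(suf, `[[`, character(1), 2)
```

All three calls should return the same character vector; vapply is the stricter choice if you want an error when some element has no second item.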

An alternative approach using regular expressions:

df$pre <- sub("^([A-Z]+)[0-9]+", "\\1", df$lab)
df$suf <- sub("^[A-Z]+([0-9]+)", "\\1", df$lab)
rcs

With purrr::map this would be:

df$suf %>%  map_chr(c(2)) 

For further info, see the documentation for purrr::map.
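
A self-contained sketch of the same idea, assuming the purrr package is installed (the guard below simply skips the call if it is not). Passing a number to map_chr extracts the element at that position from each list item; the sample labels and variable names are illustrative only:

```r
# Illustrative subset of the labels from the question
lab <- c("N00", "BA01")
suf <- strsplit(lab, "[A-Z]+")   # each element is c("", "<digits>")

# Only run the purrr call when the package is available (an assumption
# about your setup, not something the original post checks)
if (requireNamespace("purrr", quietly = TRUE)) {
  # A numeric second argument means "extract the element at this position"
  suf_purrr <- purrr::map_chr(suf, 2)
  print(suf_purrr)
}
```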

Uwe Sterr

First of all: if you use str(df) you'll see that df$pre is a list. I think you want a vector (but I might be wrong).
Returning to the problem: in this case I would use gsub:

df$pre <- gsub("[0-9]", "", df$lab)
df$suf <- gsub("[A-Z]", "", df$lab)

This guarantees that both columns are vectors, but it fails if a label does not follow the expected prefix-then-number pattern (e.g. 'AB01B').
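
To make that failure case concrete, here is a rough sketch that first shows what gsub does with 'AB01B' and then one possible stricter variant. The stricter version is my own assumption about what "correct" should mean here (letters followed by digits and nothing else, with NA for anything that does not fit); the variable names are hypothetical:

```r
labs <- c("N00", "BA01", "AB01B")   # the last label breaks the pattern

# The gsub approach silently merges the stray trailing letter into the prefix
loose_pre <- gsub("[0-9]", "", labs)   # "N" "BA" "ABB"
loose_suf <- gsub("[A-Z]", "", labs)   # "00" "01" "01"

# Stricter sketch: demand letters then digits, capture both parts
pattern <- "^([A-Z]+)([0-9]+)$"
m <- regmatches(labs, regexec(pattern, labs))
# Each match is c(full, prefix, suffix); a non-match is character(0)
pick <- function(x, i) if (length(x) == 3) x[i] else NA_character_
strict_pre <- vapply(m, pick, character(1), i = 2)
strict_suf <- vapply(m, pick, character(1), i = 3)
```

With this variant the malformed label ends up as NA in both columns instead of producing a plausible-looking but wrong split.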

Marek
  • RCS's answer actually answers my main question (how to return the second value from the list), but your answer seems to be more elegant for what I actually want. Well done. – PaulHurleyuk May 10 '10 at 15:24
  • Could you please explain how the "[[" works in sapply? The definition for sapply is given at: http://www.inside-r.org/r-doc/base/sapply. – andor kesselman Sep 21 '15 at 07:33