2

I'm trying to split a column into multiple columns with '_' as the delimiter. I've tried the solutions here:

Splitting a dataframe string column into multiple different columns

In my case, the problem seems to be that the number of 'delimited' text segments varies across rows (ranges from 3 to 5) because I continually get errors whenever I try split the strings and then convert to a matrix or dataframe. Here is an example you can copy into R:

example<-structure(list(c1 = c(1, 2, 3), temp2 = structure(c(3L, 1L, 2L
), .Label = c("TR117_1119_HI_33", "TR504_1115_Su1_Y_4", "TR9_0817_2"
), class = "factor")), .Names = c("c1", "temp2"), row.names = c(NA, 
-3L), class = "data.frame")

For this example, ultimately I would like the 'temp2' column to be parced at each '_' to produce the following:

> Output
   new1 new2 new3 new4 new5
1   TR9 0817    2   NA   NA
2 TR117 1119   HI   33   NA
3 TR504 1115  Su1    Y    4

Here is as far as I have gotten without running into an error:

example$temp3<-as.character(example$temp2)
test<-strsplit(example$temp3,"_")

Do I need to use a for loop to do this row by row or is there a more elegant solution?

Community
  • 1
  • 1
user2860703
  • 483
  • 5
  • 18

2 Answers2

2

We could also use tstrsplit from library(data.table)

library(data.table)#v1.9.6+
setnames(setDT(example)[, tstrsplit(temp2, '_')], paste0('new', 1:5))[]
#    new1 new2 new3 new4 new5
#1:   TR9 0817    2   NA   NA
#2: TR117 1119   HI   33   NA
#3: TR504 1115  Su1    Y    4

NOTE: It also has type.convert argument which can convert to appropriate types.


Or cSplit from library(splitstackshape)

library(splitstackshape)
cSplit(example, 'temp2', '_')[,c1:=NULL][]

and change the column names with setnames as above.


Or using read.table from base R

read.table(text=as.character(example$temp2), sep="_", na.strings='',
                stringsAsFactors=FALSE,fill=TRUE, col.names=paste0('new', 1:5))
#   new1 new2 new3 new4 new5
#1   TR9  817    2 <NA>   NA
#2 TR117 1119   HI   33   NA
#3 TR504 1115  Su1    Y    4
akrun
  • 874,273
  • 37
  • 540
  • 662
1

You could use tidyr's separate()

tidyr::separate(example[-1], temp2, paste0("new", 1:5), "_", fill = "right")
#    new1 new2 new3 new4 new5
# 1   TR9 0817    2 <NA> <NA>
# 2 TR117 1119   HI   33 <NA>
# 3 TR504 1115  Su1    Y    4

Add convert = TRUE if you wish for the new columns to be converted to their appropriate types.

Rich Scriven
  • 97,041
  • 11
  • 181
  • 245