
I have a df that looks as follows:

id  name                 grade
1   rich, tom, todd,     12
2   chris,mary           9
3   larry                10

I run the following code to split the name text into columns:

newdf <- within(df, name<-data.frame(do.call('rbind', strsplit(as.character(name), ',', fixed=TRUE))))

And here is my output:

id  name.X1   name.X2   name.X3    grade
1   rich       tom       todd       12
2   chris      mary      chris      9
3   larry      larry     larry      10

My code repeats names (in ids 2 and 3) instead of filling the empty cells with blanks or NA. What I'd like the code to output is the following:

  id    name.X1   name.X2   name.X3    grade
  1     rich       tom       todd       12
  2     chris      mary      N/A        9
  3     larry      N/A       N/A        10

Alternatively, the cells could be left blank instead of N/A. Any idea how I can stop the names from repeating? Thank you.

richiepop2

1 Answer


We can use cSplit from the splitstackshape package:

library(splitstackshape)
cSplit(df, "name", ",")
#   id grade name_1 name_2 name_3
#1:  1    12   rich    tom   todd
#2:  2     9  chris   mary     NA
#3:  3    10  larry     NA     NA

With strsplit, the list elements have unequal lengths, and rbind recycles the shorter ones, which is why the names get repeated. Padding each element with NA up to a common length avoids this. One option is to get the length of each list element with lengths, take the maximum ('mx'), and extend every element to that length with `length<-`, which fills the extra positions with NA. Then we create the new columns on 'df' based on 'mx'.

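# split on a comma followed by optional whitespace
lst <- strsplit(as.character(df$name), ",\\s*")
# length of the longest split
mx <- max(lengths(lst))
# pad each element with NA to length mx, then bind row-wise so that
# every original row stays on its own row
df[paste0("name", seq_len(mx))] <- do.call(rbind, lapply(lst, `length<-`, mx))
df[setdiff(names(df), "name")]
#  id grade name1 name2 name3
#1  1    12  rich   tom  todd
#2  2     9 chris  mary  <NA>
#3  3    10 larry  <NA>  <NA>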
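
The question also asks for blank cells instead of NA. A minimal base R sketch of that variant follows, assuming the data looks like the table in the question. The sample `df` is a reconstruction, and `split_pad` with its `fill` argument is a hypothetical helper written for illustration, not part of any package.

```r
# Sample data reconstructed from the question (an assumption)
df <- data.frame(
  id = 1:3,
  name = c("rich, tom, todd,", "chris,mary", "larry"),
  grade = c(12, 9, 10),
  stringsAsFactors = FALSE
)

# Hypothetical helper: split each string, pad to a common length,
# and return a character matrix with one row per input string
split_pad <- function(x, pattern = ",\\s*", fill = NA_character_) {
  lst <- strsplit(as.character(x), pattern)
  mx <- max(lengths(lst))
  # `length<-` extends each vector to mx, filling with NA
  m <- do.call(rbind, lapply(lst, `length<-`, mx))
  # swap the NA padding for the requested fill value
  m[is.na(m)] <- fill
  colnames(m) <- paste0("name", seq_len(mx))
  m
}

# Drop the original column and attach the split columns, using "" for blanks
out <- data.frame(df[setdiff(names(df), "name")],
                  split_pad(df$name, fill = ""),
                  stringsAsFactors = FALSE)
out
```

Calling `split_pad(df$name)` without `fill` keeps the NA padding. Note that filling with "" makes it impossible to tell a missing name from an empty one later, so NA may still be the safer choice for further processing.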
akrun