text to column, do not repeat column name

Question

I have a df that looks as follows:

id  name                 grade
1   rich, tom, todd,     12
2   chris,mary           9
3   larry                10

I run the following code to split the text into columns:

newdf <- within(df, name<-data.frame(do.call('rbind', strsplit(as.character(name), ',', fixed=TRUE))))

And here is my output:

id  name.X1   name.X2   name.X3    grade
1   rich       tom       todd       12
2   chris      mary      chris      9
3   larry      larry     larry      10

The code I have repeats names (in ids 2 and 3) instead of filling the remaining cells with blanks or NA. What I'd like the code to output is the following:

  id    name.X1   name.X2   name.X3    grade
  1     rich       tom       todd       12
  2     chris      mary      N/A        9
  3     larry      N/A       N/A        10

Alternatively, instead of N/A, the cells could be left blank. Any idea how I can avoid repeating the names? Thank you.

akrun · Accepted Answer · 2016-07-24T14:40:12.677

We can use cSplit from the splitstackshape package:

library(splitstackshape)
cSplit(df, "name", ",")
#   id grade name_1 name_2 name_3
#1:  1    12   rich    tom   todd
#2:  2     9  chris   mary     NA
#3:  3    10  larry     NA     NA

If we use strsplit instead, the list elements have unequal lengths, so they need to be padded with NA; otherwise rbind recycles the shorter elements and the names get repeated. To pad with NA at the end, get the length of each list element with lengths, take the max ('mx'), and set the length of every element to 'mx'. Then create the new columns on 'df', one per position up to 'mx'.

lst <- strsplit(as.character(df$name), ",\\s*")
mx <- max(lengths(lst))
# rbind the padded vectors so that each original row stays a row
df[paste0("name", seq(mx))] <- do.call(rbind, lapply(lst, `length<-`, mx))
df[setdiff(names(df), "name")]
#  id grade name1 name2 name3
#1  1    12  rich   tom  todd
#2  2     9 chris  mary  <NA>
#3  3    10 larry  <NA>  <NA>
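
For reference, here is a self-contained sketch of the padding approach that can be pasted into a fresh session. It rebuilds the example data from the question (the use of `stringsAsFactors = FALSE` and the names `padded`, `out` and `out_blank` are assumptions made for illustration) and also shows one way to get blank cells instead of NA, which the question mentions as an alternative.

```r
# Example data as shown in the question; stringsAsFactors = FALSE is an assumption
df <- data.frame(id = 1:3,
                 name = c("rich, tom, todd,", "chris,mary", "larry"),
                 grade = c(12, 9, 10),
                 stringsAsFactors = FALSE)

# Split on a comma plus any following whitespace;
# strsplit drops the empty piece after a trailing comma
lst <- strsplit(df$name, ",\\s*")
mx <- max(lengths(lst))

# Pad every vector to length mx with NA, then stack the vectors as rows
padded <- do.call(rbind, lapply(lst, `length<-`, mx))
colnames(padded) <- paste0("name", seq_len(mx))

# Drop the original column and attach the split columns
out <- cbind(df[setdiff(names(df), "name")], padded, stringsAsFactors = FALSE)

# Optional: replace NA with empty strings in the name columns only
name_cols <- colnames(padded)
out_blank <- out
out_blank[name_cols][is.na(out_blank[name_cols])] <- ""
```

Here `out` keeps NA in the unused cells, while `out_blank` holds empty strings in the same positions.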

edited Jul 24 '16 at 14:40

answered Jul 24 '16 at 12:20

akrun


@ZheyuanLi `name_3: Factor w/ 1 level "todd": 1 NA NA` – akrun Jul 24 '16 at 12:32
@ZheyuanLi Yes, because in the previous case the columns are returned as `factor`, like the initial 'name' column, whereas with `strsplit` they will be `character` columns. – akrun Jul 24 '16 at 12:33
@ZheyuanLi I guess for the `factor`, the NA might be `NA_integer_` – akrun Jul 24 '16 at 12:35
works perfectly, thank you @akrun – richiepop2 Jul 24 '16 at 14:39
For cSplit (`cSplit(df, "name", ",")`), can you split multiple columns with one line of code? I tried several variations, but nothing worked. For example, I tried `cSplit(df, "name", "subject", ",")`, but it didn't work. – richiepop2 Jul 25 '16 at 01:50
@richiepop2 For that you need to concatenate the column names, i.e. `cSplit(df, c("name", "subject"), ",")` – akrun Jul 25 '16 at 03:12
ah, got it. Thanks!!! – richiepop2 Jul 25 '16 at 03:39
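
For the multi-column case raised in the comments, the same NA-padding idea can be sketched in base R without splitstackshape. The helper `split_cols` below is hypothetical (it is not part of any package), and the `subject` column and its values are made up for illustration, since the thread never shows that data.

```r
# Hypothetical helper: split several delimited columns into NA-padded columns
split_cols <- function(data, cols, sep = ",\\s*") {
  for (col in cols) {
    pieces <- strsplit(as.character(data[[col]]), sep)
    mx <- max(lengths(pieces))
    # Pad each vector to the longest length, then stack them as rows
    padded <- do.call(rbind, lapply(pieces, `length<-`, mx))
    data[paste0(col, "_", seq_len(mx))] <-
      as.data.frame(padded, stringsAsFactors = FALSE)
    data[[col]] <- NULL  # drop the original unsplit column
  }
  data
}

# Made-up data; 'subject' is an assumption based on the comment above
df2 <- data.frame(id = 1:2,
                  name = c("rich, tom", "larry"),
                  subject = c("math", "art, music, pe"),
                  stringsAsFactors = FALSE)
res <- split_cols(df2, c("name", "subject"))
```

Each listed column is replaced by as many new columns as its longest entry needs, so `res` ends up with `name_1`, `name_2` and `subject_1` to `subject_3`, with NA in the unused cells.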
