Split column to multiple columns adaptively in R

Question

I have a dataframe that looks like the following:

Name    Value
abc     Asia
def     Asia/Africa
gbc     Africa
jhg     America/Africa/Asia

I want the column Value to be separated so that the dataframe looks as follows:

Name    Value.1    Value.2    Value.3
abc     Asia 
def     Asia       Africa
gbc     Africa
jhg     America    Africa    Asia

I know there are similar questions on stackoverflow (e.g. here, here ) but they assume that Value can be split into the same number of sections for each row. So, when I try to run the following commands:

out <- strsplit(as.character(df$Value),'/')
do.call(rbind, out)
data.frame(df$Value, do.call(rbind, out))

It works till the second line and then gives me the following error:

Error in data.frame(df$Value, do.call(rbind,  : 
  arguments imply differing number of rows: 24819, 24707
In addition: Warning message:
In (function (..., deparse.level = 1)  :
  number of columns of result is not a multiple of vector length (arg 10)

I also looked into reshape2 function colSplit and tidyr function separate. They also expect that for every row, Value can be seperated into exactly the same number of components, hence they expect me to name the columns in advance.

I am wondering of there is a way to adaptively name split the Value into separate columns called Value.1, Value.2,...

Your question is better since you include an example, but I guess this one also covers it: http://stackoverflow.com/q/26898833/1191259 — Frank, Feb 23 '16 at 19:38

akrun · Answer 1 · 2016-02-23T20:20:25.460

We can use cSplit

library(splitstackshape)
cSplit(df1, 'Value', '/', 'wide')
#   Name Value_1 Value_2 Value_3
#1:  abc    Asia      NA      NA
#2:  def    Asia  Africa      NA
#3:  gbc  Africa      NA      NA
#4:  jhg America  Africa    Asia

In base R, we could simply do

cbind(df1[1],read.table(text=df1$Value, sep='/', fill=TRUE))
#  Name      V1     V2   V3
#1  abc    Asia            
#2  def    Asia Africa     
#3  gbc  Africa            
#4  jhg America Africa Asia

MichaelChirico · Answer 2 · 2016-02-23T20:09:04.440

4

See tstrsplit in data.table which automatically fills out ragged splits (with NA by default, but can be controlled with the fill argument):

library(data.table)
setDT(df1)[ , tstrsplit(Value, split = "/")]

If you wanted to add these as columns to your data, it's a bit uglier to do dynamically

setDT(df1)[ , paste0("V", 1:max(sapply(spl <- strsplit(x, "/"), length))) := 
              transpose(spl)]

edited Feb 23 '16 at 20:09

answered Feb 23 '16 at 19:34

MichaelChirico

33,841
14
113
198

score 4 · Accepted Answer · edited Feb 23 '16 at 20:01

4

In base R:

df <- read.table(textConnection("Name    Value
abc     Asia
def     Asia/Africa
gbc     Africa
jhg     America/Africa/Asia"), header=TRUE)

out <- strsplit(as.character(df$Value),'/')
out <- lapply(out, FUN=function(x) c(unlist(x), rep(NA, max(lengths(out))-length(x))))
out <- as.data.frame( cbind(df$Value,do.call(rbind, out)))

  V1      V2     V3   V4
1  3    Asia   <NA> <NA>
2  4    Asia Africa <NA>
3  1  Africa   <NA> <NA>
4  2 America Africa Asia

Same approach, written a bit different:

out <- strsplit(as.character(df$Value),'/')
cbind(df[1], do.call(rbind, lapply(out, 'length<-', max(lengths(out)))))
  Name       1      2    3
1  abc    Asia   <NA> <NA>
2  def    Asia Africa <NA>
3  gbc  Africa   <NA> <NA>
4  jhg America Africa Asia

edited Feb 23 '16 at 20:01

talat

68,970
21
126
157

answered Feb 23 '16 at 19:38

HubertL

19,246
3
32
51

You can use `max(lengths(out))` as max length after the `strsplit` – talat Feb 23 '16 at 19:44
Thanks @docendodiscimus – HubertL Feb 23 '16 at 19:45
1

Welcome. Another trick is to use `lapply(out, 'length<-', max(lengths(out)))` which will populate shorter vectors with NAs – talat Feb 23 '16 at 19:47
@docendodiscimus Great! But how can I write ` on SO? – HubertL Feb 23 '16 at 19:53
Do you mean inside comments? I don't know that but in normal posts it's not a problem. And my comment also works with normal ' – talat Feb 23 '16 at 19:55
@HubertL are you trying insert docendo's code in your answer? – Pierre L Feb 23 '16 at 19:58
Hubert, I made an edit to your answer. Feel free to rollback or re-edit of course – talat Feb 23 '16 at 20:02
2

Fyi, backticks in comment code work if you escape them with a backslash @docendo – Frank Feb 23 '16 at 20:04
1

Thanks @Frank, that's nice to know – talat Feb 23 '16 at 20:05

Split column to multiple columns adaptively in R

3 Answers3