1

I have a dataframe that looks like the following:

Name    Value
abc     Asia
def     Asia/Africa
gbc     Africa
jhg     America/Africa/Asia

I want the column Value to be separated so that the dataframe looks as follows:

Name    Value.1    Value.2    Value.3
abc     Asia 
def     Asia       Africa
gbc     Africa
jhg     America    Africa    Asia

I know there are similar questions on stackoverflow (e.g. here, here ) but they assume that Value can be split into the same number of sections for each row. So, when I try to run the following commands:

out <- strsplit(as.character(df$Value),'/')
do.call(rbind, out)
data.frame(df$Value, do.call(rbind, out))

It works till the second line and then gives me the following error:

Error in data.frame(df$Value, do.call(rbind,  : 
  arguments imply differing number of rows: 24819, 24707
In addition: Warning message:
In (function (..., deparse.level = 1)  :
  number of columns of result is not a multiple of vector length (arg 10)

I also looked into reshape2 function colSplit and tidyr function separate. They also expect that for every row, Value can be seperated into exactly the same number of components, hence they expect me to name the columns in advance.

I am wondering of there is a way to adaptively name split the Value into separate columns called Value.1, Value.2,...

Community
  • 1
  • 1
DotPi
  • 3,977
  • 6
  • 33
  • 53
  • 1
    Your question is better since you include an example, but I guess this one also covers it: http://stackoverflow.com/q/26898833/1191259 – Frank Feb 23 '16 at 19:38

3 Answers3

4

We can use cSplit

library(splitstackshape)
cSplit(df1, 'Value', '/', 'wide')
#   Name Value_1 Value_2 Value_3
#1:  abc    Asia      NA      NA
#2:  def    Asia  Africa      NA
#3:  gbc  Africa      NA      NA
#4:  jhg America  Africa    Asia

In base R, we could simply do

cbind(df1[1],read.table(text=df1$Value, sep='/', fill=TRUE))
#  Name      V1     V2   V3
#1  abc    Asia            
#2  def    Asia Africa     
#3  gbc  Africa            
#4  jhg America Africa Asia
akrun
  • 874,273
  • 37
  • 540
  • 662
4

See tstrsplit in data.table which automatically fills out ragged splits (with NA by default, but can be controlled with the fill argument):

library(data.table)
setDT(df1)[ , tstrsplit(Value, split = "/")]

If you wanted to add these as columns to your data, it's a bit uglier to do dynamically

setDT(df1)[ , paste0("V", 1:max(sapply(spl <- strsplit(x, "/"), length))) := 
              transpose(spl)]
MichaelChirico
  • 33,841
  • 14
  • 113
  • 198
4

In base R:

df <- read.table(textConnection("Name    Value
abc     Asia
def     Asia/Africa
gbc     Africa
jhg     America/Africa/Asia"), header=TRUE)

out <- strsplit(as.character(df$Value),'/')
out <- lapply(out, FUN=function(x) c(unlist(x), rep(NA, max(lengths(out))-length(x))))
out <- as.data.frame( cbind(df$Value,do.call(rbind, out)))

  V1      V2     V3   V4
1  3    Asia   <NA> <NA>
2  4    Asia Africa <NA>
3  1  Africa   <NA> <NA>
4  2 America Africa Asia

Same approach, written a bit different:

out <- strsplit(as.character(df$Value),'/')
cbind(df[1], do.call(rbind, lapply(out, 'length<-', max(lengths(out)))))
  Name       1      2    3
1  abc    Asia   <NA> <NA>
2  def    Asia Africa <NA>
3  gbc  Africa   <NA> <NA>
4  jhg America Africa Asia
talat
  • 68,970
  • 21
  • 126
  • 157
HubertL
  • 19,246
  • 3
  • 32
  • 51