Dataframe with string columns - each column needs to be split into multiple columns at the word "and" - R

Question

I have a dataframe with string columns. Each value in these columns has a format like "xyz:x-dffh, dddd and stgL-fhgdf,"

I need to split each value at the word "and"; the rest should stay as is.

The input is a dataframe with 2 such columns. The output should contain multiple columns for each input column.

Is this doable in R? In Excel I would use Text to Columns.

Welcome to SO. Please provide a [reproducible example](http://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example), as it makes answering your question a lot easier. – geotheory, Jul 26 '13 at 08:45
You want to use `strsplit`. More detailed answers will require you to supply `dput(head(input))`, where `input` is your dataframe. – Thomas, Jul 26 '13 at 08:49

bmartinez · Answer 1 · 2013-07-26T09:43:19.243

2

If `df` is your dataframe, you can create two new columns from the column you want to split by adapting the following code to your data:

df$newColumn1 <- lapply(strsplit(as.character(df$originalColumn), "and"), "[", 1)
df$newColumn2 <- lapply(strsplit(as.character(df$originalColumn), "and"), "[", 2)

edited Jul 26 '13 at 09:43

answered Jul 26 '13 at 09:21

bmartinez


I don't think it is a good idea to assign a list to a data.frame column. – Roland Jul 26 '13 at 09:36
@Roland, just curious--why not? I agree it's not the most convenient data format to work with, but some of base R's functions do so in common operations (like `aggregate`, on occasion). – A5C1D2H2I1M1N2O1R2T1 Jul 26 '13 at 11:04
The main reason is that it leads to an uncommon data structure, which can make code confusing. – Roland Jul 26 '13 at 11:09
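
To illustrate the point raised in these comments, here is a minimal sketch with made-up data. It assumes the pieces are separated by " and ", and it swaps in `vapply` as one possible way to get plain character columns; this is not the answerer's original method.

```r
# Hypothetical example data; 'originalColumn' follows the answer's naming.
df <- data.frame(originalColumn = c("dog and cat", "robots and raptors"),
                 stringsAsFactors = FALSE)

# Split each string at the literal text " and "
parts <- strsplit(df$originalColumn, " and ", fixed = TRUE)

# lapply() returns a list, so the new column becomes a list column
df$asList <- lapply(parts, "[", 1)
class(df$asList)

# vapply() returns a character vector, giving an ordinary column
df$newColumn1 <- vapply(parts, "[", character(1), 1)
df$newColumn2 <- vapply(parts, "[", character(1), 2)
class(df$newColumn1)
```

With `vapply`, a row that has no second piece gets `NA` in `newColumn2` rather than an error.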

score 1 · Answer 2 · answered Jul 26 '13 at 12:09

You could try the following in base R (similar to bmartinez's answer, but without assigning a list to a dataframe column):

df <- data.frame(originalColumn = c("dog and cat", "robots and raptors"))

do.call(rbind.data.frame, strsplit(as.character(df$originalColumn), "and"))

## > do.call(rbind.data.frame, strsplit(as.character(df$originalColumn), "and"))
##   c..dog.....robots... c...cat.....raptors..
## 1                 dog                    cat
## 2              robots                raptors

Or using the qdap package:

library(qdap)
colsplit2df(df, sep = "and")


## > colsplit2df(df, sep = "and")
##        X1       X2
## 1    dog       cat
## 2 robots   raptors
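
The base R output above keeps the spaces around each piece and has auto-generated column names. A possible variation, offered only as a sketch, splits on "and" together with its surrounding whitespace and names the columns explicitly. The `part1`, `part2` names are invented for illustration, and the code assumes every row yields the same number of pieces.

```r
# Same example data as in the answer, kept as character rather than factor
df <- data.frame(originalColumn = c("dog and cat", "robots and raptors"),
                 stringsAsFactors = FALSE)

# Split on "and" surrounded by whitespace, so the pieces need no trimming
parts <- strsplit(df$originalColumn, "\\s+and\\s+")

# Bind the pieces row-wise into a character matrix, then convert it
out <- as.data.frame(do.call(rbind, parts), stringsAsFactors = FALSE)

# Hypothetical column names: part1, part2, ...
names(out) <- paste0("part", seq_len(ncol(out)))
out
```

Requiring whitespace around "and" also avoids splitting inside words that merely contain those letters, such as "sandwich".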

score 0 · Accepted Answer · answered Jul 30 '13 at 04:39

Here is what worked for me, using input from the answers above and various other threads on SO. I am a complete newbie to R, and my objective is to migrate work from Excel to R.

# returns string w/o leading or trailing whitespace
trim <- function (x) gsub("^\\s+|\\s+$", "", x)

#--------------------------------------------------------------------------------
# OBJECTIVE - migrate this activity from Excel + VBA to R
#
# Split and find the maximum number of columns required. Each element of the
# split result is a vector of variable length; the objective is to convert it
# into individual columns, with the number of columns equal to the maximum
# length. Rows with fewer entries are padded with NA in the additional columns.
#--------------------------------------------------------------------------------

# as.character() guards against PREV being a factor, which strsplit() rejects
temp_split <- strsplit(as.character(src.df$PREV), "and")
max_col <- max(lengths(temp_split))

# pad each split result to max_col entries so that every row has the same
# length; missing entries become NA

add_list <- function(x, max_col) {
  u_l <- unlist(x)
  pad_col <- max_col - length(u_l)
  # use a real missing value rather than the string "NA"
  c(u_l, rep(NA_character_, pad_col))
}

test <- lapply(temp_split, add_list, max_col)
test_matrix <- data.frame(matrix(unlist(test), nrow = nrow(src.df), byrow = TRUE))

# bind the new columns to the original dataframe
t.df <- test_matrix
c.df <- cbind(src.df, t.df)
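
The question mentions two columns to split, while the code above handles one (`PREV`). The following is a rough sketch of how the same pad-to-maximum idea might be wrapped in a function and applied to several columns. The helper `split_and`, the second column `CURR`, and the sample values are all hypothetical, and the split pattern assumes whitespace around "and".

```r
# Hypothetical input; 'PREV' follows the answer, 'CURR' is invented here
src.df <- data.frame(PREV = c("a and b and c", "d and e", "f"),
                     CURR = c("x and y", "z", "u and v"),
                     stringsAsFactors = FALSE)

# Split one column at the word "and" and pad shorter rows with NA
split_and <- function(x, prefix) {
  parts <- strsplit(as.character(x), "\\s+and\\s+")
  max_col <- max(lengths(parts))
  padded <- lapply(parts, function(p) {
    c(p, rep(NA_character_, max_col - length(p)))
  })
  out <- as.data.frame(do.call(rbind, padded), stringsAsFactors = FALSE)
  names(out) <- paste0(prefix, "_", seq_len(max_col))
  out
}

# Split every column, then bind the pieces to the original dataframe
pieces <- lapply(names(src.df), function(nm) split_and(src.df[[nm]], nm))
c.df <- cbind(src.df, do.call(cbind, pieces))
c.df
```

Each input column contributes as many output columns as its longest row needs, named after the source column (for example `PREV_1`, `PREV_2`, `PREV_3`).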

score 0 · Answer 4 · answered Jul 25 '14 at 22:20

0

This is a slight modification of the excellent answer provided by Tyler Rinker, to solve a nearly identical problem. What if you wanted to separate the df into columns based on a space (similar to Text to Columns in Excel)?

Try this:

df <- data.frame(originalColumn = c("dog and cat", "robots and raptors"))
dfSpace <- do.call(rbind.data.frame, strsplit(as.character(df[,1]), " "))
dfSpace

Make sure you add a space between the quotation marks.
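
As a hedged illustration of what splitting on a space produces for this data, the sketch below uses a character matrix and invented column names (`word1`, `word2`, ...). Note that the word "and" ends up in a column of its own.

```r
df <- data.frame(originalColumn = c("dog and cat", "robots and raptors"),
                 stringsAsFactors = FALSE)

# Split on a single literal space; each row yields three words here
words <- do.call(rbind, strsplit(df$originalColumn, " ", fixed = TRUE))

dfSpace <- as.data.frame(words, stringsAsFactors = FALSE)
names(dfSpace) <- paste0("word", seq_len(ncol(dfSpace)))  # hypothetical names
dfSpace
```

This relies on every row having the same number of words; with unequal counts, `rbind` recycles values and issues a warning.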

answered Jul 25 '14 at 22:20

feldhauj


that is indeed nearly identical... and not an answer to the question... and not valid because you've put `dfSpace` randomly at the end of the 2nd line... and not well formatted... – Hack-R Feb 12 '16 at 01:51
