Convert basket to single

Question

I currently have a table set up in a basket format so that an irregular amount of data is associated with each row of the table. Such as:

01,item1,item2,item3
02,item1,item2,
03,item1,item2,item3,item4
04,item1

However, I need to change it to a normalized transactional format with only one item on each row. Such as:

01,item1
01,item2
01,item3
02,item1
02,item2
03,item1

...and so on. Is there an easy automated or programmatic way to do this? The data is currently in a MySQL database that I can export in a variety of file types, and I also have access to RStudio, and Microsoft Excel to try to do this. All the transactional resources I could find for RStudio assume that the data was already in the second format, which is what I'm trying to get to.

Have you searched SO for this? Pretty sure this is a duplicate. http://stackoverflow.com/questions/13773770/split-comma-separated-column-into-separate-rows and http://stackoverflow.com/questions/8464312/r-converting-comma-separated-entry-to-columns and http://stackoverflow.com/questions/5413842/extracting-values-after-pattern/5426837#5426837 AND I will bet a bunch more. — IRTFM, Feb 11 '13 at 20:15
All you need to do is paste( . , . sep=",") to the results of those operations in the other citations. — IRTFM, Feb 11 '13 at 20:23
If neither of those questions describe your problem, then I don't think you've fully described the issue you're facing. Can you provide a specific example data frame like the one you're trying to reshape? — joran, Feb 11 '13 at 20:25

score 1 · Accepted Answer · answered Feb 11 '13 at 20:35

I am assuming I understand the way your data set will look once you read it into R, i.e., it will be a rectangular data frame where NA's are filled in to make the rows the same length. So this should solve the problem:

#Create your dataset (this step is not for you)
row1 = c("01","item1","item2","item3",NA)
row2 = c("02","item1","item2",NA,NA)
row3 = c("03","item1","item2","item3","item4")
row4 = c("04","item1",NA,NA,NA)

Data = rbind(row1,row2,row3,row4)

#Now do the reconstruction (this step is for you)
col1 = NULL
col2 = NULL

for(i in 1:nrow(Data)){
    col1 = c(col1,rep(Data[i],ncol(Data)-1))
    col2 = c(col2,Data[i,-1])
}

NewData = cbind(col1,col2)[!is.na(col2),]

So, what you get is the following

> Data
     [,1] [,2]    [,3]    [,4]    [,5]   
row1 "01" "item1" "item2" "item3" NA     
row2 "02" "item1" "item2" NA      NA     
row3 "03" "item1" "item2" "item3" "item4"
row4 "04" "item1" NA      NA      NA     
> 
> NewData
      col1 col2   
 [1,] "01" "item1"
 [2,] "01" "item2"
 [3,] "01" "item3"
 [4,] "02" "item1"
 [5,] "02" "item2"
 [6,] "03" "item1"
 [7,] "03" "item2"
 [8,] "03" "item3"
 [9,] "03" "item4"
[10,] "04" "item1"

So hopefully that helps.

score 0 · Answer 2 · edited May 23 '17 at 12:12

This question is really similar to this one. As @DWin mention in his comment you need to apply paste( . , . sep=",") to a splitted list.

xx <- read.table(text ='01,item1,item2,item3
02,item1,item2,
03,item1,item2,item3,item4
04,item1')

unlist(lapply(strsplit(as.character(xx$V1), ','), 
       function(x) paste(x[1],x[-1],sep=',')))

[1] "01,item1" "01,item2" "01,item3" "02,item1" "02,item2" "03,item1" "03,item2" "03,item3" "03,item4" "04,item1"

Convert basket to single

2 Answers2

Linked