
I have a data frame like this:

Partner    Item
      A      ab
      A      ac
      A      ad
      A      ed
      B      ol
      B      le
      C      ef
      E      ab
      E      ol
      E      ef
      E      at
      E      ok
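To make the code in the answer reproducible, here is one way to build this data frame in R. It assumes both columns are plain character vectors; the name `df` matches the one the answer uses.

```r
# Rebuild the example data from the question:
# long format, one row per Partner/Item pair, in the original row order
df <- data.frame(
  Partner = c("A", "A", "A", "A", "B", "B", "C", "E", "E", "E", "E", "E"),
  Item    = c("ab", "ac", "ad", "ed", "ol", "le", "ef", "ab", "ol", "ef", "at", "ok"),
  stringsAsFactors = FALSE
)
str(df)
```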

I want to convert this to:

Partner    Col1    Col2    Col3     Col4    Col5    
      A      ab      ac      ad       ed
      B      ol      le
      C      ef
      E      ab      ol      ef       at      ok

For context, I am going to use the arules package to convert my data frame to the transactions class so that I can run the apriori algorithm.

My plan is to convert the original data frame as shown above, save it as a separate file, and then read it back in with read.transactions.
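A note on the file route: in arules' "basket" format each line lists one transaction's items, so lines may have different lengths and the padded Col1 to Col5 layout is not strictly required. A minimal base-R sketch of the save step, assuming comma-separated output and a hypothetical file name `baskets.csv` in the temp directory:

```r
df <- data.frame(
  Partner = c("A", "A", "A", "A", "B", "B", "C", "E", "E", "E", "E", "E"),
  Item    = c("ab", "ac", "ad", "ed", "ol", "le", "ef", "ab", "ol", "ef", "at", "ok"),
  stringsAsFactors = FALSE
)

# One character vector of items per partner, in order of appearance
baskets <- split(df$Item, df$Partner)

# One line per partner: the partner ID first, then its items, comma-separated
basket_lines <- paste(names(baskets),
                      vapply(baskets, paste, character(1), collapse = ","),
                      sep = ",")

out_file <- file.path(tempdir(), "baskets.csv")  # hypothetical file name
writeLines(basket_lines, out_file)

# Reading it back with arules (assumes arules is installed; not run here):
# trans <- arules::read.transactions(out_file, format = "basket", sep = ",", cols = 1)
```

Here `cols = 1` tells `read.transactions()` that the first field holds the transaction ID. arules can also skip the reshaping step: `format = "single"` with `cols = c(1, 2)` reads the original long Partner/Item layout directly, and `as(split(df$Item, df$Partner), "transactions")` converts it in memory without a file.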

Any help would be great, thanks!

nak5120

1 Answer


You will want to use dcast for this. If you have a large dataset, check out dcast in data.table; otherwise, the one in reshape2 will work just fine.

library(reshape2)
# One row per Partner and one column per distinct Item; missing combinations are NA
df2 <- dcast(df, Partner ~ Item, value.var = "Item")

This will give us

  Partner   ab   ac   ad   at   ed   ef   le   ok   ol
1       A   ab   ac   ad <NA>   ed <NA> <NA> <NA> <NA>
2       B <NA> <NA> <NA> <NA> <NA> <NA>   le <NA>   ol
3       C <NA> <NA> <NA> <NA> <NA>   ef <NA> <NA> <NA>
4       E   ab <NA> <NA>   at <NA>   ef <NA>   ok   ol
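Because dcast() here creates one column per distinct item, this is essentially a partner-by-item incidence table, the kind of binary matrix arules' transactions are built on. A base-R sketch of the same idea as a logical matrix, using table() instead of dcast(); the `as(..., "transactions")` coercion in the comment assumes arules is installed and is not run here:

```r
df <- data.frame(
  Partner = c("A", "A", "A", "A", "B", "B", "C", "E", "E", "E", "E", "E"),
  Item    = c("ab", "ac", "ad", "ed", "ol", "le", "ef", "ab", "ol", "ef", "at", "ok"),
  stringsAsFactors = FALSE
)

# Count each Partner/Item pair, then turn the counts into TRUE/FALSE flags.
# Rows are partners (transactions); columns are the distinct items in sorted order.
incidence <- unclass(table(df$Partner, df$Item)) > 0
print(incidence)

# Assuming arules is installed (not run here):
# library(arules)
# trans <- as(incidence, "transactions")
```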

Then we just need to rename the item columns and replace the NAs with empty strings (""):

# Rename the item columns Col1, Col2, ... (one per distinct item)
colnames(df2) <- c("Partner", paste0("Col", seq_len(ncol(df2) - 1)))
# Replace the NAs with empty strings
df2[is.na(df2)] <- ""

  Partner Col1 Col2 Col3 Col4 Col5 Col6 Col7 Col8 Col9
1       A   ab   ac   ad        ed                    
2       B                                 le        ol
3       C                            ef               
4       E   ab             at        ef        ok   ol

To sort each row so that the items come first, you could do something like this:

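# Sort each row in decreasing order so the empty strings ("") move to the right.
# apply() works on a matrix and returns one column per row, so t() flips it back.
tmp <- df2[, 2:ncol(df2)]
tmp <- t(apply(tmp, 1, sort, decreasing = TRUE))

df3 <- cbind.data.frame(df2[, 1], tmp)
colnames(df3) <- c("Partner", paste0("Col", seq_len(ncol(tmp))))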

> df3

  Partner Col1 Col2 Col3 Col4 Col5 Col6 Col7 Col8 Col9
1       A   ed   ad   ac   ab                         
2       B   ol   le                                   
3       C   ef                                        
4       E   ol   ok   ef   at   ab      
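Note that sorting with `decreasing = TRUE` also reverses the items (A comes out as ed ad ac ab). If the goal is only to push the blanks to the right, a sketch that moves each row's non-empty cells to the front while keeping their existing left-to-right order is below. The small `df2` is a hypothetical stand-in for the real one after the NA-to-"" step, and `left_pack` is a helper name introduced here:

```r
# Hypothetical stand-in for df2 after the NA -> "" step (three item columns only)
df2 <- data.frame(
  Partner = c("A", "B"),
  Col1    = c("ab", ""),
  Col2    = c("", "le"),
  Col3    = c("ed", "ol"),
  stringsAsFactors = FALSE
)

# Move a row's non-empty cells to the front, keeping their left-to-right order
left_pack <- function(row) c(row[row != ""], row[row == ""])

items  <- as.matrix(df2[, -1])           # apply() works on a matrix anyway
packed <- t(apply(items, 1, left_pack))  # one packed row per partner

df3 <- data.frame(Partner = df2$Partner, packed, stringsAsFactors = FALSE)
names(df3) <- c("Partner", paste0("Col", seq_len(ncol(packed))))
print(df3)
```

Applied to the full df2 from dcast, whose columns are in alphabetical item order, this should give A as ab ac ad ed instead of ed ad ac ab.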

There is probably a more efficient way of doing this. apply() turns the data frame into a matrix in order to sort each row, and I'm not sure how to do this without it.
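One way to avoid apply() altogether, sketched here with base R's ave() and stats::reshape() rather than dcast(): number each item within its partner, then cast on that number instead of on the item itself. The helper column `Pos` is a hypothetical name. Because the position follows the original row order, the items also keep the order from the question (E comes out as ab ol ef at ok).

```r
df <- data.frame(
  Partner = c("A", "A", "A", "A", "B", "B", "C", "E", "E", "E", "E", "E"),
  Item    = c("ab", "ac", "ad", "ed", "ol", "le", "ef", "ab", "ol", "ef", "at", "ok"),
  stringsAsFactors = FALSE
)

# Number each item within its partner: 1, 2, 3, ... in order of appearance
df$Pos <- ave(seq_along(df$Item), df$Partner, FUN = seq_along)

# Cast on the position: one row per partner, one column per position,
# NA where a partner has fewer items than the longest basket
wide <- reshape(df, idvar = "Partner", timevar = "Pos", v.names = "Item",
                direction = "wide")
names(wide) <- c("Partner", paste0("Col", seq_len(ncol(wide) - 1)))

# Make every column character (older R versions may create factors), then blank the NAs
wide[] <- lapply(wide, as.character)
wide[is.na(wide)] <- ""
rownames(wide) <- NULL
print(wide)
```

With reshape2, the same idea should amount to `dcast(df, Partner ~ Pos, value.var = "Item")` once `Pos` exists; in data.table, `rowid()` can generate the position column.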

Kristofersen
  • This is great, @Kristofersen. How would you sort it so that everything reads left to right, rather than having blanks on the left side? That way the final output would look like my desired output. – nak5120 Mar 21 '17 at 15:06
  • @NickKnauer I added a method to sort. There is probably a better way but this is what I could come up with. – Kristofersen Mar 21 '17 at 15:55