Remove column labels from a transaction object

Question

I have a data frame df like below:

df <- data.frame(V1 = c("Prod1", "Prod2", "Prod3"),
                 V2 = c("Prod3", "Prod1", "Prod2"), 
                 V3 = c("Prod2", "Prod1", "Prod3"), 
                 City = c("City1", "City2", "City3"))

When I convert this to the transactions class with the following code:

tData <- as(df, "transactions")
inspect(tData)

I get a result like below:

    items                                   transactionID
[1] {V1=Prod1,V2=Prod3,V3=Prod2,City=City1} 1            
[2] {V1=Prod2,V2=Prod1,V3=Prod1,City=City2} 2            
[3] {V1=Prod3,V2=Prod2,V3=Prod3,City=City3} 3

This means that V1=Prod1 and V2=Prod1 are treated as separate products when they are actually the same product. This causes problems when I pass the transactions to the apriori algorithm.

How can I remove the column labels so that I get the transaction object as:

    items                     transactionID
[1] {Prod1,Prod3,Prod2,City1} 1
[2] {Prod2,Prod1,Prod1,City2} 2
[3] {Prod3,Prod2,Prod3,City3} 3

Please help.

How do you "convert this to transaction class" (please post code). — pogibas, Oct 05 '17 at 08:48
[How to make a great R reproducible example?](http://stackoverflow.com/questions/5963269) — Sotos, Oct 05 '17 at 08:49
Please elaborate on your problem. At this moment it is not clear how you want to get from your input data to the desired output. — Jaap, Oct 05 '17 at 08:58
Hi...when I convert the dataframe to transactions, it uses the column label as the ID. Therefore, it treats V1=Prod1 as a different product from V2=Prod1. I want to avoid this. — DS_1, Oct 05 '17 at 09:03
What is that? `tData <- as(df, "transactions")` Are you using any packages? Please include them If you do so — Sotos, Oct 05 '17 at 09:17

score 2 · Accepted Answer · answered Oct 07 '17 at 17:18

You have a somewhat strange data format (with exactly the same number of items in each transaction). To convert this correctly you cannot use a data.frame, but you need a list of transactions.

library("arules")

df <- data.frame(
  V1 = c("Prod1", "Prod2", "Prod3"),
  V2 = c("Prod3", "Prod1", "Prod2"), 
  V3 = c("Prod2", "Prod1", "Prod3"), 
  City = c("City1", "City2", "City3"))

m <- as.matrix(df)
# one character vector per row; seq_len() is also safe for a zero-row matrix
l <- lapply(seq_len(nrow(m)), FUN = function(i) m[i, ])

This is the list format with each transaction as a list element.

l
[[1]]
     V1      V2      V3    City 
"Prod1" "Prod3" "Prod2" "City1" 

[[2]]
     V1      V2      V3    City 
"Prod2" "Prod1" "Prod1" "City2" 

[[3]]
     V1      V2      V3    City 
"Prod3" "Prod2" "Prod3" "City3"
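
The same list can also be built without an explicit index loop. The sketch below is one possible alternative using base R's split() and row(); it is not part of the approach above, and the name l2 is made up for illustration. It drops the column names, which the coercion does not need.

```r
# Same example data as above
df <- data.frame(
  V1 = c("Prod1", "Prod2", "Prod3"),
  V2 = c("Prod3", "Prod1", "Prod2"),
  V3 = c("Prod2", "Prod1", "Prod3"),
  City = c("City1", "City2", "City3"))

m <- as.matrix(df)

# row(m) holds the row index of every cell, so split() collects the
# cells of each row into one character vector per transaction
l2 <- split(m, row(m))

# Second transaction, still containing the repeated Prod1
l2[[2]]
```

The list elements are named "1", "2" and "3" after the row indices, and each element keeps the items in column order.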

Now it can be coerced into transactions:

trans <- as(l, "transactions")
inspect(trans)

    items                    
[1] {City1,Prod1,Prod2,Prod3}
[2] {City2,Prod1,Prod2}      
[3] {City3,Prod2,Prod3}

Some of your transactions contain duplicate items (for example, Prod1 appears twice in the second row), and these duplicates are removed during the coercion.
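
If you would rather not rely on the coercion to drop the duplicates, one option is to de-duplicate each row yourself before coercing. This is a minimal sketch under that assumption; l_unique and trans2 are illustrative names, and size() and itemLabels() from arules are used only to check the result.

```r
library("arules")

df <- data.frame(
  V1 = c("Prod1", "Prod2", "Prod3"),
  V2 = c("Prod3", "Prod1", "Prod2"),
  V3 = c("Prod2", "Prod1", "Prod3"),
  City = c("City1", "City2", "City3"))

m <- as.matrix(df)

# One character vector per row, without column names and without
# repeated items (e.g. Prod1 appears twice in row 2)
l_unique <- lapply(seq_len(nrow(m)), function(i) unique(unname(m[i, ])))

trans2 <- as(l_unique, "transactions")

size(trans2)        # number of distinct items in each transaction
itemLabels(trans2)  # labels no longer carry a "V1=" style prefix
```

Because every row is already free of duplicates, the resulting transactions should match the ones shown above, and Prod1 is a single item regardless of which column it came from.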

score 0 · Answer 2 · answered Oct 05 '17 at 11:20

Try this:

df <- data.frame(V1 = c("Prod1", "Prod2", "Prod3"),
             V2 = c("Prod3", "Prod1", "Prod2"), 
             V3 = c("Prod2", "Prod1", "Prod3"), 
             City = c("City1", "City2", "City3"))
colnames(df)<-NULL

tData <- as(df, "transactions")
inspect(tData)

user3466328


Hi, I tried it, but it gave me an error: `Error in data.frame(labels = paste(v, l, sep = "="), variables = as.factor(v), : arguments imply differing number of rows: 12, 0` – DS_1, Oct 05 '17 at 12:16
