
I have a data frame df like below:

df <- data.frame(V1 = c("Prod1", "Prod2", "Prod3"),
                 V2 = c("Prod3", "Prod1", "Prod2"), 
                 V3 = c("Prod2", "Prod1", "Prod3"), 
                 City = c("City1", "City2", "City3"))

When I convert this to the transactions class using the code:

tData <- as(df, "transactions")
inspect(tData)

I get a result like below:

    items                                   transactionID
[1] {V1=Prod1,V2=Prod3,V3=Prod2,City=City1} 1            
[2] {V1=Prod2,V2=Prod1,V3=Prod1,City=City2} 2            
[3] {V1=Prod3,V2=Prod2,V3=Prod3,City=City3} 3   

This means that V1=Prod1 and V2=Prod1 are treated as separate products when they are actually the same. This causes problems when I run the apriori algorithm on the result.

How can I remove the column labels so that I get the transaction object as:

    items                     transactionID
[1] {Prod1,Prod3,Prod2,City1} 1
[2] {Prod2,Prod1,Prod1,City2} 2
[3] {Prod3,Prod2,Prod3,City3} 3

Please help.

DS_1
  • How do you "convert this to transaction class" (please post code). – pogibas Oct 05 '17 at 08:48
  • [How to make a great R reproducible example?](http://stackoverflow.com/questions/5963269) – Sotos Oct 05 '17 at 08:49
  • Please elaborate on your problem. At this moment it is not clear how you want to get from your input data to the desired output. – Jaap Oct 05 '17 at 08:58
  • Hi...when I convert the dataframe to transactions, it uses the column label as the ID. Therefore, it treats V1=Prod1 as a different product from V2=Prod1. I want to avoid this. – DS_1 Oct 05 '17 at 09:03
  • Hi...code posted – DS_1 Oct 05 '17 at 09:16
  • What is that? `tData <- as(df, "transactions")` Are you using any packages? If so, please include them. – Sotos Oct 05 '17 at 09:17

2 Answers


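You have a somewhat unusual data format (every transaction has exactly the same number of items). Coercing a data.frame treats each column as a separate variable, so the items are labelled variable=value (e.g. V1=Prod1). To convert this data correctly, you need a list with one element per transaction instead of a data.frame.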

library("arules")

df <- data.frame(
  V1 = c("Prod1", "Prod2", "Prod3"),
  V2 = c("Prod3", "Prod1", "Prod2"), 
  V3 = c("Prod2", "Prod1", "Prod3"), 
  City = c("City1", "City2", "City3"))

m <- as.matrix(df)
l <- lapply(seq_len(nrow(m)), FUN = function(i) m[i, ])

This is the list format with each transaction as a list element.

l
[[1]]
     V1      V2      V3    City 
"Prod1" "Prod3" "Prod2" "City1" 

[[2]]
     V1      V2      V3    City 
"Prod2" "Prod1" "Prod1" "City2" 

[[3]]
     V1      V2      V3    City 
"Prod3" "Prod2" "Prod3" "City3" 

Now it can be coerced into transactions:

trans <- as(l, "transactions")
inspect(trans)

    items                    
[1] {City1,Prod1,Prod2,Prod3}
[2] {City2,Prod1,Prod2}      
[3] {City3,Prod2,Prod3} 

Note that some transactions contain duplicate items (e.g. Prod1 appears twice in the second row). Transactions are sets of items, so these duplicates are removed.
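
If you would rather not rely on the coercion to drop the repeated items, one option is to strip the column names and de-duplicate each row yourself before coercing. This is only a sketch: the names `rows` and `trans2` are made up for illustration, and it assumes every column of `df` holds item labels that belong in the transaction.

```r
library(arules)

df <- data.frame(
  V1 = c("Prod1", "Prod2", "Prod3"),
  V2 = c("Prod3", "Prod1", "Prod2"),
  V3 = c("Prod2", "Prod1", "Prod3"),
  City = c("City1", "City2", "City3"),
  stringsAsFactors = FALSE)

m <- as.matrix(df)

# One character vector per row: drop the column names and remove
# repeated items, since a transaction is a set of items.
rows <- lapply(seq_len(nrow(m)), function(i) unique(unname(m[i, ])))

# Coerce the list of item vectors into a transactions object.
trans2 <- as(rows, "transactions")
inspect(trans2)
```

Doing the de-duplication explicitly makes it visible in the code that the second and third rows end up with three items instead of four.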

Michael Hahsler

Try this:

df <- data.frame(V1 = c("Prod1", "Prod2", "Prod3"),
                 V2 = c("Prod3", "Prod1", "Prod2"),
                 V3 = c("Prod2", "Prod1", "Prod3"),
                 City = c("City1", "City2", "City3"))
colnames(df) <- NULL

tData <- as(df, "transactions")
inspect(tData)
user3466328
  • Hi...I tried it...it gave me an error: `Error in data.frame(labels = paste(v, l, sep = "="), variables = as.factor(v), : arguments imply differing number of rows: 12, 0` – DS_1 Oct 05 '17 at 12:16