
I have implemented the Apriori algorithm on my dataset. The rules I get, though, are inverted repetitions, i.e.:

inspect(head(rules))
    lhs                        rhs                     support    confidence lift count
[1] {252-ON-OFF}            => {L30-ATLANTIC}          0.04545455 1          22   1    
[2] {L30-ATLANTIC}          => {252-ON-OFF}            0.04545455 1          22   1    
[3] {252-ON-OFF}            => {M01-A molle biconiche} 0.04545455 1          22   1    
[4] {M01-A molle biconiche} => {252-ON-OFF}            0.04545455 1          22   1    
[5] {L30-ATLANTIC}          => {M01-A molle biconiche} 0.04545455 1          22   1    
[6] {M01-A molle biconiche} => {L30-ATLANTIC}          0.04545455 1          22   1 

As can be seen, rules 1 and 2 are the same; only the LHS and RHS are interchanged. Is there any way to remove such rules from the final result?

I saw this post (link), but the proposed solution is not correct. I also saw this post (link) and tried these two solutions:

Solution A:

rules <- rules[!is.redundant(rules)]  # drops rules that are redundant w.r.t. a more general rule

but the result is always the same:

inspect(head(rules))
    lhs                        rhs                     support    confidence lift count
[1] {252-ON-OFF}            => {L30-ATLANTIC}          0.04545455 1          22   1    
[2] {L30-ATLANTIC}          => {252-ON-OFF}            0.04545455 1          22   1    
[3] {252-ON-OFF}            => {M01-A molle biconiche} 0.04545455 1          22   1    
[4] {M01-A molle biconiche} => {252-ON-OFF}            0.04545455 1          22   1    
[5] {L30-ATLANTIC}          => {M01-A molle biconiche} 0.04545455 1          22   1    
[6] {M01-A molle biconiche} => {L30-ATLANTIC}          0.04545455 1          22   1 

Solution B:

# find redundant rules
subset.matrix <- is.subset(rules, rules)
subset.matrix[lower.tri(subset.matrix, diag = TRUE)] <- NA  # blank the lower triangle so each pair is counted once
redundant <- colSums(subset.matrix, na.rm = TRUE) > 1       # rules that are subsets of another rule
which(redundant)
rules.pruned <- rules[!redundant]
inspect(rules.pruned)
     lhs    rhs                           support    confidence lift count
[1]  {}  => {BRC-BRC}                     0.04545455 0.04545455 1     1   
[2]  {}  => {111-WINK}                    0.04545455 0.04545455 1     1   
[3]  {}  => {305-INGRAM HIGH}             0.04545455 0.04545455 1     1   
[4]  {}  => {952-REVERS}                  0.04545455 0.04545455 1     1   
[5]  {}  => {002-LC2}                     0.09090909 0.09090909 1     2   
[6]  {}  => {252-ON-OFF}                  0.04545455 0.04545455 1     1   
[7]  {}  => {L30-ATLANTIC}                0.04545455 0.04545455 1     1   
[8]  {}  => {M01-A molle biconiche}       0.04545455 0.04545455 1     1   
[9]  {}  => {678-Portovenere}             0.04545455 0.04545455 1     1   
[10] {}  => {251-MET T.}                  0.04545455 0.04545455 1     1   
[11] {}  => {324-D.S.3}                   0.04545455 0.04545455 1     1   
[12] {}  => {L04-YUME}                    0.04545455 0.04545455 1     1   
[13] {}  => {969-Lubekka}                 0.04545455 0.04545455 1     1   
[14] {}  => {000-FUORI LISTINO}           0.04545455 0.04545455 1     1   
[15] {}  => {007-LC7}                     0.04545455 0.04545455 1     1   
[16] {}  => {341-COS}                     0.04545455 0.04545455 1     1   
[17] {}  => {601-ROBIE 1}                 0.04545455 0.04545455 1     1   
[18] {}  => {608-TALIESIN 2}              0.04545455 0.04545455 1     1   
[19] {}  => {610-ROBIE 2}                 0.04545455 0.04545455 1     1   
[20] {}  => {615-HUSSER}                  0.04545455 0.04545455 1     1   
[21] {}  => {831-DAKOTA}                  0.04545455 0.04545455 1     1   
[22] {}  => {997-997}                     0.27272727 0.27272727 1     6   
[23] {}  => {412-CAB}                     0.09090909 0.09090909 1     2   
[24] {}  => {S01-A doghe senza movimenti} 0.09090909 0.09090909 1     2   
[25] {}  => {708-Genoa}                   0.09090909 0.09090909 1     2   
[26] {}  => {998-998}                     0.54545455 0.54545455 1    12 

Has anyone had the same problem and knows how to solve it? Thanks for your help.

Lorenzo Benassi
  • The confidence is expected to vary between a pair of mirrored rules (e.g., rules 1 and 2 in your output), while the lift will always be the same. Confidence defines the directionality of the rule. In your case both confidence values are the same because the itemsets (LHS and RHS) occur only once. I am not sure that rules with count = 1 should even be considered rules. What I mean to say is: between the two versions of the same rule (LHS and RHS swapped), pick the one with the higher confidence. – tushaR Dec 22 '17 at 04:31
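
A minimal sketch of tushaR's suggestion (assuming an arules rules object named `rules`): sort by confidence first, so that dropping duplicated generating itemsets keeps the higher-confidence direction of each mirrored pair.

library(arules)

# Sort so the higher-confidence direction of each mirrored pair comes first.
rules_sorted <- sort(rules, by = "confidence")

# generatingItemsets() returns, for each rule, the union of its LHS and RHS,
# so both directions of a rule map to the same itemset; duplicated() flags
# every occurrence after the first, i.e. the lower-confidence direction.
gi <- generatingItemsets(rules_sorted)
rules_best <- rules_sorted[!duplicated(gi)]
inspect(head(rules_best))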

2 Answers


The issue is your dataset, not the algorithm. In the result you can see that the count for many rules is 1 (the item combination occurs only once in the transactions) and the confidence is 1 for both the rule and its "inverse." This means that you need more data and should increase the minimum support.
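
For example (illustrative only — `trans` stands in for your own transactions object, and the thresholds are placeholders to tune against your data):

# Require at least two items per rule (minlen = 2) so that no {} => {item}
# rules are produced, and raise the minimum support so that one-off item
# combinations are filtered out.
rules <- apriori(trans, parameter = list(support = 0.05,
                                         confidence = 0.8,
                                         minlen = 2))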

If you still want to get rid of such "duplicate" rules efficiently, then you can do the following:

> library(arules)
> data(Groceries)
> rules <- apriori(Groceries, parameter = list(support = 0.001))
> rules
set of 410 rules

> gi <- generatingItemsets(rules)
> d <- which(duplicated(gi))
> rules[-d]
set of 385 rules 

The code keeps only the first rule of each set of rules built from exactly the same items.
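
A usage note (the object name is just illustrative): subsetting does not modify `rules` in place, so assign the result if you want to keep it:

> rules_dedup <- rules[-d]   # `rules` itself is unchanged
> rules_dedup
set of 385 rules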

Michael Hahsler
  • Thank you for your answer, but your code does not work correctly. If you run `inspect(head(rules, 46))` (on your set of rules) before using `generatingItemsets(rules)`, you can see that rules 45 and 46 are duplicated/reversed. But after running `gi <- generatingItemsets(rules)`, `d <- which(duplicated(gi))`, `rules[-d]`, rules 45 and 46 are still present. – Lorenzo Benassi Dec 22 '17 at 09:57
  • @LorenzoBenassi `rules[-d]` removes rule 45 (and others). However, R does not modify `rules`. If you want to keep a copy of the rules with the "duplicates" removed, then you need to assign the result with something like `rules_no_duplicates <- rules[-d]`. – Michael Hahsler Dec 23 '17 at 14:03

You can do it with brute force by converting your rules object into a data.frame and iteratively comparing the LHS/RHS item vectors. Here is an example using the grocery.csv dataset:

inspect(head(groceryrules))

[screenshot: the first six rules from inspect(head(groceryrules))]

# convert the rules object to a data.frame
trans_frame <- data.frame(lhs = labels(lhs(groceryrules)),
                          rhs = labels(rhs(groceryrules)),
                          groceryrules@quality)

# loop through each row of trans_frame
rem_indx <- NULL
for (i in 1:nrow(trans_frame)) {
    trans_vec_a <- c(as.character(trans_frame[i, 1]), as.character(trans_frame[i, 2]))
    # for each row evaluated, compare it to every other row in trans_frame
    for (k in 1:nrow(trans_frame[-i, ])) {
        trans_vec_b <- c(as.character(trans_frame[-i, ][k, 1]),
                         as.character(trans_frame[-i, ][k, 2]))
        if (setequal(trans_vec_a, trans_vec_b)) {
            # same items regardless of direction: store the index to remove
            rem_indx[i] <- i
        }
    }
}

This gives you a vector of indices for the rules that should be removed (because they are duplicated/inverted):

duped_trans <- trans_frame[rem_indx[!is.na(rem_indx)], ]
duped_trans

[screenshot: duped_trans, showing the two mirrored rules]

We can see that it identified the two rules that were duplicates/inverts of each other.

Now we can keep the non-duplicate transactions:

deduped_trans <- trans_frame[-rem_indx[!is.na(rem_indx)], ]

The issue, of course, is that the above algorithm is extremely inefficient: every rule is compared against every other rule. The groceryrules set has only 463 rules here; for any reasonable number of rules you will need to vectorize the function.
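
One way to vectorize (a sketch of my own, not from the answer; it assumes the arules accessors `lhs()`, `rhs()` and `LIST()`): build a canonical key for every rule by sorting the union of its item labels, then drop rules whose key has already been seen.

library(arules)

# One canonical string per rule: the sorted union of LHS and RHS item labels.
# Mirrored rules produce identical keys regardless of direction.
rule_key <- function(rules) {
  lhs_items <- LIST(lhs(rules), decode = TRUE)
  rhs_items <- LIST(rhs(rules), decode = TRUE)
  mapply(function(l, r) paste(sort(unique(c(l, r))), collapse = "|"),
         lhs_items, rhs_items)
}

keys <- rule_key(groceryrules)
deduped_rules <- groceryrules[!duplicated(keys)]  # keeps the first of each pair

This is essentially what `generatingItemsets()` plus `duplicated()` in the other answer computes, without the row-by-row comparison.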

Cybernetic