
I am doing some association rule mining in R and want to extract my results so I can build reports. My results look like this:

> inspect(rules[1:3])
  lhs         rhs          support confidence lift
1 {apples} => {oranges}    0.00029       0.24  4.4
2 {apples} => {pears}      0.00022       0.18 45.6
3 {apples} => {pineapples} 0.00014       0.12  1.8

How do I extract the "rhs" here, i.e. a vector of oranges, pears and pineapples?

Further, how do I extract information out of the summary, i.e.

> summary(rules)

The data type is "S4". I have no problem extracting when the output is a list etc., but how do you do the equivalent here?

set of 3 rules

rule length distribution (lhs + rhs):sizes
2 
3 

   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
      2       2       2       2       2       2 

I want to extract the "3" from the "set of 3 rules"

I have gotten as far as using "@" (see "What does the @ symbol mean in R?").

But once I use that, how do I turn my results into a vector, i.e.

inspect(rules@rhs)
1 {oranges}
2 {pears}
3 {pineapples}

becomes character vector of length 3

shecode
  • I have tried that (as mentioned, I know how to extract from lists etc.), but this is an S4 object. When you use names() it returns NULL – shecode Jul 31 '15 at 02:29

4 Answers


To answer your second question: length(rules)

Now about your first question:

library("arules")
data("Adult")
## Mine association rules.
rules <- apriori(Adult,parameter = list(supp = 0.5, conf = 0.9, target = "rules"))
summary(rules)

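n <- length(rules)

# labels() returns one string per rule of the form "{lhs} => {rhs}"
everything <- labels(rules)
#print(everything)

# Split each label on "=> " and keep every second piece, i.e. the rhs
rhs_labels <- unlist(strsplit(everything, "=> "))[seq(2, 2 * n, by = 2)]
print(rhs_labels)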

Don't hesitate to ask if you have a question, this might be a bit dense :-)
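The string splitting above can probably be avoided by asking for the labels of the right-hand side only. This is a minimal sketch, assuming the installed version of arules provides `rhs()` and a `labels()` method that accepts `setStart` and `setEnd`; the names `n_rules` and `rhs_labels` are illustrative and not from the original answer.

```r
library(arules)
data("Adult")

# Same mining call as above, with progress output switched off
rules <- apriori(Adult,
                 parameter = list(supp = 0.5, conf = 0.9, target = "rules"),
                 control = list(verbose = FALSE))

# Number of rules, i.e. the "3" in "set of 3 rules" from the question
n_rules <- length(rules)

# One label per rule for the rhs only; empty setStart/setEnd drop the braces
rhs_labels <- labels(rhs(rules), setStart = "", setEnd = "")
print(rhs_labels)
```

Because no text is split on "=> ", this should keep working even if an item label happens to contain that character sequence.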

steph

inspect isn't returning anything, just printing its output. When this happens you can use the function capture.output if you want to save the output as a string. For example, getting the rhs:

data(Adult)
rules <- apriori(Adult, parameter = list(support = 0.4))
inspect(rules[1:3])
#   lhs    rhs                              support confidence lift
# 1 {}  => {race=White}                   0.8550428  0.8550428    1
# 2 {}  => {native-country=United-States} 0.8974243  0.8974243    1
# 3 {}  => {capital-gain=None}            0.9173867  0.9173867    1

## Capture it, and extract rhs
out <- capture.output(inspect(rules[1:3]))
gsub("[^{]+\\{([^}]*)\\}[^{]+\\{([^}]*)\\}.*", "\\2", out)[-1]
# [1] "race=White"                   "native-country=United-States"
# [3] "capital-gain=None"           

However, it looks like you can just access this information from the rules with the function rhs:

str(rhs(rules)@itemInfo)
# 'data.frame': 115 obs. of  3 variables:
#  $ labels   :Class 'AsIs'  chr [1:115] "age=Young" "age=Middle-aged" "age=Senior" "age=Old" ...
#  $ variables: Factor w/ 13 levels "age","capital-gain",..: 1 1 1 1 13 13 13 13 13 13 ...
#  $ levels   : Factor w/ 112 levels "10th","11th",..: 111 63 92 69 30 54 65 82 90 91 ...

In general, use str to see what objects are made of so you can decide how to extract components.
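Once the structure is known, the accessor functions may be enough to get plain R objects without parsing printed output. The following is a sketch under the assumption that arules supports coercing an itemMatrix to a list with `as(..., "list")` and exposes the interest measures through `quality()`; the variable names are illustrative.

```r
library(arules)
data(Adult)
rules <- apriori(Adult, parameter = list(support = 0.4),
                 control = list(verbose = FALSE))

# List with one character vector of item labels per rule
rhs_list <- as(rhs(rules), "list")

# Each rhs here holds a single item, so unlist() gives one string per rule
rhs_vec <- unlist(rhs_list, use.names = FALSE)
head(rhs_vec, 3)

# Support, confidence, lift etc. as an ordinary data.frame
measures <- quality(rules)
head(measures$support, 3)
```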

Rorschach
  • What if you wanted to get all of the output from inspect(rules) into a data frame? – shecode Aug 04 '15 at 03:26
  • I've answered my own question: as(rules, "data.frame") – shecode Aug 04 '15 at 03:33
  • I tried the capture.output() function, but it seems it does not work any more. I got character(0) after I executed `capture.output(inspect(rules[1:5]))` – CloverCeline Apr 13 '19 at 21:09
  • @CloverCeline I don't use R much anymore, but I just tried it and it still works as written in R 3.5.2. Make sure `capture.output` has `type="output"`. `character(0)` is what is returned when `type="message"` – Rorschach Apr 13 '19 at 21:18
  • @jenesaisquoi Thank you for the quick response. I use R 3.5.1 and tried `out <- capture.output(inspect(rules[1:5]),type = 'output') gsub("[^{]+\\{([^}]*)\\}[^{]+\\{([^}]*)\\}.*", "\\2", out)[-1]`, but got no output. I then tried printing out with ` out <- capture.output(inspect(rules[1:5]),type = 'output') out` and still only got character(0), not any strings. Could I have more suggestions? Thank you – CloverCeline Apr 13 '19 at 21:43

You can extract RHS as a character vector of item names (without extraneous text like '=>' or curly brackets) as follows:

rules@rhs@itemInfo$labels[rules@rhs@data@i+1]

The index values stored in rules@rhs@data@i range from 0 to one less than the number of unique labels. Thus, indexing the labels requires adding '1' to avoid attempting to access the 0th element of rules@rhs@itemInfo$labels.
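The off-by-one adjustment can be illustrated with plain vectors. In this sketch `item_labels` and `zero_based_i` are hypothetical stand-ins for `rules@rhs@itemInfo$labels` and `rules@rhs@data@i`, filled with the item names from the question rather than with real mining output.

```r
# Hypothetical stand-in for rules@rhs@itemInfo$labels
item_labels <- c("apples", "oranges", "pears", "pineapples")

# Hypothetical stand-in for rules@rhs@data@i: one item index per rule,
# counted from 0 as in the sparse matrix slot
zero_based_i <- c(1L, 2L, 3L)

# R vectors are indexed from 1, so shift every index up by one
rhs_items <- item_labels[zero_based_i + 1L]
print(rhs_items)
# [1] "oranges"    "pears"      "pineapples"
```

Without the `+ 1L` every label would be shifted by one position, and an index of 0 would be silently dropped, so the result could be both wrong and too short.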

MCornejo

Perhaps this wasn't an option at the time this question was asked, but there is a DATAFRAME() function to convert the rules object to a data.frame, from which it is a bit easier to extract what you want. You can even have it exclude the curly braces and set whatever separator you like between items in the item sets.

Borrowing the example from the accepted answer,

data(Adult)
rules <- apriori(Adult, parameter = list(support = 0.4))

We can now turn that into a data.frame and do useful things:

rule_data <- DATAFRAME(rules, 
                       separate = TRUE, 
                       setStart = '', 
                       itemSep = ',', 
                       setEnd = '')

> str(rule_data)
'data.frame':   169 obs. of  6 variables:
 $ LHS       : Factor w/ 76 levels "","relationship=Husband",..: 1 1 1 1 2 3 2 3 3 3 ...
 $ RHS       : Factor w/ 7 levels "race=White","native-country=United-States",..: 1 2 3 4 5 6 7 7 1 2 ...
 $ support   : num  0.855 0.897 0.917 0.953 0.403 ...
 $ confidence: num  0.855 0.897 0.917 0.953 0.999 ...
 $ lift      : num  1 1 1 1 2.18 ...
 $ count     : int  41762 43832 44807 46560 19704 19704 19715 19899 20054 20003 ...

> rule_data$RHS[1:5]
[1] race=White                       
[2] native-country=United-States     
[3] capital-gain=None                
[4] capital-loss=None                
[5] marital-status=Married-civ-spouse
Brian Stamper