
I am using the "Adult" dataset (http://archive.ics.uci.edu/ml/datasets/Adult). I have mined association rules with apriori and sorted them by lift:

    library(arules)
    trans = read.transactions("adult.data", format = "basket", sep = ",", rm.duplicates = TRUE)
    rules <- apriori(trans)
    rules.lift <- sort(rules, decreasing = TRUE, by="lift")

When I run

    inspect(head(rules.lift, 100))

I get the following output:

    lhs                 rhs               support confidence     lift
  1   { 13,                                                            
      Male,                                                          
      United-States} => { Bachelors}    0.1024507  0.9976077 6.066125
  2   { 0,                                                             
       13,                                                            
       Male,                                                          
       United-States} => { Bachelors}    0.1024507  0.9976077 6.066125

...

For example, in the rule:

 { 0,                                                             
   13,                                                            
   Male,                                                          
   United-States} => { Bachelors}

How can I tell which attributes that 0 and that 13 belong to? I have looked at the description of the dataset and at the data itself, so I guess that 13 is education-num and 0 is capital-loss. However, two or more attributes can share the same range of values, so I would not know how to distinguish them.
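One way to see the problem: once column names are attached to the raw table, you can list every attribute in which a bare value like 0 or 13 occurs. If more than one column comes back, the unlabelled item alone cannot tell you which one produced it. A minimal base-R sketch, assuming the column order from the UCI adult.names description (verify it against your copy); `columns_containing()` is a hypothetical helper, and the two inline rows stand in for reading `adult.data` from disk.

```r
# Column order as listed in the UCI adult.names description (assumed).
adult_cols <- c("age", "workclass", "fnlwgt", "education", "education-num",
                "marital-status", "occupation", "relationship", "race", "sex",
                "capital-gain", "capital-loss", "hours-per-week",
                "native-country", "income")

# Two sample rows in the adult.data layout (no header, ", " separated).
rows <- c(
  paste("39, State-gov, 77516, Bachelors, 13, Never-married, Adm-clerical,",
        "Not-in-family, White, Male, 2174, 0, 40, United-States, <=50K"),
  paste("50, Self-emp-not-inc, 83311, Bachelors, 13, Married-civ-spouse,",
        "Exec-managerial, Husband, White, Male, 0, 0, 13, United-States, <=50K")
)

# Read everything as character so values compare exactly as they appear;
# check.names = FALSE keeps hyphenated names such as "education-num".
adult <- read.csv(text = rows, header = FALSE, col.names = adult_cols,
                  strip.white = TRUE, check.names = FALSE,
                  colClasses = "character")

# Hypothetical helper: names of all columns in which `value` occurs.
columns_containing <- function(df, value) {
  hits <- vapply(df, function(col) any(col == value, na.rm = TRUE), logical(1))
  names(df)[hits]
}

columns_containing(adult, "13")  # "education-num" "hours-per-week"
columns_containing(adult, "0")   # "capital-gain"  "capital-loss"
```

Even in these two rows, both values are ambiguous, which is why the rule labels need the column name attached before mining.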

    > class(rules.lift)
    [1] "rules"
    attr(,"package")
    [1] "arules"

I've read in the question "How could we know the ColumnName /attribute of items generated in Rules" that the problem is that I haven't preprocessed the data. So how can I do that?
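A minimal preprocessing sketch in base R, assuming the same column order from the UCI adult.names description: read the file as a table, then prefix every value with its column name so that, for example, education-num 13 and hours-per-week 13 become different items. `label_items()` is a hypothetical helper, and the two inline rows stand in for `file = "adult.data"`.

```r
# Column order as listed in the UCI adult.names description (assumed).
adult_cols <- c("age", "workclass", "fnlwgt", "education", "education-num",
                "marital-status", "occupation", "relationship", "race", "sex",
                "capital-gain", "capital-loss", "hours-per-week",
                "native-country", "income")

# Two sample rows in the adult.data layout; with the real file use
# read.csv("adult.data", header = FALSE, ...) instead of text = rows.
rows <- c(
  paste("39, State-gov, 77516, Bachelors, 13, Never-married, Adm-clerical,",
        "Not-in-family, White, Male, 2174, 0, 40, United-States, <=50K"),
  paste("50, Self-emp-not-inc, 83311, Bachelors, 13, Married-civ-spouse,",
        "Exec-managerial, Husband, White, Male, 0, 0, 13, United-States, <=50K")
)

adult <- read.csv(text = rows, header = FALSE, col.names = adult_cols,
                  strip.white = TRUE, check.names = FALSE,
                  colClasses = "character")

# Hypothetical helper: one character vector of "column=value" items per row,
# so equal values from different columns become distinct items.
label_items <- function(df) {
  lapply(seq_len(nrow(df)), function(i) {
    paste0(names(df), "=", unlist(df[i, ], use.names = FALSE))
  })
}

labelled <- label_items(adult)
labelled[[2]][c(5, 11, 12, 13)]
# "education-num=13" "capital-gain=0" "capital-loss=0" "hours-per-week=13"
```

With arules loaded, a list of character vectors like `labelled` can be coerced with `as(labelled, "transactions")` and passed to `apriori()`, so the items in the rules carry labels such as `education-num=13`. An alternative is to turn every column into a factor and call `as(adult, "transactions")`, which builds `column=level` items itself; arules also ships the labelled table as `AdultUCI`, and its help page shows that preparation. Numeric columns such as age or fnlwgt are probably worth binning first (for example with `discretize()`), otherwise each distinct number becomes its own item.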

Thank you very much!

Naster
  • It would help if you made your example [reproducible](http://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example). What is the R code you used to generate `rules.lift`? What is `class(rules.lift)`? – MrFlick Nov 12 '14 at 14:14
  • I've added the code to obtain rules.lift and its class. Thank you! :) – Naster Nov 13 '14 at 01:04
  • If you look at the "adult.data" raw input, there are no category labels for the "columns" of data. And when you read the data as transactions with format="basket", it doesn't expect the data to be in a tabular format. It treats each column value as a flag, ignoring column positions. If you use the built-in `Adult` data set that comes with the `arules` package, you can see they've changed the variables to include "headers": `data("Adult"); rules <- apriori(Adult, parameter = list(support = 0.4)); rules.sub <- subset(rules, subset = rhs %pin% "sex" & lift > 1.3); inspect(sort(rules.sub)[1:3])` – MrFlick Nov 13 '14 at 04:41
  • Thank you so much! Now I understand what is going on :) How can I add those headers and consider the position of the columns? Again, thank you so much, you helped me to understand what I was doing! :) – Naster Nov 14 '14 at 14:36
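MrFlick's point about format="basket" can be checked on a single record without arules. The sketch below only approximates what `read.transactions(format = "basket", rm.duplicates = TRUE)` does (split each line on the separator and keep each distinct value once); the exact parsing, such as whether leading spaces are kept, is an assumption here. The record is a sample row in the adult.data layout.

```r
# A sample record in the adult.data layout: 15 comma-separated values, no header.
record <- paste("50, Self-emp-not-inc, 83311, Bachelors, 13, Married-civ-spouse,",
                "Exec-managerial, Husband, White, Male, 0, 0, 13, United-States, <=50K")

# Approximate basket-style parsing: split on "," and keep each distinct value
# once, as an unordered item set. Column positions are discarded at this point.
values <- trimws(strsplit(record, ",", fixed = TRUE)[[1]])
basket_items <- unique(values)

length(values)        # 15 columns in the record
length(basket_items)  # 13 items: the two "0"s and the two "13"s are merged
```

The merged "0" and "13" items are exactly the unlabelled 0 and 13 that show up in the rules above.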
