Data Manipulation in R for Apriori

Question

I have part of a dataset in CSV form, shown below; the real data has more rows and columns than this. I want to run apriori on this dataset. Say I have this:

    Maths Science C++ Java DC
[1]    75      44  55   56 88
[2]    56      88  54   78 44

The original dataset has 30 columns (one per subject) and 24 rows (one per student).

DATASET:link

I want to convert this dataset into the form shown below:

[1] {Maths,DC}
[2] {Science,Java}

i.e. a list of lists (I think that is what it is called) containing the column names. Each student's list shows the subjects in which he/she scored 75 marks or more; the remaining subjects are dropped (this is the only condition of the problem).

e.g. the first student scored 75 or more in DC and Maths, so his list includes only DC and Maths.

I am sorry for posting this, but I searched a lot on Stack Overflow and found a few working suggestions, yet couldn't reach the final goal. My goal is to get a form like this:

[9834] {semi-finished bread,      
        bottled water,            
        soda,                     
        bottled beer}             
[9835] {chicken,                  
        tropical fruit,           
        other vegetables,         
        vinegar,                  
        shopping bags}

as produced by:

library(arules)
inspect(Groceries)

Alternatively, I would appreciate it if anyone could suggest another way to represent the data that apriori can understand, as long as it satisfies the condition stated above.

(Sorry for the long post. I hope converting my dataset into this format will help me study the patterns in the student-subject data. Thanks a ton for all the help.)

Welcome to Stack Overflow! Please read the info about [how to ask a good question](http://stackoverflow.com/help/how-to-ask) and how to give a [reproducible example](http://stackoverflow.com/questions/5963269). This will make it much easier for others to help you. — zx8754, Mar 29 '17 at 07:18
I will definitely check the link, thanks for your help. This is the first time I have asked a question on Stack Overflow, so I didn't know much about formatting. Again, thanks for the help. — Udaishankar T, Mar 29 '17 at 07:20

Aurèle · Accepted Answer · 2017-03-30T07:56:26.590

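library(plyr)    # alply(): apply a function to each row, returning a list
library(arules)  # "transactions" class and inspect()
# Two-student sample from the question
df <- read.table(text = 
"   75   44      55  56  88
    56   88      54  78  44")
names(df) <- c("Maths", "Science", "C++", "Java", "DC")
# For each row (student), keep the names of the subjects with marks >= 75,
# then coerce the resulting list to arules transactions
transactions <- as(alply(df, 1, function(x) names(x)[x >= 75]), "transactions")
inspect(transactions)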

#     items          transactionID
# [1] {DC,Maths}     1            
# [2] {Java,Science} 2
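
If you would rather not depend on plyr, here is a minimal sketch of an alternative, assuming only that arules is installed. It relies on the fact that arules can coerce a logical matrix (rows as transactions, columns as items) to `transactions`. The names `passed` and `transactions2` are made up for this illustration, and the printed item order may differ from the output above.

```r
library(arules)

# Same two-student sample as in the question.
# check.names = FALSE keeps the column name "C++" exactly as typed.
df <- data.frame(
  Maths = c(75, 56), Science = c(44, 88), `C++` = c(55, 54),
  Java = c(56, 78), DC = c(88, 44),
  check.names = FALSE
)

# Logical matrix: TRUE where the student scored 75 or more.
passed <- as.matrix(df) >= 75

# Each row becomes one transaction; each TRUE column becomes an item.
transactions2 <- as(passed, "transactions")
inspect(transactions2)
```

The threshold lives in a single comparison, so changing the cut-off from 75 to another value only touches that one line.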

Edit: It works with your example dataset, too:

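library(plyr)
library(arules)
# Read the full dataset straight from the shared link
df <- read.csv(file = url("https://drive.google.com/uc?export=download&id=0B3kdblyHw4qLR0dpT24xWUZGcGs"))
# Same row-wise conversion as above; it does not depend on the number of columns
transactions <- as(alply(df, 1, function(x) names(x)[x >= 75]), "transactions")
inspect(transactions)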

#      items                              transactionID
# [1]  {CD,CG,CN,DA,Data.Struc}           1            
# [2]  {CD,CG,CO,ML,OS}                   2            
# [3]  {CN,Data.Struc,DC,DM,DMS}          3            
# [4]  {CHE,DD,DM,EC,EE}                  4            
# [5]  {CHE,CN,MATHS,PHY}                 5            
# [6]  {Data.Science,DM,DMS,ML,OS}        6            
# [7]  {CD,DA,Data.Struc,EC,MATHS}        7            
# [8]  {CG,CHE,CN,CO,OS}                  8            
# [9]  {CN,CO,Data.Science,DC,DMS}        9            
# [10] {DC,DD,EC,EE,PHY}                  10           
# [11] {CHE,DD,DMS,MATHS,PHY}             11           
# [12] {CN,Data.Science,DM,MATHS,ML}      12           
# [13] {CD,CG,DA,Data.Science,Data.Struc} 13           
# [14] {CG,CO,EE,MATHS,OS}                14           
# [15] {CN,CO,DC,DMS,PHY}                 15           
# [16] {CN,CO,DD,EC,EE}                   16           
# [17] {CHE,DA,EE,MATHS,PHY}              17           
# [18] {Data.Science,DD,DM,ML,PHY}        18           
# [19] {CD,CO,DA,Data.Struc,DC}           19           
# [20] {CG,CO,DD,DM,OS}                   20           
# [21] {CG,CN,DA,DC,DMS}                  21           
# [22] {DD,EC,EE,ML,OS}                   22           
# [23] {CHE,CN,Data.Struc,MATHS,PHY}      23           
# [24] {CG,Data.Science,DM,EE,ML}         24
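
Once the data is in `transactions` form, it can be handed to `apriori()`. The following is only a rough sketch: it rebuilds the first six transactions from the output above as a plain list (the name `marks_items` is made up for this illustration), and the `support`, `confidence` and `minlen` values are arbitrary assumptions that you would need to tune for the full 24-student dataset.

```r
library(arules)

# First six students from the output above, written out as a list of item sets.
marks_items <- list(
  c("CD", "CG", "CN", "DA", "Data.Struc"),
  c("CD", "CG", "CO", "ML", "OS"),
  c("CN", "Data.Struc", "DC", "DM", "DMS"),
  c("CHE", "DD", "DM", "EC", "EE"),
  c("CHE", "CN", "MATHS", "PHY"),
  c("Data.Science", "DM", "DMS", "ML", "OS")
)
trans <- as(marks_items, "transactions")

# Illustrative thresholds only (assumptions, not recommendations).
rules <- apriori(
  trans,
  parameter = list(support = 0.3, confidence = 0.6, minlen = 2)
)

# Show the mined rules, strongest lift first.
inspect(sort(rules, by = "lift"))
```

With the real data you would pass the `transactions` object built above instead of `trans`.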

Apom, I see the code works for the given scenario, i.e. for 5 subjects, but as I mentioned above the dataset has 30 subjects and 24 students. So I need some function (like lapply etc.; I don't know much about R) that can convert every row of the dataset, representing the marks scored by a student in the 30 subjects, into the "goal format". Dataset: [link](https://drive.google.com/open?id=0B3kdblyHw4qLR0dpT24xWUZGcGs) — Udaishankar T, Mar 30 '17 at 04:56
@UdaishankarT It works with your example dataset, too. See my edit — Aurèle, Mar 30 '17 at 07:56
Thanks Apom, I am new to R, so I didn't know we could do that. Again, thanks for all the help. — Udaishankar T, Mar 30 '17 at 14:48
