4

I am working on a project called "association rule discovery from social network data: Introducing Data Mining to the Semantic Web". Can anyone suggest a good source for an algorithm, along with code, to find association rules from a social network database? I have heard that this can be implemented in Perl and also with R packages.

A snapshot of the database is available at the following link: https://docs.google.com/uc?id=0B0mXGRdRowo1MDZlY2Q0NDYtYjlhMi00MmNjLWFiMWEtOGQ0MjA3NjUyZTE5&export=download&hl=en_US

The dataset is available on the following link: http://ebiquity.umbc.edu/get/a/resource/82.zip

I have searched extensively for material on this project but have not found anything useful yet. The following link is the only somewhat related one I found:

Criminal data : http://www.computer.org/portal/web/csdl/doi/10.1109/CSE.2009.435

Your help will be highly appreciated.

Thank You,

codious
  • 2
    if you found the answer helpful, kindly 'accept' the answer by clicking the green 'check' that appears when you mouse-over the upper-left hand corner of the answer (the '0' with triangles above and below). – doug Apr 20 '11 at 16:45
  • Apologies for the late response; I had not visited this page. Is it fine now? – codious May 08 '11 at 17:01

3 Answers

4

Well, the most widely used Association Rules algorithms are Apriori (the original algorithm, developed at IBM Almaden Research Center) and Eclat. The C implementations of both by Christian Borgelt are particularly widely used.

(Brief summary for anyone not familiar with Association Rules, aka "Frequent Itemsets" or "Market Basket Analysis": the prototype application is analyzing consumer transactions, e.g., supermarket data. Among shoppers who buy Polish sausage, what percentage also purchase black bread?)
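To make the sausage-and-bread question concrete, here is a minimal Python sketch of the two standard measures behind it, support and confidence. The transactions are invented toy data (not taken from the linked dataset), and the helper names `support` and `confidence` are just illustrative.

```python
# Toy transactions: hypothetical baskets, invented for illustration only.
transactions = [
    {"polish sausage", "black bread", "mustard"},
    {"polish sausage", "black bread"},
    {"polish sausage", "beer"},
    {"black bread", "butter"},
    {"beer", "chips"},
]


def support(itemset, transactions):
    """Fraction of transactions that contain every item in `itemset`."""
    itemset = set(itemset)
    return sum(itemset <= t for t in transactions) / len(transactions)


def confidence(antecedent, consequent, transactions):
    """Estimate of P(consequent | antecedent) from the transactions.

    Assumes the antecedent occurs at least once; otherwise this divides by zero.
    """
    antecedent = set(antecedent)
    both = antecedent | set(consequent)
    return support(both, transactions) / support(antecedent, transactions)


# Rule under test: {polish sausage} -> {black bread}
rule_support = support({"polish sausage", "black bread"}, transactions)
rule_confidence = confidence({"polish sausage"}, {"black bread"}, transactions)
print(f"support={rule_support:.2f} confidence={rule_confidence:.2f}")
```

Here the confidence of the rule is the "what percentage" in the question above: of the three baskets containing Polish sausage, two also contain black bread.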

I would recommend the statistical platform, R. It is free and open source, and its package repository contains (at least) four libraries directed solely to Association Rules, all with excellent documentation--three of the four Packages include a Manual and a separate Vignette (informal prose document with code examples). Both the Manuals and Vignettes contain numerous examples in R code.

I have used three of the four Packages below and I can recommend those three highly. Among them are bindings for Eclat and Apriori. These libraries are distributed as R 'Packages', which are available on CRAN, R's primary Package repository. Basic installation and setup of R is trivial: there are binaries for Mac, Linux, and Windows, available from CRAN. Likewise, Package installation/integration is as simple as you would expect from an integrated platform (though not every one of the four Packages listed below has binaries for every OS).

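So on CRAN, you will find these Packages, all directed solely to Association Rules:

  • arules
  • arulesNBMiner
  • arulesSequences
  • a visualization package for the output of the three above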

This set of four R Packages comprises R bindings for four different Association Rules implementations, as well as a visualization library.

The first package, arules, includes R bindings for Eclat and Apriori. The second, arulesNBMiner, provides the bindings for Michael Hahsler's Association Rules algorithm, NB-frequent itemsets. The third, arulesSequences, provides the bindings for Mohammed Zaki's cSPADE.

The last of these is particularly useful because it is a visualization library for plotting the output from any of the previous three packages. For your social network study, I suspect you will find the graph visualization especially useful, i.e., explicit visualization of the nodes (users in the data set) and edges (connections between them).
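If it helps to see what Apriori does before installing anything, below is a minimal pure-Python sketch of its level-wise search for frequent itemsets. It is only an illustration of the idea, not the arules or Borgelt implementation, and it assumes each user's profile can be treated as one "transaction" of attributes. The `profiles` data and the function name `apriori` are made up for the example.

```python
from itertools import combinations


def apriori(transactions, min_support):
    """Return {itemset: support} for every itemset with support >= min_support."""
    transactions = [frozenset(t) for t in transactions]
    n = len(transactions)

    def frequent(candidates):
        # Count each candidate and keep those meeting the support threshold.
        counts = {c: sum(c <= t for t in transactions) for c in candidates}
        return {c: k / n for c, k in counts.items() if k / n >= min_support}

    # Level 1: frequent single items.
    level = frequent({frozenset([i]) for t in transactions for i in t})
    result = dict(level)
    k = 2
    while level:
        prev = set(level)
        # Join step: unions of frequent (k-1)-itemsets that have exactly k items.
        candidates = {a | b for a in prev for b in prev if len(a | b) == k}
        # Prune step: every (k-1)-subset of a candidate must itself be frequent.
        candidates = {
            c for c in candidates
            if all(frozenset(s) in prev for s in combinations(c, k - 1))
        }
        level = frequent(candidates)
        result.update(level)
        k += 1
    return result


# Hypothetical user profiles: one set of interests per user.
profiles = [
    {"semweb", "rdf", "foaf"},
    {"semweb", "rdf"},
    {"semweb", "foaf"},
    {"rdf", "foaf", "semweb"},
    {"music"},
]

freq = apriori(profiles, min_support=0.6)
for itemset, s in sorted(freq.items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(itemset), round(s, 2))
```

Rules are then read off the frequent itemsets by splitting each one into an antecedent and a consequent and keeping the splits whose confidence is high enough. For real data the R packages above will be far faster than this sketch.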

doug
  • I am putting a snapshot of our database here: https://docs.google.com/uc?id=0B0mXGRdRowo1MDZlY2Q0NDYtYjlhMi00MmNjLWFiMWEtOGQ0MjA3NjUyZTE5&export=download&hl=en_US Is it possible to find association rules from this database using the R Packages? If not, is there Perl code for the algorithm available that can be coupled with Java (JDBC)? Thank you so much. – codious May 10 '11 at 20:35
  • @doug sorry for asking another question even though you already answered. I did not have enough time to explore the R packages. Your advice on the above comment would be very helpful. Thank You. – codious May 10 '11 at 20:49
  • 1
    no problem. I have used the R package 'arules' against data stored in SQLite. At the moment, I can't recall whether that worked 'out-of-the-box' or whether coding a small interface was necessary. I'll check my project files and get back to you this evening (either with a "yes", or with a "no", in which case I'll just give you access to my github repo so you can grab the code). – doug May 10 '11 at 23:24
  • 1
    hi Siddhartha: yes i did. In fact, i was using the R Package SQLiteDF (available from CRAN, w/ excellent documentation); the sqldf project is hosted on Google Code (http://code.google.com/p/sqldf/). I 'remembered' using SQL and arules, but in fact i was accessing the data frame via SQL syntax enabled by sqldf. – doug May 13 '11 at 12:24
2

This is a bit broader than http://en.wikipedia.org/wiki/Association_rule_learning but hopefully useful.

Some earlier FOAF work that might be interesting (SVD/PCA etc):

http://stderr.org/~elw/foaf/

http://www.scribd.com/doc/353326/The-Social-Semantics-of-LiveJournal-FOAF-Structure-and-Change-from-2004-to-2005

http://datamining.sztaki.hu/files/snakdd.pdf

Also Ch.4 of http://www.amazon.com/Understanding-Complex-Datasets-Decompositions-Knowledge/dp/1584888326 is devoted to the application of matrix decomposition techniques against graph data structures; strongly recommended.

Finally, Apache Mahout is the natural choice for large scale data mining, machine learning etc., https://cwiki.apache.org/MAHOUT/dimensional-reduction.html

Dan Brickley
  • Thank you very much. It would be very interesting to go through all the information you provided. – codious Jul 06 '11 at 23:48
0

If you want some Java code, you can check my website for the SPMF software. It provides source code for more than 45 algorithms for frequent itemset mining, association mining, sequential pattern mining, etc.

Moreover, it provides more than just the most popular algorithms. It also offers many variations, such as mining rare itemsets, high utility itemsets, uncertain itemsets, non-redundant association rules, closed association rules, indirect association rules, top-k association rules, and much more.

Phil