I am working on a project where I need to run graph algorithms on a graph representing English Wikipedia articles. The algorithms need to run in real time.

I have tried

I was wondering whether there are any fast Java or Python APIs that I have missed and that would work better?

sanz
  • What have you tried? These look promising: http://stackoverflow.com/questions/606516/python-graph-library – NoBugs Nov 09 '12 at 04:30
  • @NoBugs sorry about that. I have edited the question to include what I have tried. – sanz Nov 09 '12 at 04:39
  • Is the implication that a vertex in this graph is an article, and a directed edge is a hyperlink between articles? It is fairly easy to run the XML dump through an XML parser and then search each article body for the regex `\[(.*)\]` to find links. – Andrew Tomazos Nov 09 '12 at 04:56
  • @AndrewTomazos-Fathomling It is a lot of data and placing it in memory would not be ideal. Ideally, it would require a database and some caching implementations to give good speed / low memory usage. – sanz Nov 09 '12 at 05:21
  • The graph itself will fit in memory easily; it would be way less than a gig. The entire XML dump is only 30 gig. A simple in-memory, uncached representation will be far faster than any database. – Andrew Tomazos Nov 09 '12 at 07:11
  • Btw the topic has been discussed at length at http://lists.wikimedia.org/pipermail/analytics/2013-December/thread.html#1368 – Nemo Jan 17 '14 at 20:54
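
The in-memory approach suggested in the comments could look roughly like the Python sketch below. It is only an illustration under stated assumptions, not a tested pipeline for the real dump. The input is assumed to be MediaWiki export XML with `<page>`, `<title>` and `<text>` elements. The function name `build_link_graph` and the pattern `LINK_RE` are hypothetical names introduced here. The pattern is stricter than the one in the comment: it looks for double brackets and keeps only the link target, dropping any `|label` or `#section` part. Namespaced links such as `File:` or `Category:` and redirects are not filtered out, and link targets that have no page of their own still receive an ID.

```python
import io
import re
import xml.etree.ElementTree as ET
from array import array

# Matches [[Target]], [[Target|label]] and [[Target#Section]];
# captures only the target title (assumption: standard wikilink syntax).
LINK_RE = re.compile(r"\[\[([^\[\]|#]+)")


def local_name(tag):
    """Drop an XML namespace prefix, e.g. '{ns}page' -> 'page'."""
    return tag.rsplit("}", 1)[-1]


def build_link_graph(xml_source):
    """Stream the dump once and return (title -> id, id -> array of target ids)."""
    ids = {}
    adjacency = {}

    def node_id(title):
        # New titles get the next free integer ID.
        return ids.setdefault(title, len(ids))

    title, text = None, ""
    for _, elem in ET.iterparse(xml_source, events=("end",)):
        name = local_name(elem.tag)
        if name == "title":
            title = elem.text
        elif name == "text":
            text = elem.text or ""
        elif name == "page":
            if title is not None:
                src = node_id(title)
                targets = {
                    node_id(t.strip())
                    for t in LINK_RE.findall(text)
                    if t.strip()
                }
                # Compact integer array instead of a list of strings.
                adjacency[src] = array("I", sorted(targets))
            title, text = None, ""
            elem.clear()  # release the article body once it has been scanned
    return ids, adjacency


# Tiny made-up sample standing in for the real dump file.
sample = io.BytesIO(
    b"<mediawiki>"
    b"<page><title>A</title><revision>"
    b"<text>See [[B]] and [[C|the C page]].</text></revision></page>"
    b"<page><title>B</title><revision>"
    b"<text>Back to [[A#History]].</text></revision></page>"
    b"</mediawiki>"
)
ids, adjacency = build_link_graph(sample)
```

For the real dump, `xml_source` would be the path to the decompressed XML file or an open file object. Storing edges as integer arrays rather than title strings is what keeps the graph itself far smaller than the article text.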

0 Answers