
I have a NetworkX graph (g) with ~25k nodes and 125k edges. I want to cache g using memcached, but g is too big. The most I can raise the memcached per-item limit to is 32MB, which still isn't enough.

  1. Should I even try to get this to work with memcached?

  2. What are my other options if I want to be able to store a NetworkX graph that could have up to 1m nodes and 10m edges?

  3. How might I chunk the graph to make it smaller, (a) without knowing anything about the graph's structure, and (b) in a way that causes minimal performance degradation when the chunks are put back together? (The naive idea I've been considering is sketched after the memcached output below.)

I'm working with Python. Sample code to create the graph is below.

import sys
import pickle
import random
import networkx as nx
from django.core.cache import cache

def randstring(x=3):
    # random string of x uppercase ASCII letters (A-Z)
    return ''.join(chr(random.randrange(65, 91)) for _ in range(x))

class Qux(object):
    def __init__(self, foo, bar):
        self.foo = foo
        self.bar = bar

for n, v in {1: 500, 2: 5000, 3: 50000}.items():
    g = nx.Graph()
    nodes = [Qux(randstring(), randstring()) for _ in range(v)]
    g.add_nodes_from(nodes)
    for node in g.nodes:
        # attach up to 24 edges from this node to randomly chosen nodes
        num = random.randrange(25)
        edges = [(node, random.choice(nodes)) for _ in range(num)]
        g.add_edges_from(edges)

    # report node count and pickled size, then try to cache the graph
    print(len(g.nodes), sys.getsizeof(pickle.dumps(g)))
    cache.set('{}/graph'.format(n), g, 3600)

Memcached console output (memcached -I 32M -vv)

<20 new auto-negotiating client connection
20: Client using the ascii protocol
<20 set :1:1/graph 1 3600 130678 
>20 STORED
<20 delete :1:2/graph
>20 NOT_FOUND
<20 delete :1:3/graph
>20 NOT_FOUND
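
For reference, the naive chunking approach I've been considering for question 3 is to pickle the graph, slice the bytes into fixed-size pieces, and store each piece under its own key, with the piece count under the base key. A minimal sketch, assuming Django's cache API; the chunk size, key scheme, and helper names are illustrative:

import pickle
from django.core.cache import cache

CHUNK_SIZE = 1024 * 1024  # 1MB per piece, safely under the per-item limit

def cache_graph_chunked(key, g, timeout=3600):
    # pickle once, then slice the byte string into fixed-size pieces
    blob = pickle.dumps(g, protocol=pickle.HIGHEST_PROTOCOL)
    pieces = [blob[i:i + CHUNK_SIZE] for i in range(0, len(blob), CHUNK_SIZE)]
    for i, piece in enumerate(pieces):
        cache.set('{}/chunk/{}'.format(key, i), piece, timeout)
    # store the piece count under the base key for reassembly
    cache.set(key, len(pieces), timeout)

def get_graph_chunked(key):
    count = cache.get(key)
    if count is None:
        return None
    pieces = [cache.get('{}/chunk/{}'.format(key, i)) for i in range(count)]
    if any(p is None for p in pieces):
        return None  # a piece was evicted; treat the whole graph as a miss
    return pickle.loads(b''.join(pieces))

The obvious downside is that memcached can evict pieces independently, so every piece has to be present for the graph to be usable at all.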

1 Answer


If you look at the code for NetworkX, the graphs are essentially Python dicts (a dict-of-dicts adjacency structure). If the only functionality you need is accessing nodes and edges, then building your graph from plain dicts and serializing it to JSON could let you cache the graph. The code for adding edges to an undirected weighted graph is essentially this:

# adjacency dict: G_New[u][v] == weight of edge (u, v), stored both ways
G_New = {}
for edge in edges:
    try:
        G_New[edge['node1']].update({edge['node2']: edge['weight']})
    except KeyError:
        G_New[edge['node1']] = {edge['node2']: edge['weight']}
    try:
        G_New[edge['node2']].update({edge['node1']: edge['weight']})
    except KeyError:
        G_New[edge['node2']] = {edge['node1']: edge['weight']}

After that it's a simple json.dumps(G_New). For larger graphs you can then split the dict into smaller pieces and store each one in memcached separately; see, for example, How to split dictionary into multiple dictionaries fast.
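
A minimal sketch of that split-and-cache step, assuming the node keys are strings (so each part survives a JSON round trip) and Django's cache API; split_dict and the key scheme are illustrative:

import json
from itertools import islice
from django.core.cache import cache

def split_dict(d, n):
    # split d into up to n smaller dicts of roughly equal size
    it = iter(d.items())
    size = max(1, -(-len(d) // n))  # ceiling division
    return [dict(islice(it, size)) for _ in range(n)]

def cache_graph_dict(key, g_new, n=10, timeout=3600):
    parts = split_dict(g_new, n)
    for i, part in enumerate(parts):
        cache.set('{}/part/{}'.format(key, i), json.dumps(part), timeout)
    cache.set(key, len(parts), timeout)  # part count under the base key

def get_graph_dict(key):
    count = cache.get(key)
    if count is None:
        return None
    merged = {}
    for i in range(count):
        blob = cache.get('{}/part/{}'.format(key, i))
        if blob is None:
            return None  # a part was evicted; treat as a cache miss
        merged.update(json.loads(blob))
    return merged

Merging the parts back is a plain dict.update, so reassembly stays cheap even for large graphs; the trade-off, as with any chunking scheme, is that one evicted part invalidates the whole graph.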
