I'm writing an application that runs an MST algorithm over a huge graph (around 100-150 million edges) in Python 2.7. The graph is set up as an adjacency list using a classic class with methods like:
def insertArcW(self, tail, head, weight):
    # append head and weight consecutively to tail's adjacency list
    if head in self.nodes and tail in self.nodes:
        self.adj[tail].addAsLast(head)
        self.adj[tail].addAsLast(weight)

def insertNode(self, e):
    newnode = Node(self.nextId, e)
    # (registration of newnode in self.nodes / self.adj omitted here)
    self.nextId += 1
    return newnode  # returned so the caller can keep a reference to it
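For context, Node and the graph constructor look roughly like this (a simplified sketch, not my exact code; the real adjacency lists are my array-based linked lists with the addAsLast method):

class Node(object):
    def __init__(self, index, value):
        # 'index' is the integer id used as key into the graph's nodes/adj
        self.index = index
        self.value = value

class Graph(object):
    def __init__(self):
        self.nextId = 0
        self.nodes = {}  # node index -> Node
        self.adj = {}    # node index -> linked list of alternating head/weight entries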
I'm also using a linked list (built on top of an array) and a queue from the Python standard library (version 2.7). With this piece of code the node insertion is really fast (since the number of nodes is much smaller than the number of edges):
n = []
for i in xrange(int(file_List[0])):  # file_List[0] holds the number of nodes
    n.append(G.insertNode(i))
The problem comes with inserting the edges:
for e in xrange(len(arc_List)):
    # each edge is inserted in both directions (undirected graph)
    G.insertArcW(n[arc_List[e][0]].index, n[arc_List[e][1]].index, arc_List[e][2])
    G.insertArcW(n[arc_List[e][1]].index, n[arc_List[e][0]].index, arc_List[e][2])
It works great with 1 million edges, but with more it eats all of my RAM (4 GB, 64-bit), though nothing freezes: it can still build the graph, it just takes a very long time. Considering that CPU usage stays around 19-25% while doing this, is there a way to do this with multiprocessing or multithreading? For example, building the graph with two cores doing the same operation at the same time but on different data, i.e. one core working on half of the edges and the other core on the other half. I'm practically new to this area of programming, above all in Python.
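To show what I mean, something like this rough sketch (untested; splitting arc_List in half is just for illustration, and I don't know whether the two processes would actually write into the same G or each get their own copy):

import multiprocessing

def insert_edges(graph, nodes, arcs):
    # each worker inserts its own share of the edges (both directions)
    for tail, head, weight in arcs:
        graph.insertArcW(nodes[tail].index, nodes[head].index, weight)
        graph.insertArcW(nodes[head].index, nodes[tail].index, weight)

if __name__ == '__main__':
    half = len(arc_List) // 2
    p1 = multiprocessing.Process(target=insert_edges, args=(G, n, arc_List[:half]))
    p2 = multiprocessing.Process(target=insert_edges, args=(G, n, arc_List[half:]))
    p1.start()
    p2.start()
    p1.join()
    p2.join()
    # my doubt: as far as I understand, each process works on its own copy of G,
    # so the two halves might never end up in the same graph
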
EDIT: With the function below I'm setting up two lists, one for nodes and one for edges; the information comes from a ".txt" file. With insertArcW and insertNode called inside it, RAM usage oscillates between 2.4 GB and 2.6 GB. Now I can say it is stable (maybe thanks to "deleting" the two huge lists of edges and nodes, see the sketch after the code), but it still runs at the same speed. Code:
import re
import random

f = open(graph + '.txt', 'r')
v = f.read()
file_List = re.split('\s+', v)  # file_List[0] = number of nodes, file_List[1] = number of edges
arc_List = []
n = []
p = []
for x in xrange(0, int(file_List[1])):
    arc_List.append([0, 0, 0])
for i in xrange(int(file_List[0])):
    n.append(G.insertNode(i))
for weight in xrange(1, int(file_List[1]) + 1):
    p.append(weight)
random.shuffle(p)  # shuffled weights, one per edge
i = 0
r = 0
while r < int(file_List[1]):
    for k in xrange(2, len(file_List), 2):
        arc_List[r][0] = int(file_List[k])      # tail
        arc_List[r][1] = int(file_List[k + 1])  # head
        arc_List[r][2] = float(p[i])            # weight
        G.insertArcW(n[arc_List[r][0]].index, n[arc_List[r][1]].index, arc_List[r][2])
        G.insertArcW(n[arc_List[r][1]].index, n[arc_List[r][0]].index, arc_List[r][2])
        print r
        i += 1
        r += 1
f.close()
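The "deleting" I mentioned in the edit is just dropping the two huge temporary lists once the graph is built, roughly like this (assuming nothing else still needs them):

# once G holds all the arcs, the two huge temporary lists can be released
del arc_List  # edge list
del n         # node list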