
I am trying to parse a big graph with networkx, but I get a MemoryError. Which Azure data solution should I use, and how?

Here is the code I ran on my computer:

import networkx as nx


class GraphFromTxt:
    def __init__(self, text):  # read every line of the edge-list file into memory
        self.GraphStan = []
        with open(text, "r") as file:
            for line in file:
                self.GraphStan.append(line)

    def print_list(self):
        print(self.GraphStan)

    def length(self):
        print(len(self.GraphStan))

    def print_edges(self, G):
        print(G.edges())

    def parse(self):
        return nx.parse_edgelist(self.GraphStan, nodetype=int)


G_listed = GraphFromTxt("stan.txt")
G_listed.length()
G = G_listed.parse()

output:

"C:\Users\Roy Greenberg\AppData\Local\Programs\Python\Python37-32\python.exe" "C:/Users/Roy Greenberg/PycharmProjects/Random-walks/Graph_from_txt.py"
7600595
Traceback (most recent call last):
  File "C:/Users/Roy Greenberg/PycharmProjects/Random-walks/Graph_from_txt.py", line 26, in <module>
    G = G_listed.parse()
  File "C:/Users/Roy Greenberg/PycharmProjects/Random-walks/Graph_from_txt.py", line 21, in parse
    return nx.parse_edgelist(self.GraphStan, nodetype=int)
  File "C:\Users\Roy Greenberg\AppData\Local\Programs\Python\Python37-32\lib\site-packages\networkx\readwrite\edgelist.py", line 296, in parse_edgelist
    G.add_edge(u, v, **edgedata)
  File "C:\Users\Roy Greenberg\AppData\Local\Programs\Python\Python37-32\lib\site-packages\networkx\classes\graph.py", line 900, in add_edge
    datadict = self._adj[u].get(v, self.edge_attr_dict_factory())
MemoryError

Process finished with exit code 1
ponylama
  • "It looks like your post is mostly code; please add some more details. It looks like your post is mostly code; please add some more details. It looks like your post is mostly code; please add some more details." - what??????? – Shihab Shahriar Khan Dec 27 '18 at 11:01
  • 1
    @ShihabShahriar - I believe that text is generated with the new question wizard, as it provides guidance for people asking their first question. There is also a minimum question size, so I suspect the OP did some text copying to bypass that check. – David Makogon Dec 27 '18 at 13:41
  • Welcome to Stack Overflow! Please edit your question with specifics, sample data, etc, as it's currently unclear, exactly, what's going on, given that there's not much detail. You mention a big graph, but... what graph are you referring to? (you haven't shown any graph). The only thing I see is that you're reading in a text file and storing it in a local variable. Please edit to be specific. As for which Azure service to use: I don't think it's possible to offer a recommendation, as we have no details about your data at all (plus tool/product/service recommendations are off-topic). – David Makogon Dec 27 '18 at 13:45
  • How big is your graph (how many nodes, & how many edges)? – Joel Dec 28 '18 at 18:50
  • One thing that looks quite inefficient is that you are reading the graph from a text file, storing the entire text file in memory (as a list of strings) and then building a graph from it. As it's building the graph it runs out of memory. But there will be ways to load it into a graph directly from the text file without having to store it in memory. – Joel Dec 28 '18 at 18:53
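
A minimal sketch of the direct-from-file load the last comment describes, using networkx's read_edgelist, which streams the file line by line instead of first holding every line in a Python list (a tiny sample file stands in for the real stan.txt here, since we don't have your data):

```python
import networkx as nx

# Tiny sample edge list standing in for the "stan.txt" from the question.
with open("stan_sample.txt", "w") as f:
    f.write("1 2\n2 3\n3 1\n")

# read_edgelist iterates over the file handle itself, so the full text is
# never duplicated in memory as a list of strings before the graph is built.
G = nx.read_edgelist("stan_sample.txt", nodetype=int)
print(G.number_of_nodes(), G.number_of_edges())  # prints: 3 3
```

This removes one full in-memory copy of the text, though the graph object itself must still fit in the process's address space.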

1 Answer


Judging from your error information, you are running a 32-bit Python on Windows, which limits your Python process to at most about 2 GB of memory for building the networkx graph. See the SO thread Python 32-bit memory limits on 64bit windows for details.

So in my experience, the MemoryError means your 32-bit Python process tried to allocate more memory than that limit allows.
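
To confirm which interpreter you are on, here is a quick bitness check using only the standard library:

```python
import struct
import sys

# Pointer size in bits: 32 for a 32-bit interpreter, 64 for a 64-bit one.
# A 32-bit process on Windows is capped at roughly 2 GB of address space,
# no matter how much RAM the machine has.
bits = struct.calcsize("P") * 8
print(f"{bits}-bit Python; sys.maxsize = {sys.maxsize}")
```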

Therefore, assuming your local machine has enough memory, my suggestion is to run your script again with a 64-bit Python. Alternatively, as a workaround, you could build one partial graph at a time, dump each partial graph to disk while you parse the others, and then link these sub-graphs together (like a linked list) for loading later.
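
A rough sketch of the partial-graph idea, assuming the file is a plain whitespace-separated edge list; the function name build_in_chunks and the chunk size are hypothetical choices, and Graph.update is used to merge each parsed chunk, so only one chunk of text lines is held in memory at a time (the final merged graph itself must still fit):

```python
import itertools

import networkx as nx


def build_in_chunks(path, chunk_size=100_000):
    """Parse an edge-list file in fixed-size chunks of lines,
    merging each partial graph into one growing graph."""
    G = nx.Graph()
    with open(path) as f:
        while True:
            # Pull the next chunk_size lines without reading the whole file.
            chunk = list(itertools.islice(f, chunk_size))
            if not chunk:
                break
            part = nx.parse_edgelist(chunk, nodetype=int)
            G.update(part)  # merge the partial graph's nodes and edges
    return G
```

With an edge list of three edges and chunk_size=2, this parses the file in two passes of the loop and produces the same graph as a single parse_edgelist call would.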

Peter Pan