I have a large amount of data that I need to store in a graph as nodes, but my current code is very slow. Below are a description of the input and the code. How can I make it faster (perhaps by using NumPy or built-in list operations)?

node_lines: an array of strings. Each element stores the information for one node and looks like this: ['5 7']. Here the first number is the node id and the second number is the node's group id. Every element of node_lines is such an "id group_id" pair stored as a string.

And here is the code:

    for line in node_lines:

        info = line.split()
        node_id = int(info[0])  # named node_id to avoid shadowing the built-in id()
        group_id = int(info[1])

        node = n.Node(node_id, group_id)  # Node initializer

        graph.nodes.append(node)  # Adding the node to the nodes array in the graph

        if group_id > max_group_id:  # Track the maximum group_id in the input
            max_group_id = group_id

So basically, for every line I extract the node_id and group_id and add the resulting node to the graph. I also track the maximum group_id on every iteration so that I know the total number of groups once the loop finishes.
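A minimal sketch of one way to tighten this loop, assuming the nodes can be built in bulk and attached to the graph afterwards. The `Node` class below is a hypothetical stand-in for `n.Node` (the real class is not shown here), and `parse_nodes` is an illustrative name, not part of the original code. It uses list comprehensions instead of repeated `append` calls and computes the maximum once instead of comparing on every iteration. Whether this helps noticeably depends on how expensive the real `Node` constructor is, so it is worth timing on the real data.

```python
class Node:
    """Hypothetical stand-in for n.Node; the real class is not shown."""
    __slots__ = ("id", "group_id")  # avoids a per-instance __dict__

    def __init__(self, node_id, group_id):
        self.id = node_id
        self.group_id = group_id


def parse_nodes(node_lines):
    """Return (nodes, max_group_id) for lines such as '5 7'."""
    # Split and convert both fields of every line in one pass.
    pairs = [tuple(map(int, line.split())) for line in node_lines]
    nodes = [Node(node_id, group_id) for node_id, group_id in pairs]
    # Take the maximum once; None is returned for empty input (an assumption,
    # since the initial value of max_group_id is not shown in the question).
    max_group_id = max((group_id for _, group_id in pairs), default=None)
    return nodes, max_group_id


nodes, max_group_id = parse_nodes(["5 7", "6 2", "9 11"])
```

With a graph object like the one in the question, the result could then be attached in a single call such as `graph.nodes.extend(nodes)`.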

Let's assume the methods of the Node and Graph classes are efficient; I am only looking to improve this part.

There is also a part that reads the edge information; here is the code for it as well:

    for line in edge_lines:

        info = line.split()
        src = int(info[0])
        dst = int(info[1])

        edge = e.Edge(src, dst)  # Initialization for the Edge class
        graph.edges.append(edge)  # Adding the edge to the graph

Again, in this part I have an array edge_lines in which every element is a string such as '5 7', giving the source and destination of an edge. Each edge is then added to the graph.
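The edge loop could be condensed in the same spirit. The sketch below assumes a hypothetical `Edge` class standing in for `e.Edge` and an illustrative helper name `parse_edges`; it relies on `map(str.split, ...)` to split every line and on a single list comprehension to build the objects.

```python
class Edge:
    """Hypothetical stand-in for e.Edge; the real class is not shown."""
    __slots__ = ("src", "dst")

    def __init__(self, src, dst):
        self.src = src
        self.dst = dst


def parse_edges(edge_lines):
    """Build one Edge per line such as '5 7' (source, destination)."""
    # map(str.split, ...) yields ['5', '7'] for each line; each line is
    # assumed to contain exactly two whitespace-separated integers.
    return [Edge(int(src), int(dst)) for src, dst in map(str.split, edge_lines)]


edges = parse_edges(["5 7", "7 9"])
```

As with the nodes, `graph.edges.extend(edges)` would add them in one call rather than one `append` per edge.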

How can these two pieces of code be made more efficient? I am mainly asking for a better way to iterate, but I am also open to any other advice.
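On the NumPy side, one possible direction is to parse all the integers in bulk and only then decide whether per-node objects are needed. The sketch below assumes NumPy is installed and that every line holds exactly two integers; `lines_to_table` is an illustrative name. It uses `np.loadtxt`, which accepts a list of strings, and reads the maximum group id from the second column.

```python
import numpy as np


def lines_to_table(lines):
    """Parse lines such as '5 7' into an (N, 2) integer array."""
    # ndmin=2 keeps the result two-dimensional even for a single line.
    return np.loadtxt(lines, dtype=np.int64, ndmin=2)


table = lines_to_table(["5 7", "6 2", "9 11"])
node_ids = table[:, 0]                    # first column: node ids
group_ids = table[:, 1]                   # second column: group ids
max_group_id = int(group_ids.max())       # maximum found in one vectorized call
```

This only pays off if the rest of the program can work with the arrays directly; if a Python object still has to be created for every row, object construction is likely to remain the dominant cost.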

Thank you.

dornekci
  • Please post a minimal reproducible example: https://stackoverflow.com/help/minimal-reproducible-example – Kapocsi May 23 '21 at 16:42
  • You can use multiprocessing to parallelise your loop https://stackoverflow.com/questions/10797998/is-it-possible-to-multiprocess-a-function-that-returns-something-in-python – techytushar May 23 '21 at 16:47
  • You can use np.where or np.select; they help you eliminate loops, as long as you are dealing with arrays. – Ade_1 May 23 '21 at 16:51

0 Answers