I have a large amount of data that I need to store in a graph as nodes, but building the graph is very slow. Below are a description of the input and the code. How can I make it faster (perhaps by using NumPy or built-in list operations)?
node_lines: an array of strings. Each element stores the information for one node and looks like this: '5 7'. The first number is the node id and the second is the node's group id. Every element of node_lines is such an id/group id pair stored as a single string.
And here is the code:
for line in node_lines:
    info = line.split()
    id = int(info[0])
    group_id = int(info[1])
    node = n.Node(id, group_id)  # Node initializer
    graph.nodes.append(node)  # Add the node to the graph's nodes array
    if group_id > max_group_id:  # Track the maximum group_id in the input
        max_group_id = group_id
So, basically, for every line I extract the node id and group id and add the corresponding node to the graph. In each iteration I also track the maximum group_id, so that I know the total number of groups once the loop finishes.
Let's assume the "node" and "graph" class functions are efficient; I am only looking to improve this part.
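For reference, here is a self-contained sketch of how the node loop might look with built-in list operations instead of an explicit loop. The Node class below is a simplified, hypothetical stand-in for the real n.Node, the parse_nodes helper name is made up for illustration, and the default of -1 for empty input is an assumption. Whether this is actually faster would need to be measured on the real data.

```python
# Hypothetical stand-in for the real n.Node class (assumption, not the real code).
class Node:
    __slots__ = ("id", "group_id")  # avoids a per-instance __dict__

    def __init__(self, node_id, group_id):
        self.id = node_id
        self.group_id = group_id


def parse_nodes(node_lines):
    """Build Node objects from 'id group_id' strings.

    Returns a (nodes, max_group_id) tuple.
    """
    # Parse every line once into a pair of ints.
    pairs = [tuple(map(int, line.split())) for line in node_lines]
    # Create all nodes in one comprehension instead of repeated append calls.
    nodes = [Node(node_id, group_id) for node_id, group_id in pairs]
    # default=-1 covers empty input; the real code may start from another value.
    max_group_id = max((group_id for _, group_id in pairs), default=-1)
    return nodes, max_group_id


nodes, max_group_id = parse_nodes(["5 7", "6 2", "9 11"])
print(len(nodes), max_group_id)  # prints: 3 11
```

With the real classes, the returned list could presumably be added in one call, for example graph.nodes.extend(nodes), rather than appending node by node.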
There is also an edge section. Here is the code that reads it:
for line in edge_lines:
    info = line.split()
    src = int(info[0])
    dst = int(info[1])
    edge = e.Edge(src, dst)  # Edge initializer
    graph.edges.append(edge)  # Add the edge to the graph
Again, I have an array edge_lines in which every element is a string such as '5 7', giving the source and destination of an edge. Each edge is then added to the graph.
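The edge loop could be sketched the same way. Again, the Edge class here is a hypothetical stand-in for the real e.Edge, parse_edges is an illustrative name, and no speed-up is claimed without a benchmark.

```python
# Hypothetical stand-in for the real e.Edge class (assumption, not the real code).
class Edge:
    __slots__ = ("src", "dst")

    def __init__(self, src, dst):
        self.src = src
        self.dst = dst


def parse_edges(edge_lines):
    """Build Edge objects from 'src dst' strings in a single comprehension."""
    # map(int, ...) converts both fields; * unpacks them as src and dst.
    return [Edge(*map(int, line.split())) for line in edge_lines]


edges = parse_edges(["5 7", "7 9"])
print([(edge.src, edge.dst) for edge in edges])  # prints: [(5, 7), (7, 9)]
```

This assumes every line contains exactly two integers; a line with more or fewer fields would raise a TypeError when Edge is called.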
How can these two pieces of code be made more efficient? I am mainly asking for a better way to iterate, but I am open to any advice.
Thank you.