Is a DataFrame OK for representing a graph?

Question

I want to represent relationships between nodes in Python using a pandas.DataFrame. Each relationship has a weight, so I used a DataFrame like this:

       nodeA nodeB nodeC
nodeA    0     5     1
nodeB    5     0     4   
nodeC    1     4     0

But I think this is an improper way to express the relationships, because the DataFrame is symmetric and so stores every weight twice.

Is there a more suitable way than a DataFrame to represent a graph in Python?

(Sorry for my bad English)

score 0 · Accepted Answer · answered Apr 08 '20 at 05:43

This seems like an acceptable way to represent a graph, and it is in fact compatible with, say, networkx. For example, you can recover a networkx graph object as follows:

import networkx as nx

# df is the weighted adjacency DataFrame from the question
g = nx.from_pandas_adjacency(df)

print(g.edges)
# [('nodeA', 'nodeB'), ('nodeA', 'nodeC'), ('nodeB', 'nodeC')]
print(g.get_edge_data('nodeA', 'nodeB'))
# {'weight': 5}

If your graph is sparse, you may want to store it as an edge list instead, e.g. as discussed here.
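As a rough sketch of the edge-list idea, the adjacency DataFrame from the question can be flattened so that each undirected edge appears only once. The variable names (`nodes`, `edges`) and the column names `source`, `target`, `weight` are choices made for this example, and it assumes a weight of 0 means "no edge":

```python
import pandas as pd

# Adjacency matrix from the question
nodes = ["nodeA", "nodeB", "nodeC"]
df = pd.DataFrame(
    [[0, 5, 1],
     [5, 0, 4],
     [1, 4, 0]],
    index=nodes,
    columns=nodes,
)

# Visit each unordered pair once (upper triangle only),
# skipping zero entries, which are assumed to mean "no edge".
edges = pd.DataFrame(
    [
        (a, b, df.loc[a, b])
        for i, a in enumerate(nodes)
        for b in nodes[i + 1:]
        if df.loc[a, b] != 0
    ],
    columns=["source", "target", "weight"],
)

print(edges)
```

Each row is one edge, so the symmetric duplicates are gone. If you need a graph object later, networkx can read this layout with nx.from_pandas_edgelist(edges, edge_attr="weight").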
