I am using `networkx` to build an email network from a txt file where each row represents an "edge." I first loaded the txt file (3 columns: `#Sender`, `Recipient`, `time`) into Python and then converted it to a `networkx` object using the following code:
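```python
import networkx as nx
import pandas as pd

# Each row of the file is one email: sender, recipient, timestamp
email_df = pd.read_csv('email_network.txt', delimiter = '->')

# networkx 1.x API (renamed from_pandas_edgelist in networkx 2.0)
email = nx.from_pandas_dataframe(email_df, '#Sender', 'Recipient', edge_attr = 'time')
```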
The `email.txt` data can be accessed here.
However, `email_df` (a pandas `DataFrame`) has 82927 rows, while `email` (a NetworkX graph) has only 3251 edges:
```
In [1]: len(email_df)
Out[1]: 82927

In [2]: len(email.edges())
Out[2]: 3251
```
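A minimal reproduction of the drop, using a hypothetical four-row `DataFrame` instead of the real file and assuming networkx 2.x or later (where `from_pandas_dataframe` was renamed `from_pandas_edgelist`). With the default `create_using`, the result is an undirected simple `nx.Graph`, which stores at most one edge per unordered node pair, so its edge count lines up with the number of distinct unordered sender/recipient pairs rather than the number of rows:

```python
import networkx as nx
import pandas as pd

# Hypothetical toy data standing in for email_network.txt (not the real file).
df = pd.DataFrame({
    "#Sender":   ["1", "1", "2", "3"],
    "Recipient": ["2", "2", "1", "1"],
    "time":      [100, 200, 300, 400],
})

# No create_using given, so the result is an undirected simple nx.Graph.
G = nx.from_pandas_edgelist(df, "#Sender", "Recipient", edge_attr="time")

# Count distinct unordered (sender, recipient) pairs directly in pandas.
pairs = {frozenset(p) for p in zip(df["#Sender"], df["Recipient"])}

print(len(df))              # 4 rows
print(len(G.edges()))       # 2 edges: 1-2 and 1-3
print(len(pairs))           # 2 distinct unordered pairs
print(G["1"]["2"]["time"])  # only a single timestamp is kept for the 1-2 pair
```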
I am confused because, even when several rows of `email_df` contain the same two nodes in the first two columns in the same direction (say, '1' to '2'), the third column (`time`, a timestamp) should distinguish them from each other, so no duplicate edges should appear. Why, then, does the number of edges drop so dramatically, from 82927 to 3251, after I use `nx.from_pandas_dataframe` to read from `email_df`?
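If each row should remain a separate edge, one option is to request a directed multigraph through the `create_using` parameter. A sketch on the same kind of hypothetical toy data, again assuming networkx 2.x or later:

```python
import networkx as nx
import pandas as pd

# Hypothetical toy data standing in for email_network.txt (not the real file).
df = pd.DataFrame({
    "#Sender":   ["1", "1", "2", "3"],
    "Recipient": ["2", "2", "1", "1"],
    "time":      [100, 200, 300, 400],
})

# MultiDiGraph keeps edge direction and allows several parallel edges
# between the same ordered pair of nodes, one per row here.
M = nx.from_pandas_edgelist(
    df, "#Sender", "Recipient", edge_attr="time", create_using=nx.MultiDiGraph()
)

print(len(df))                      # 4 rows
print(M.number_of_edges())          # 4 edges, one per row
print(M.number_of_edges("1", "2"))  # 2 parallel edges from 1 to 2
print(M.number_of_edges("2", "1"))  # 1 edge from 2 to 1 (direction kept)
```

With a `MultiDiGraph`, each parallel edge carries its own `time` attribute, so on the real file `len(email.edges())` would be expected to match the number of rows; this has not been checked against the actual data.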
Could anyone help explain this to me?
Thank you.