I am new to Python and am trying to create a bipartite network from data that looks similar to this:

| User  | Text                  |
| ----- | --------------------- |
| user1 | ['abc', 'xyz', 'def'] |
| user2 | ['lmo', 'gf']         |
| user3 | ['lmn', 'gf']         |
| user4 | ['abc', 'xyz', 'def'] |

When I create the network, each node that represents the text column holds an entire list as its value.

Instead of having a list at a node, I want to create separate nodes for abc, xyz and so on, and then connect those nodes to their respective users. For example, user1 should have separate edges to abc, xyz and def. How can I break up the list so that every value in it becomes a separate node? I am stuck here. Thank you in advance for the help. My code so far is as follows:

    import matplotlib.pyplot as plt
    import networkx as nx
    import pandas as pd

    sub_data = pd.read_csv('E:\\users.csv')
    # Each edge pairs a user with the whole 'text' cell, not with its items
    edges = [tuple(x) for x in sub_data[['user', 'text']].values.tolist()]
    B = nx.Graph()
    B.add_nodes_from(sub_data['user'].unique(), bipartite=0, label='user')
    B.add_nodes_from(sub_data['text'].unique(), bipartite=1, label='hashtag')
    B.add_edges_from(edges, label='rating')
    left_or_top = sub_data['user'].unique()
    pos = nx.bipartite_layout(B, left_or_top)
    nx.draw(B, pos, node_color='#A0CBE2', edge_color='#00bb5e', width=1,
            edge_cmap=plt.cm.Blues, with_labels=True)
  
  • This question: [Pandas expand rows from list data available in column](https://stackoverflow.com/questions/39011511/pandas-expand-rows-from-list-data-available-in-column) and [`from_pandas_edgelist`](https://networkx.org/documentation/stable/reference/generated/networkx.convert_matrix.from_pandas_edgelist.html) should solve your issue. – Sparky05 Mar 29 '21 at 15:29
  • I will try to be more specific. Based on @Sparky05's link, I think `sub_data = sub_data.explode('text1').reset_index(drop=True)` will work for you. – Yannis P. Mar 30 '21 at 14:48
  • I have tried that link but unfortunately it didn't work. It didn't break up the list; it just displays the data as it is. I don't understand why it's unable to separate the data. :( – Azmat Inayat Mar 31 '21 at 14:35

1 Answer


Here is a possible solution:

    import networkx as nx
    import pandas as pd

    df = pd.DataFrame({'user': ['user1', 'user2', 'user3', 'user4'],
                       'text': [['abc', 'xyz', 'def'], ['lmo', 'gf'],
                                ['lmn', 'gf'], ['abc', 'xyz', 'def']]})
    graph = nx.convert_matrix.from_pandas_edgelist(
        df.explode('text').rename(columns={'user': 'source', 'text': 'target'})
    )

It's important to rename your columns because, by default, `nx.convert_matrix.from_pandas_edgelist` expects to find columns named "source" and "target" in your dataframe.
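As an alternative to renaming, `from_pandas_edgelist` also accepts `source` and `target` keyword arguments that name the columns to use. A minimal sketch, assuming the same `user` and `text` column names; the two-row dataframe and the name `graph_alt` are made up for illustration:

```python
import networkx as nx
import pandas as pd

# Small illustrative dataframe (not the asker's real data).
df_small = pd.DataFrame({'user': ['user1', 'user2'],
                         'text': [['abc', 'xyz'], ['gf']]})

# explode() gives one row per (user, text) pair.
exploded = df_small.explode('text')

# Name the edge columns explicitly instead of renaming them.
graph_alt = nx.from_pandas_edgelist(exploded, source='user', target='text')
print(sorted(graph_alt.edges))
```

This keeps the original column names in the dataframe, which may be convenient if it is reused later.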

If you print `graph.edges`, you can see that you get the correct result:

    [('user1', 'abc'), ('user1', 'xyz'), ('user1', 'def'),
     ('abc', 'user4'), ('xyz', 'user4'), ('def', 'user4'),
     ('user2', 'lmo'), ('user2', 'gf'), ('gf', 'user3'), ('user3', 'lmn')]
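One possible reason `explode` appears to do nothing on data loaded with `pd.read_csv` is that the `text` column then holds strings such as `"['abc', 'xyz']"` rather than real lists, and `explode` leaves plain strings untouched. The sketch below assumes that is what is happening; the CSV contents are a made-up stand-in for the real file. It parses each cell with `ast.literal_eval`, explodes the result, and then tags the two node sets with the `bipartite` attribute as in the question.

```python
import ast
import io

import networkx as nx
import pandas as pd

# Hypothetical stand-in for the CSV file; each list is stored as quoted text.
csv_text = '''user,text
user1,"['abc', 'xyz', 'def']"
user2,"['lmo', 'gf']"
'''
sub_data = pd.read_csv(io.StringIO(csv_text))

# After read_csv each cell is a str, so turn it into a real list first.
sub_data['text'] = sub_data['text'].apply(ast.literal_eval)

# One row per (user, text) pair.
pairs = sub_data.explode('text')

B = nx.Graph()
B.add_nodes_from(pairs['user'].unique(), bipartite=0, label='user')
B.add_nodes_from(pairs['text'].unique(), bipartite=1, label='hashtag')
B.add_edges_from(zip(pairs['user'], pairs['text']))

print(B.number_of_nodes(), B.number_of_edges())
```

If the cells in the real file are not valid Python literals, `ast.literal_eval` will raise an error, and a different parsing step would be needed.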
Riccardo Bucco