From lists in a column to rows

Question

I have this dataframe

Node  TLW
1     [2, 22, 3]
2     [12]
3     [2, 43]
4     [3]
5     [11]

I would like to have something like this

Could you please tell me how to get it? My approach would be to loop over each list and append its values to the dataframe, checking for duplicates, but I am having difficulty writing the for loop. I also considered explode, but its output is not what I am looking for: the distinct numbers (or strings) should end up in the Node column, not in TLW.

You can combine the Node column and the TLW column into a single list for each row and then use explode. Check my solution which elaborates on that as well as another alternate. — Akshay Sehgal, Jan 22 '21 at 23:51

Akshay Sehgal · Accepted Answer · 2021-01-22T23:50:24.887


Method 1

One way is to use apply with a lambda to merge the Node and TLW columns into a single list per row. Then call explode() and take unique(). Finally, build a new dataframe with a single Node column from the result.

import pandas as pd

d = {'Node': {0: 1, 1: 2, 2: 3, 3: 4, 4: 5},
     'TLW': {0: [2, 22, 3], 1: [12], 2: [2, 43], 3: [3], 4: [11]}}
df = pd.DataFrame(d)

# Prepend each Node to its TLW list, flatten with explode(), keep distinct values
nodes = df.apply(lambda x: [x['Node']] + x['TLW'], axis=1).explode().unique()
new_df = pd.DataFrame(nodes, columns=['Node'])
print(new_df)

Method 2

Another way is to apply NumPy's np.unique to the result of df.explode. Note that np.unique returns the values sorted:

import numpy as np
import pandas as pd

d = {'Node': {0: 1, 1: 2, 2: 3, 3: 4, 4: 5},
     'TLW': {0: [2, 22, 3], 1: [12], 2: [2, 43], 3: [3], 4: [11]}}
df = pd.DataFrame(d)

# Explode TLW, then flatten both columns and keep the sorted distinct values
new_df = pd.DataFrame(np.unique(df.explode('TLW').values), columns=['Node'])
print(new_df)
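
One practical difference between the two methods is the order of the result: Series.unique() keeps values in order of first appearance, while np.unique() returns them sorted. The following is a minimal sketch of that comparison on the sample data from the question; the variable names by_appearance and sorted_values are illustrative and not part of the original answer.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "Node": [1, 2, 3, 4, 5],
    "TLW": [[2, 22, 3], [12], [2, 43], [3], [11]],
})

# Method 1: Series.unique() keeps values in order of first appearance.
by_appearance = (
    df.apply(lambda row: [row["Node"]] + row["TLW"], axis=1)
      .explode()
      .unique()
)

# Method 2: np.unique() flattens the exploded frame and sorts the values.
sorted_values = np.unique(df.explode("TLW").values)

print(list(by_appearance))
print(list(sorted_values))
```

If the order of the rows in the final Node column matters, that is probably the main reason to prefer one method over the other.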

edited Jan 22 '21 at 23:50

answered Jan 22 '21 at 23:43

Akshay Sehgal


Thank you @Akshay Sehgal. One last question: in case I want to add the distinct values in a new dataframe where there are not the original values (so in this case 11,12,22,43 in the new dataframe), how could I edit your code (I am using the first method)? – Math Jan 23 '21 at 02:20

Just skip the lambda operation and work only with the second column. You can use explode directly on it, then take a set difference between the two columns. – Akshay Sehgal Jan 23 '21 at 02:24

If this is not clear and you post it as a new question, do link it here so I get a notification and can help solve it. – Akshay Sehgal Jan 23 '21 at 02:27

It is clear, thank you for your explanation and help. If you want to have a look at another open question that I have asked yesterday, it would be great. I think the approach for resolution should be pretty similar to this one: https://stackoverflow.com/questions/65852710/calculating-weighted-average-using-two-columns-one-with-a-list Thanks – Math Jan 23 '21 at 02:34
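
The set-difference suggestion from the comments above could look roughly like the sketch below. It assumes the goal is a new dataframe holding only the TLW values that never appear in the original Node column, and it sorts the result for readability; the name new_values is illustrative and not from the thread.

```python
import pandas as pd

df = pd.DataFrame({
    "Node": [1, 2, 3, 4, 5],
    "TLW": [[2, 22, 3], [12], [2, 43], [3], [11]],
})

# Explode only the TLW column (no lambda needed), then subtract the
# set of existing Node values from the set of TLW values.
new_values = sorted(set(df["TLW"].explode()) - set(df["Node"]))

# Assumption: the new values go into a column that is also called Node.
new_df = pd.DataFrame(new_values, columns=["Node"])
print(new_df)
```

Sets do not preserve order, so the sorted() call is only there to make the output deterministic; drop it if the order is irrelevant.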
