I have this dataframe

Node    TLW
1       [2, 22, 3]
2       [12]
3       [2, 43]
4       [3]
5       [11]

I would like to have something like this

Node
1
2
3
4
5
22
12
43
11

Could you please tell me how to get it? My approach would be to loop over each list with a for loop and append the values to my dataframe, checking for duplicates, but I am having difficulty writing the loop. I also considered explode, but its output is not what I am looking for: the distinct numbers (or strings) should end up in the Node column, not in TLW.

Math
  • You can combine the Node column and the TLW column into a single list for each row and then use explode. Check my solution, which elaborates on that and gives an alternative. – Akshay Sehgal Jan 22 '21 at 23:51

1 Answer

Method 1

One way is to use apply with a lambda to merge the Node and TLW columns into a single list per row, then call explode() and unique() on the result. After that, build a new dataframe with a single Node column.

import pandas as pd

d = {'Node': {0: 1, 1: 2, 2: 3, 3: 4, 4: 5},
     'TLW': {0: [2, 22, 3], 1: [12], 2: [2,43], 3: [3], 4: [11]}}
df = pd.DataFrame(d)

nodes = df.apply(lambda x: [x['Node']]+ x['TLW'], axis=1).explode().unique()
new_df = pd.DataFrame(nodes, columns=['Node'])
print(new_df)
  Node
0    1
1    2
2   22
3    3
4   12
5   43
6    4
7    5
8   11

Method 2

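Another way is to apply numpy's np.unique after df.explode. Note that np.unique returns the values sorted, whereas Method 1 keeps them in order of first appearance: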

import numpy as np
import pandas as pd

d = {'Node': {0: 1, 1: 2, 2: 3, 3: 4, 4: 5},
     'TLW': {0: [2, 22, 3], 1: [12], 2: [2,43], 3: [3], 4: [11]}}
df = pd.DataFrame(d)


new_df = pd.DataFrame(np.unique(df.explode('TLW').values), columns=['Nodes'])
print(new_df)
  Nodes
0     1
1     2
2     3
3     4
4     5
5    11
6    12
7    22
8    43
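
If the order shown in the question matters (the original Node values first, followed by the new values from TLW), one possible variant is to concatenate the Node column with the exploded TLW column and drop duplicates. This is only a sketch on the sample data; the name `combined` is illustrative and not part of the two methods above.

```python
import pandas as pd

d = {'Node': {0: 1, 1: 2, 2: 3, 3: 4, 4: 5},
     'TLW': {0: [2, 22, 3], 1: [12], 2: [2, 43], 3: [3], 4: [11]}}
df = pd.DataFrame(d)

# Stack the original Node values on top of the flattened TLW values.
combined = pd.concat([df['Node'], df['TLW'].explode()], ignore_index=True)

# drop_duplicates keeps the first occurrence, so the Node values stay in front.
new_df = pd.DataFrame({'Node': combined.drop_duplicates().tolist()})
print(new_df['Node'].tolist())
# [1, 2, 3, 4, 5, 22, 12, 43, 11]
```
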
Akshay Sehgal
  • Thank you @Akshay Sehgal. One last question: if I want to put only the distinct values that are not among the original Node values into a new dataframe (in this case 11, 12, 22, 43), how could I edit your code (I am using the first method)? – Math Jan 23 '21 at 02:20
  • Just skip the lambda operation and work only with the second column. You can use explode directly on it, then take a set difference between the two columns. – Akshay Sehgal Jan 23 '21 at 02:24
  • If this is not clear and you post it as a new question, do link it here so I get a notification and can help solve it. – Akshay Sehgal Jan 23 '21 at 02:27
  • It is clear, thank you for your explanation and help. If you want to have a look at another open question that I asked yesterday, that would be great. I think the approach should be pretty similar to this one: https://stackoverflow.com/questions/65852710/calculating-weighted-average-using-two-columns-one-with-a-list Thanks – Math Jan 23 '21 at 02:34
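
The set-difference idea from the comments (explode TLW directly, then remove anything already in Node) could look roughly like the following. It is a minimal sketch on the sample data from the question; the names `tlw_values` and `new_values` are illustrative, and sorting the result is an assumption about the desired order.

```python
import pandas as pd

d = {'Node': {0: 1, 1: 2, 2: 3, 3: 4, 4: 5},
     'TLW': {0: [2, 22, 3], 1: [12], 2: [2, 43], 3: [3], 4: [11]}}
df = pd.DataFrame(d)

# Flatten the lists in TLW into one value per row; no lambda is needed here.
tlw_values = df['TLW'].explode()

# Keep only the values that do not already appear in the Node column.
new_values = sorted(set(tlw_values) - set(df['Node']))

new_df = pd.DataFrame({'Node': new_values})
print(new_df['Node'].tolist())
# [11, 12, 22, 43]
```

Sets do not preserve order, which is why the result is sorted explicitly before building the dataframe.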