
I have the DataFrame below. I want to map True/False onto each row based on this logic:

  1. If Name and Number match any other rows in the DataFrame, check whether any number in the List column also appears in the List of those rows.
  2. If it does, the output is False; otherwise it is True.
Name   Number   List                
A       905     [100,200,300,400] 
A       905     [200,500] 
A       905     [100,900]        
A       805     [100]               
A       805     [200]               
B       905     [600,700]               
B       905     [800,900]           

The output should look something like this:

Name   Number   List                Output
A       905     [100,200,300,400]   False      
A       905     [200,500]           False
A       905     [100,900]           False
A       805     [100]               True
A       805     [200]               True
B       905     [600,700]           True     
B       905     [800,900]           True
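
One way to read the rule above, as a minimal pure-Python sketch rather than a pandas solution: a row is False when another row with the same Name and Number shares at least one number with its list. The `rows` list and the `flag_rows` helper are hypothetical names used only for illustration.

```python
# Sample rows from the question as (Name, Number, List) tuples.
rows = [
    ("A", 905, [100, 200, 300, 400]),
    ("A", 905, [200, 500]),
    ("A", 905, [100, 900]),
    ("A", 805, [100]),
    ("A", 805, [200]),
    ("B", 905, [600, 700]),
    ("B", 905, [800, 900]),
]


def flag_rows(rows):
    """Return one bool per row: False if the row overlaps another row in its group."""
    output = []
    for i, (name, number, values) in enumerate(rows):
        # Look for a different row with the same Name and Number
        # whose list shares at least one number with this row's list.
        overlaps = any(
            j != i
            and other_name == name
            and other_number == number
            and set(values) & set(other_values)
            for j, (other_name, other_number, other_values) in enumerate(rows)
        )
        output.append(not overlaps)
    return output


print(flag_rows(rows))
```

This compares every pair of rows, so it is only meant to pin down the expected behaviour on small data.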

Thanks in advance!

1 Answer


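The idea is to build connected components from the lists within each (Name, Number) group using GroupBy.transform, and then flag the rows whose component is shared with another row using DataFrame.duplicated. The printed output includes the extra row B 905 [900, 800] raised in the comments.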

import networkx as nx


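def f(x):
    # Build a graph for one (Name, Number) group: each list becomes a path,
    # so lists that share a number end up in the same connected component.
    G = nx.Graph()
    for l in x:
        nx.add_path(G, l)

    # Map every number to the id of the component it belongs to.
    new = list(nx.connected_components(G))
    mapped = {node: cid for cid, component in enumerate(new) for node in component}

    # Label each row with the component id of its first number.
    return x.str[0].map(mapped)


# Component id per row, computed separately within each (Name, Number) group.
df['Output'] = df.groupby(['Name','Number'])['List'].transform(f)

# Rows that share (Name, Number, component id) with another row overlap: False.
df['Output'] = ~df.duplicated(['Name','Number','Output'], keep=False)
print(df)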
  Name  Number                  List  Output
0    A     905  [100, 200, 300, 400]   False
1    A     905            [200, 500]   False
2    A     905            [100, 900]   False
3    A     805                 [100]    True
4    A     805                 [200]    True
5    B     905            [600, 700]    True
6    B     905            [800, 900]   False
7    B     905            [900, 800]   False
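
To make the logic easier to follow, here is a rough pure-Python sketch of the same two steps (label each row with a component, then flag labels that occur more than once per group). It swaps networkx for a small union-find, so it is an illustration and not the answer's code. The names `component_labels` and `flag_by_component` are hypothetical, and every list is assumed to be non-empty.

```python
from collections import Counter, defaultdict

# Sample rows, including the extra B 905 [900, 800] row from the comments.
rows = [
    ("A", 905, [100, 200, 300, 400]),
    ("A", 905, [200, 500]),
    ("A", 905, [100, 900]),
    ("A", 805, [100]),
    ("A", 805, [200]),
    ("B", 905, [600, 700]),
    ("B", 905, [800, 900]),
    ("B", 905, [900, 800]),
]


def component_labels(lists):
    """Label each (non-empty) list with the root of its connected component."""
    parent = {}

    def find(node):
        parent.setdefault(node, node)
        while parent[node] != node:
            parent[node] = parent[parent[node]]  # path halving
            node = parent[node]
        return node

    # Join all numbers of a list, mirroring what nx.add_path does with edges.
    for values in lists:
        first = find(values[0])
        for value in values[1:]:
            parent[find(value)] = first
    return [find(values[0]) for values in lists]


def flag_by_component(rows):
    """Return one bool per row: True only if no other row shares its component."""
    groups = defaultdict(list)
    for index, (name, number, values) in enumerate(rows):
        groups[(name, number)].append((index, values))

    output = [None] * len(rows)
    for members in groups.values():
        labels = component_labels([values for _, values in members])
        counts = Counter(labels)
        # A label seen more than once plays the role of a duplicated row.
        for (index, _), label in zip(members, labels):
            output[index] = counts[label] == 1
    return output


print(flag_by_component(rows))
```

The `Counter` check stands in for `~df.duplicated(..., keep=False)`: a row is True only when its component label is unique within its (Name, Number) group.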
jezrael
  • May I know what this code does? ```set(z) <= set(y)``` – Gabriel Choo Jul 06 '21 at 07:48
  • @GabrielChoo - It checks, per group, whether the values exist after `connected_components`. If they do, the lists are joined and `any` returns `False` from the list of Falses. I added a `print` to the sample data to show how it works – jezrael Jul 06 '21 at 07:51
  • When I added ```B 905 [900,800]```, one row appears as True and one as False – Gabriel Choo Jul 06 '21 at 09:31
  • @GabrielChoo - testing. – jezrael Jul 06 '21 at 09:32
  • @GabrielChoo - You are right, the answer was edited. The solution is based on [this](https://stackoverflow.com/a/66239308/2901002), and duplicates are then tested with [`DataFrame.duplicated`](http://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.duplicated.html) – jezrael Jul 06 '21 at 10:16
  • It works! Thanks, I just need some time to figure out the logic of the code – Gabriel Choo Jul 06 '21 at 11:24