Subset of columns from another data frame

Question

I have a dataframe (G) whose columns are “Client” and “TIV”.

I have another dataframe (B) whose columns are “Client”, “TIV”, “A”, “B”, “C”.

I want to select all rows from B whose clients are not in G. In other words, if a row in B has a Client that also exists in G, I want to delete it.

I did this:

x = B[B['Client'] != G['Client']]

But it raised the error “Can only compare identically-labeled Series objects”.
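The error comes from how `!=` works between two Series: it is an element-wise comparison aligned on the index, not a membership test. Because G and B have different indexes (here, different lengths), pandas refuses to compare them. Even with identical labels it would only compare row i of B with row i of G. The sample data below is hypothetical, made up only to reproduce the error.

```python
import pandas as pd

# Hypothetical sample data: G has fewer rows than B, so their indexes differ
G = pd.DataFrame({"Client": ["c1", "c3"], "TIV": [100, 300]})
B = pd.DataFrame({
    "Client": ["c1", "c2", "c3", "c4"],
    "TIV": [10, 20, 30, 40],
    "A": [1, 2, 3, 4],
    "B": [5, 6, 7, 8],
    "C": [9, 10, 11, 12],
})

# `!=` between two Series compares values row by row, aligned on the index.
# Index 0..3 vs 0..1 are not identically labeled, so pandas raises ValueError.
try:
    B["Client"] != G["Client"]
    error_message = None
except ValueError as exc:
    error_message = str(exc)

print(error_message)
```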

I appreciate your help.

Does this answer your question? [How to filter Pandas dataframe using 'in' and 'not in' like in SQL](https://stackoverflow.com/questions/19960077/how-to-filter-pandas-dataframe-using-in-and-not-in-like-in-sql) — Chris, Sep 06 '22 at 13:33
Please share sample data for both dataframes so people can try it on their own machines. — grymlin, Sep 06 '22 at 13:33
I think what you are looking for is an anti join. Check this post: https://stackoverflow.com/questions/38516664/anti-join-pandas — Jannik, Sep 06 '22 at 13:35
@grymlin Thank you for your feedback. Any data would work, which is why I didn't include any. As long as the clients in G are not selected from B, I am happy :) — navid, Sep 06 '22 at 13:39
@Chris I am trying my best to understand it, actually. I am not sure. — navid, Sep 06 '22 at 13:40
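The anti join suggested in the comments can be sketched with `DataFrame.merge` and `indicator=True`: left-merge B against the distinct clients of G, then keep only the rows that found no match. This is a minimal sketch on hypothetical sample data; merging on `G[["Client"]]` alone is a choice made here to avoid duplicated `TIV` columns, and `drop_duplicates()` keeps repeated clients in G from multiplying rows of B.

```python
import pandas as pd

# Hypothetical sample data shaped like the question's G and B
G = pd.DataFrame({"Client": ["c1", "c3"], "TIV": [100, 300]})
B = pd.DataFrame({
    "Client": ["c1", "c2", "c3", "c4"],
    "TIV": [10, 20, 30, 40],
    "A": [1, 2, 3, 4],
    "B": [5, 6, 7, 8],
    "C": [9, 10, 11, 12],
})

# indicator=True adds a "_merge" column: "both" if the client was found in G,
# "left_only" if it exists only in B
merged = B.merge(
    G[["Client"]].drop_duplicates(), on="Client", how="left", indicator=True
)

# Keep the rows with no match in G, then drop the helper column
anti_joined = merged.loc[merged["_merge"] == "left_only"].drop(columns="_merge")
print(anti_joined)
```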

score 1 · Accepted Answer · answered Sep 06 '22 at 13:37


You can use `Series.isin` combined with the `~` (negation) operator:

B[~B.Client.isin(G.Client)]
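A runnable version of the one-liner above, on hypothetical sample data: `isin` builds a boolean mask that is True wherever B's client appears anywhere in G (regardless of row position or index), and `~` inverts it so only the clients absent from G remain.

```python
import pandas as pd

# Hypothetical sample data shaped like the question's G and B
G = pd.DataFrame({"Client": ["c1", "c3"], "TIV": [100, 300]})
B = pd.DataFrame({
    "Client": ["c1", "c2", "c3", "c4"],
    "TIV": [10, 20, 30, 40],
    "A": [1, 2, 3, 4],
    "B": [5, 6, 7, 8],
    "C": [9, 10, 11, 12],
})

# True where the client in B also appears somewhere in G
in_g = B["Client"].isin(G["Client"])

# ~ flips the mask, keeping all columns of B for clients not in G
result = B[~in_g]
print(result)
```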


Nuri Taş


Consider `G.Client.unique()` for better performance on larger data sets. – Chris Sep 06 '22 at 13:41
Appreciate the feedback. @navid, check this out. – Nuri Taş Sep 06 '22 at 13:44
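Passing `G.Client.unique()` instead of the full column does not change which rows are kept, since `isin` only cares whether a value is present. The sketch below, on hypothetical data where G contains a duplicated client, checks that both forms select the same rows; it makes no claim about the actual speed difference, which depends on the data.

```python
import pandas as pd

# Hypothetical sample data: "c1" appears twice in G
G = pd.DataFrame({"Client": ["c1", "c1", "c3"], "TIV": [100, 150, 300]})
B = pd.DataFrame({
    "Client": ["c1", "c2", "c3", "c4"],
    "TIV": [10, 20, 30, 40],
})

# Membership test against the full column vs. its distinct values
full = B[~B["Client"].isin(G["Client"])]
deduped = B[~B["Client"].isin(G["Client"].unique())]

print(full.equals(deduped))
```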

score 0 · Answer 2 · answered Sep 06 '22 at 13:42

Maybe the following code snippet helps:

import pandas as pd

df1 = pd.DataFrame(data={'Client': [1, 2, 3, 4, 5]})
df2 = pd.DataFrame(data={'Client': [1, 2, 3, 6, 7]})
# Identify which clients are in df1 but not in df2
clients_diff = set(df1.Client).difference(df2.Client)
# Keep only the rows of df1 whose client is in that difference
df1.loc[df1.Client.isin(clients_diff)]

The idea is to filter df1 down to the clients that are not in df2.
