Issue with pandas query for dataframe

Question

Consider the following.

import pandas as pd
d=pd.DataFrame([[1,'a'],[1,'b'],[2,'c'],[2,'a'],[3,'c'],[4,'a'],[4,'c']],columns=['A','B'])

I want those values in A which have EXACTLY 'c' ('c' and only 'c') associated with them. There is only one such value. It is 3. I wrote the following query, but it is not returning the right result.

d[ d.B.isin(['c'])  & ~d.A.isin(d[d.B.isin(set(d.B.unique())-{'c'})].A.to_frame()) ].A.to_frame()

My idea is to find all values in A that have 'c' associated with them and then remove from them those values that also have something other than 'c' associated with them. But what the code returns is just the values that have 'c' associated with them. Can someone help me with this? Thank you.

`d.loc[d['B']=='c','A']` will return all values in column A that have a column B value equal to 'c' — Mayeul sgc, Oct 30 '19 at 09:32
My question is different. I want to know which values in A have EXACTLY 'c' associated with them in column B. Your query returns the values in A that have 'c' associated with them. These values in A may also have values in B that are other than 'c' associated with them. — user17144, Oct 30 '19 at 09:47
I don't know why my question was marked as duplicate. I am not asking how to look for rows with a given value in a certain column. I already know that. I want to know how to find values in column A that have a certain value in another column associated with them, and no other values in that column associated with them. Do you see the difference, jezrael? — user17144, Oct 30 '19 at 09:48
I state it in the question "I want those values in A which have EXACTLY 'c' ('c' and only 'c') associated with them. There is only one such value. It is 3. I wrote the following query, but it is not returning the right result." — user17144, Oct 30 '19 at 10:03
Yes. What is the logic behind this part? I see that if any values in A have other than 'c' associated with them, they are treated as duplicates and removed, but I can't correlate it with the syntax you have used. Also, how do I reach you? Your web page says that one should feel free to write to you but how are you to be reached? I won't bother you, but there might be instances when I might need to reach you. — user17144, Oct 30 '19 at 10:10
Would you be able to tell me why my logic does not work? It is more in line with how one would write nested SQL, and probably more intuitive for someone who is already familiar with SQL. — user17144, Oct 30 '19 at 10:16

score 1 · Accepted Answer · edited Oct 30 '19 at 12:10

The simplest idea is filter values with c and not allow duplicates in A column:

s1 = d.loc[d.B.eq('c') & ~d.A.duplicated(keep=False), 'A']
print (s1)
4    3
Name: A, dtype: int64

Your solution works by removing .to_frame(), but better is to use loc for select by mask - evaluation order matters:

s2 = d.loc[ d.B.isin(['c'])  & ~d.A.isin(d.loc[d.B.isin(set(d.B.unique())-{'c'}), 'A']), 'A']
print (s2)
4    3
Name: A, dtype: int64

s2 = d[ d.B.isin(['c'])  & ~d.A.isin(d[d.B.isin(set(d.B.unique())-{'c'})].A) ].A
print (s2)
4    3
Name: A, dtype: int64

Issue with pandas query for dataframe

1 Answers1