1

I want to run query on multiple columns (Data1, Data2) to find the values in other columns (Column1, Column2, Column3) if it exists in any of these columns (considering these columns have empty cells also)

My data looks like this:

Column1  Column2  Column3  Data1    Data2
sunday   monday            sunday   friday
sunday   friday   tuesday  friday   sunday
sunday            monday   thursday tuesday

I want the output like:

Column1  Column2  Column3  Data1    Exist_Data1   Data2      Exist_Data2
sunday   monday            sunday   Yes           friday     Yes
sunday   friday   tuesday  friday   Yes           wednesday  No
sunday            monday   thursday  No           tuesday    Yes

I have tried the command below:

pd.Series(['A', 'B']).isin(df.columns).all() (as mentioned in the Check if multiple columns exist in a df

but it check in all the columns. I have modified the df.columns but doesn't seem to work either.

*******UPDATE ********

Editing after trying the code:

As can be seen in the image below, there are some entries in Data1 and Data2 that exists in the other columns but still says "No". (There are no whitespace characters - all are removed using strip()" Double checked after writing the CSV and checking for spaces.

enter image description here

soosa
  • 125
  • 11
  • Answer was edited, Last `Exist_Data1` is not correctly `True` ? – jezrael Jun 17 '21 at 09:35
  • No, in the image Data1 has "A1BG-AS1" that exists in Column1 twice but still it says "No" and even the "NCRNA" says "No" but it do exists in Column2 – soosa Jun 17 '21 at 09:49

1 Answers1

1

You need Series.isin with flatten all values of columns to one long list:

cols = ['Column1','Column2','Column3']
v = np.ravel(df[cols])

df['Exist_Data1'] = np.where(df['Data1'].isin(v),'Yes', 'No')
df['Exist_Data2'] = np.where(df['Data2'].isin(v),'Yes', 'No')

#alternative
df[['Exist_Data1','Exist_Data2']] = np.where(df[['Data1', 'Data2']].isin(v),'Yes', 'No')


print (df)
  Column1 Column2  Column3     Data1      Data2 Exist_Data1 Exist_Data2
0  sunday  monday      NaN    sunday     friday         Yes         Yes
1  sunday  friday  tuesday    friday  wednesday         Yes          No
2  sunday     NaN   monday  thursday    tuesday          No         Yes
jezrael
  • 822,522
  • 95
  • 1,334
  • 1,252
  • This one also has an error as in Data1 "thursday" does not exists in any of the columns but it says "Yes" in Exist_Data1. I am not sure why it's happening – soosa Jun 17 '21 at 09:50
  • Thank you so much.... The first one worked perfectly as needed. Can you please elaborate why the errors were happening earlier? – soosa Jun 17 '21 at 10:11
  • @aso - Problem was I compare `df[cols]` with `df['Data1']` instead `df['Data1']` with `df[cols]`. – jezrael Jun 17 '21 at 10:12