0

I am trying to solve a pandas problem for data analysis. I have 2 csv files that I have loaded into 2 separate data frames. Lets call them df1 and df2.

There is a reference (master data) column in df1 that has a master list of unique codes. In df2, I have 2 separate columns with values that I need to validate against this master column in df1. Something similar to a vlookup.

Example:

df1['master_column'] has values ['123', '456','789']

df2 has 2 columns with multiple values but only those rows need to be retained whose values match df1 column values above. The check needs to happen for both the columns in df2. I tried using merge (similar to inner join) but I guess that will only let me join one column from df2 with df1 and I need to apply the join condition with 2 columns. Not sure if I have explained correctly.

rb_27
  • 1
  • Do you need `df1[df1['master_column'].isin(df2.to_numpy().tolist())]` ? Or something else? Can you add smal data sample for both DataFrames and expected ouput? – jezrael Apr 21 '22 at 05:31
  • Thanks for response. df1['column1'] has values ['123', '456','789'] and df2['column1'] has values ['123', '456','111','222', '333', '789']. So I need to remove or only have in output the values in df2 that match values in df1 i.e. remove/ drop rows for values '111', '222', '333' since they do not match with df1 values. let me know if this makes it clearer. – rb_27 Apr 21 '22 at 05:39
  • Please check dupe solutions, if some problem let me know. – jezrael Apr 21 '22 at 05:40

0 Answers0