I have two dataframes, df1 and df2. df1 has two columns, 'A' and 'B'; df2 has one column, 'A'. Column 'A' in both dataframes contains gene names. I need to keep the rows of df1 whose gene appears in column 'A' of df2, and drop the rest.

Example:

import pandas as pd

a = {'A': ['abcd', 'egfh', 'ijkl', 'mnop'], 'B': [3, 4, 5, 6]}
b = {'A': ['abcd', 'egfh', 'ijkl']}

df1 = pd.DataFrame(a)
print(df1)
df2 = pd.DataFrame(b)
print(df2)

The result I'm looking for is df1 containing only the first three rows.

df1.isin(df2)

I've tried using the isin function, but I can't get it to work. Maybe isin is not the right function to use when the two dataframes don't have the same number of rows and columns?
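For reference, here is a minimal sketch of one approach that might give the result described above. It assumes the comparison should be made only between column 'A' of each dataframe. When `DataFrame.isin` is given another DataFrame, it requires both the index and column labels to match, whereas `Series.isin` tests membership by value alone. The names `mask` and `filtered` are only illustrative.

```python
import pandas as pd

df1 = pd.DataFrame({'A': ['abcd', 'egfh', 'ijkl', 'mnop'], 'B': [3, 4, 5, 6]})
df2 = pd.DataFrame({'A': ['abcd', 'egfh', 'ijkl']})

# Series.isin checks each value of df1['A'] for membership in df2['A'],
# independent of index position or the shape of either dataframe.
mask = df1['A'].isin(df2['A'])

# Boolean indexing keeps only the rows where mask is True.
filtered = df1[mask]
print(filtered)
```

Because the mask is built from a single column, it should not matter that df1 and df2 have different numbers of rows or columns.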

BioProg
