Select rows in a pandas dataframe based on a condition from another dataframe with a different size

Question

Consider a 100x200 dataframe (called df1) representing clinical data from 100 patients. Each patient is identified by one number in column 'ID' together with another number in column 'CENTER'. Now consider a second 40x170 dataframe, df2, containing data from a subset of 40 patients randomly selected from df1 and tested 6 months later on different variables. Like df1, df2 contains the columns 'ID' and 'CENTER'. I am trying to select these 40 patients in df1 based on their ID and CENTER numbers, but can't find an easy way to do so using pandas. Any ideas?

score 1 · Answer 1 · answered Jul 14 '21 at 18:48

You could try this:

df3 = df1[df1.ID.isin(df2.ID) & df1.CENTER.isin(df2.CENTER)]

This doesn't work. For example, df3 has a patient with ID = 4 and CENTER = 1, which is not in df2. The issue is that each patient is defined by a specific pair of numbers: ID together with CENTER. – user1363251 Jul 14 '21 at 20:05
I tried this with a small dataframe and it worked well for me. Did you try it with your data? Otherwise, please at least provide us with a small sample of data to work with. – Jul 14 '21 at 20:27
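
The pairing issue raised in the comments can be reproduced on a toy example. The sketch below is only an illustration: the column names ID and CENTER come from the question, but all values, and the extra columns AGE and SCORE, are invented. It shows that two independent isin checks can keep a row whose (ID, CENTER) pair never occurs in df2, and that one possible alternative is to match on the pair itself, either with an inner merge on both columns or with a mask built from (ID, CENTER) tuples.

```python
import pandas as pd

# Hypothetical toy data: only the column names ID and CENTER come from the question.
df1 = pd.DataFrame({
    "ID":     [1, 2, 3, 4, 4],
    "CENTER": [1, 1, 2, 1, 2],
    "AGE":    [34, 51, 47, 29, 62],
})
df2 = pd.DataFrame({
    "ID":     [1, 4],
    "CENTER": [1, 2],
    "SCORE":  [0.7, 0.4],
})

# Two independent isin checks: ID and CENTER are tested separately,
# so (ID=4, CENTER=1) is kept even though that pair is not in df2.
loose = df1[df1.ID.isin(df2.ID) & df1.CENTER.isin(df2.CENTER)]

# Option 1: inner merge on both key columns.
# Duplicate pairs are dropped from df2 first so df1 rows are not repeated.
keys = df2[["ID", "CENTER"]].drop_duplicates()
df3 = df1.merge(keys, on=["ID", "CENTER"], how="inner")

# Option 2: boolean mask on (ID, CENTER) tuples; this keeps df1's original index.
pairs = list(keys.itertuples(index=False, name=None))
mask = pd.MultiIndex.from_frame(df1[["ID", "CENTER"]]).isin(pairs)
df3_same_index = df1[mask]

print(loose)
print(df3)
print(df3_same_index)
```

The merge resets the index of the result, while the mask version keeps the row labels from df1, which may matter if those labels are used elsewhere.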
