Consider a 100x200 dataframe (called df1) representing clinical data from 100 patients. Each patient can be identified by one number in column 'ID' together with another number in column 'CENTER'. Now consider a second 40x170 dataframe, df2, containing data from a subset of 40 patients randomly selected from df1 and tested 6 months later on different variables. Like df1, df2 contains columns 'ID' and 'CENTER'. I am trying to select these 40 patients in df1 based on their ID and CENTER numbers, but can't find an easy way to do so using pandas. Any ideas?
1 Answer
You could try this:
df3 = df1[df1.ID.isin(df2.ID) & df1.CENTER.isin(df2.CENTER)]
-
This doesn't work. For example, df3 contains a patient with ID = 4 and CENTER = 1, who is not in df2. The issue is that each patient is defined by a specific pair of numbers, ID and CENTER taken together, not by each column separately. – user1363251 Jul 14 '21 at 20:05
-
I tried this with a small data frame and it worked well for me. Did you try it with your data? Otherwise, please provide a small sample of the data to work with. – Jul 14 '21 at 20:27
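For anyone landing here later, the pair-matching problem described in the first comment can be reproduced on a toy example. The sketch below is only an illustration: the data and the extra columns 'age' and 'score' are made up, and it assumes the (ID, CENTER) pair uniquely identifies a patient. It shows why two independent isin checks can let through a row like ID = 4, CENTER = 1, and two possible ways to match on the pair instead: an inner merge on both key columns, or a MultiIndex comparison that keeps df1's original index.

```python
import pandas as pd

# Hypothetical toy data; the real df1 and df2 have many more rows and columns.
df1 = pd.DataFrame({
    "ID":     [4, 4, 7, 9],
    "CENTER": [1, 2, 1, 2],
    "age":    [50, 61, 47, 38],
})
df2 = pd.DataFrame({
    "ID":     [4, 7],
    "CENTER": [2, 1],
    "score":  [0.3, 0.8],
})

# Two separate isin checks test each column independently, so (4, 1) slips
# through: ID 4 occurs in df2.ID and CENTER 1 occurs in df2.CENTER,
# just never on the same row.
loose = df1[df1.ID.isin(df2.ID) & df1.CENTER.isin(df2.CENTER)]

# Option 1: inner merge on both keys keeps only rows of df1 whose
# (ID, CENTER) pair appears in df2. The result gets a fresh 0..n-1 index.
keys = df2[["ID", "CENTER"]].drop_duplicates()
df3 = df1.merge(keys, on=["ID", "CENTER"], how="inner")

# Option 2: compare the pairs as MultiIndex entries; this keeps df1's index.
pairs = pd.MultiIndex.from_frame(keys)
mask = pd.MultiIndex.from_frame(df1[["ID", "CENTER"]]).isin(pairs)
df3_alt = df1[mask]

print(loose)
print(df3)
print(df3_alt)
```

Merging only the key columns of df2 (rather than the whole frame) avoids pulling df2's other variables into the result; drop the column selection if you do want them joined in.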