
I have two DataFrames, fit and mass. They share only one column, 'CATAID'. The fit DataFrame contains information about the whole experiment, whereas mass contains only a small subset of it.

For my work, I need the information in the fit DataFrame, but only for the 'CATAID' values that appear in the mass DataFrame. In other words, I need to go through the rows of fit and pick those whose CATAID matches a CATAID value in mass.

I'm using the following loop:

file=pd.DataFrame()
for i in mass.index:
    cataid_m=mass.loc[i,'CATAID']
    for j in fit.index:
        cataid_f=fit.loc[j,'CATAID']
        if cataid_m==cataid_f:
            file[j]=fit.iloc[j]

My only concern is the amount of time this loop takes. I was wondering if anyone has any suggestions on how to improve this loop?

abhilb
Sheyda
  • This appears to be a direct DF merge, no? – Prune Dec 11 '19 at 23:13
  • Welcome to stack overflow! Please [edit] your question to provide a [mcve] for your issue, including sample input and output so that we can better understand the problem. Take a look at [How to make good pandas examples](https://stackoverflow.com/questions/20109391/how-to-make-good-reproducible-pandas-examples) – G. Anderson Dec 11 '19 at 23:13
  • No, the fit DF has the 'CATAID' values in the mass DF and some more that aren't in the mass DF. I'm not merging them; I'm selecting the 'CATAID' values in the fit DF that also appear in the 'CATAID' column of the mass DF and storing the rows for those 'CATAID' values in a new DataFrame. – Sheyda Dec 11 '19 at 23:22

1 Answer


You can do this by first collecting the unique IDs in the mass dataframe (mass_df and fit_df below stand for the question's mass and fit):

mass_id = mass_df['CATAID'].unique().tolist()

Then you can get the rows from your main dataframe, where the CATAID is inside mass_id:

relevant_df = fit_df.loc[fit_df['CATAID'].isin(mass_id)]
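As a minimal, self-contained sketch of the two steps above: the toy data and the 'flux' and 'logmstar' column names are made up for illustration, and only 'CATAID' comes from the question. Passing the column straight to isin should give the same result as building the unique list first.

```python
import pandas as pd

# Hypothetical toy data; only the 'CATAID' column name comes from the question.
fit = pd.DataFrame({
    "CATAID": [101, 102, 103, 104, 105],
    "flux": [1.0, 2.0, 3.0, 4.0, 5.0],
})
mass = pd.DataFrame({
    "CATAID": [102, 104, 104],
    "logmstar": [9.5, 10.1, 10.1],
})

# Boolean mask: True where the fit row's CATAID also occurs in mass.
mask = fit["CATAID"].isin(mass["CATAID"])

# Keep only the matching rows; fit's original index labels are preserved.
relevant = fit.loc[mask]
print(relevant)
```

This replaces both Python-level loops with a single vectorized membership test, which is where the speed-up over the nested loop in the question comes from.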

I don't think the merge that Prune suggests in the comments is the right tool here, because we aren't trying to join the two dataframes. We are just extracting the ids from one dataframe and selecting the rows of the other that match those ids.
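To make the difference between filtering and merging concrete, here is a small sketch on made-up data (the 'flux' and 'logmstar' columns are hypothetical). With an inner merge, the extra columns of mass are pulled in, and repeated CATAID values in mass repeat the matching fit rows; the isin filter returns each fit row at most once, with only fit's columns.

```python
import pandas as pd

# Hypothetical toy data; note that CATAID 102 appears twice in mass.
fit = pd.DataFrame({"CATAID": [101, 102, 103], "flux": [1.0, 2.0, 3.0]})
mass = pd.DataFrame({"CATAID": [102, 102, 103], "logmstar": [9.5, 9.6, 10.1]})

# Filter: rows of fit whose CATAID occurs in mass, columns unchanged.
filtered = fit[fit["CATAID"].isin(mass["CATAID"])]

# Inner merge: joins the two tables, so mass's columns are added and
# the fit row for CATAID 102 appears once per matching mass row.
merged = fit.merge(mass, on="CATAID", how="inner")

print(filtered.shape)  # (2, 2)
print(merged.shape)    # (3, 3)
```

If mass has unique CATAID values and its extra columns are dropped afterwards, the merge would select the same rows, so the choice is mostly about intent and about avoiding surprise duplicates.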

alex067