
I have two dataframes, an old df and a new df. Both contain each candidate's name and GPA, but the old df has more GPA values filled in than the new one. For each NaN in the GPA column of the new df, I want to assign the corresponding GPA from the old df, matched on the candidate's name column. (The two dfs have different shapes, so I can't do it with a one-line command.) Here's the code I tried:

```python
for y in df_new['NAME'].unique():
    for m in df_old['Name'].unique():
        if y==m:
            old_row=df_old.loc[df_old['Name']==m]
            new_row=df_new.loc[df_new['NAME']==y]
            if pd.isna(new_row['GPA'])=='True': 
                if pd.isna(old_row['GPA'])=='False':
                    new_row['GPA']=old_row['GPA']
```

The line `if pd.isna(new_row['GPA'])=='True':` raises the error "The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all()." But I believe that `new_row['GPA']` only contains one value, so `pd.isna` should give a single boolean. What is going on?
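A minimal sketch of what is happening, using made-up sample data (only the `Name`/`GPA` column names come from the question; the helper `gpa_is_missing` is hypothetical). Boolean-mask indexing with `.loc` returns a Series even when a single row matches, and `if` refuses to turn any Series into one bool. Extracting the scalar with `.item()` avoids that, assuming each name matches exactly one row.

```python
import pandas as pd

# Hypothetical sample data; only the column names 'Name' and 'GPA' come from the question.
df_old = pd.DataFrame({"Name": ["Ann", "Bob"], "GPA": [3.5, float("nan")]})

# Boolean-mask indexing returns a (one-element) Series here, not a scalar.
ann_mask = pd.isna(df_old.loc[df_old["Name"] == "Ann", "GPA"])
print(type(ann_mask).__name__)  # Series

try:
    if ann_mask:  # any Series in a boolean context raises, whatever its length
        pass
except ValueError as exc:
    print("ValueError:", exc)


def gpa_is_missing(df: pd.DataFrame, name: str) -> bool:
    """Hypothetical helper: True if the GPA for `name` is NaN.

    Assumes exactly one row matches; .item() raises ValueError otherwise.
    """
    gpa = df.loc[df["Name"] == name, "GPA"]  # still a one-element Series
    return bool(pd.isna(gpa.item()))  # .item() extracts the scalar value


print(gpa_is_missing(df_old, "Ann"))  # False
print(gpa_is_missing(df_old, "Bob"))  # True

# A boolean never equals the string 'True', so `pd.isna(x) == 'True'` would be
# False even on a scalar; test the boolean itself instead.
```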

Teddy 547
  • `pd.isna(old_row['GPA'])=='False'` returns a series of booleans, not a single value. pandas considers it ambiguous to use `if` to test a series: should it mean that any value is true? that all values are true? etc. – tdy Nov 19 '21 at 19:59
  • Thanks for your comment! But isn't `old_row['GPA']` the GPA for that row only? I understand that doing this on `df_old['GPA']` would return a series of booleans. But in this case, I think I'm only comparing one boolean. Did I misunderstand something? – Teddy 547 Nov 19 '21 at 20:03
  • even if there's only one value, the result is still a `pd.Series` when you index with `loc`. my previous comment was not quite precise: pandas considers it ambiguous to use `if` on a `pd.Series` in general (regardless of how many values) – tdy Nov 19 '21 at 20:08
  • also note that looping is an antipattern in pandas. based on your description, you should be able to use a single [`merge`](https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.merge.html) to replace this whole nested `for`/`if` loop – tdy Nov 19 '21 at 20:12
  • Thank you! That made things a lot clearer! I also have columns other than name and GPA, and I know that I can merge on 'Name', but I'm not sure how to specify that I only want to replace the GPA column in place. – Teddy 547 Nov 19 '21 at 20:27
  • I forgot to say that the GPAs in the two dfs are extracted from resumes using different modules, so they don't have the same value for the same person. The new one is more reliable but extracted fewer GPAs from the resumes we have, so I just want to use the old df's GPA to fill in the new one wherever there is a NaN. – Teddy 547 Nov 19 '21 at 20:31
  • pandas questions are hard to answer concretely (beyond general advice like "use `merge`") without some sample dataframes. try to provide [minimal sample dataframes](https://stackoverflow.com/q/20109391/13138364) of `old_df` and `new_df` that demonstrate the difficulty in merging them – tdy Nov 19 '21 at 20:35
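Following the `merge` suggestion, here is a minimal sketch under assumed data: the sample frames and the `Email` column (standing in for "other columns") are made up, `fill_gpa_from_old` is a hypothetical helper, and duplicate names in the old df are resolved by keeping the first occurrence. A left merge keeps every row of the new df in its original order, and `fillna` only replaces NaN GPAs, so the more reliable new values and the other columns are left alone.

```python
import numpy as np
import pandas as pd

# Hypothetical sample frames; only 'NAME', 'Name' and 'GPA' come from the question.
df_new = pd.DataFrame({
    "NAME": ["Ann", "Bob", "Cy"],
    "GPA": [3.8, np.nan, np.nan],
    "Email": ["a@x.org", "b@x.org", "c@x.org"],  # stand-in for other columns
})
df_old = pd.DataFrame({
    "Name": ["Bob", "Ann", "Dee"],
    "GPA": [3.1, 3.5, 2.9],
})


def fill_gpa_from_old(new: pd.DataFrame, old: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: fill NaN GPAs in `new` from `old`, matched by name."""
    # Keep only the lookup columns, drop duplicate names (first one wins) so the
    # merge cannot add rows, and rename to match `new` without clashing on GPA.
    lookup = (
        old[["Name", "GPA"]]
        .drop_duplicates(subset="Name")
        .rename(columns={"Name": "NAME", "GPA": "GPA_old"})
    )
    merged = new.merge(lookup, on="NAME", how="left")  # same rows and order as `new`
    merged["GPA"] = merged["GPA"].fillna(merged["GPA_old"])  # only NaNs are filled
    return merged.drop(columns="GPA_old")


result = fill_gpa_from_old(df_new, df_old)
print(result)
```

Names with no match in the old df (`Cy` here) stay NaN. `Series.map` with a name-indexed Series of old GPAs is an alternative that avoids the temporary `GPA_old` column.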
