0

I have two dataframes:

df_one:

person  year   job 
AA      2010    x
AA      2013    y 
AA      2014    z
BG      2009    a
BG      2010    b
HJ      2001    c
HJ      2002    g

df_two:

person  year   city
AA      2010    NY
YH      2004    Minneapolis
AA      2013    Boston
AA      2014    LA
BG      2009    Chicago
IS      1998    Chicago
BG      2010    Des Moines
HJ      2001    Austin
HJ      2002    Nashville

df_one and df_two share a lot of the same people. Because df_one has the "jobs" column, I'd like to migrate the "city" column from df_two over to the appropriate people in df_one based on the person and year columns. In order to locate the correct person and year (and thus, the correct city), I've tried the following (in order to make a new column on df_one called "city"):

for i,row in df_one.iterrows():
    person = df_one.at[i,'person']
    year = df_one.at[i,'year']
    if df_one.loc[(df_two['person'] == person) & (df_two['year'] == year)]:
        df_one.at[i,'city'] = df_two['city']

I'm getting the error:

ValueError: The truth value of a DataFrame is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().

Though I would like to fix this error, I suspect that there may be a better, more efficient way to accomplish what I'm trying to do.

AdamA
  • 343
  • 1
  • 2
  • 11

1 Answers1

1

In your case, you should subset the data using .loc and save it to a variable, something like this:

# make sure to add an empty column
df_one['city'] = None

for i,row in df_one.iterrows():
    person = df_one.at[i,'person']
    year = df_one.at[i,'year']
    f = df_two.loc[(df_two['person'] == person) & (df_two['year'] == year), 'city']
    print(f)
    if not f.empty:
        df_one.at[i,'city'] = f.values[0]

However, the optimal way to do this is to use merge function in pandas. You can simply do:

df_one.merge(df_two, on=['person','year'])
YOLO
  • 20,181
  • 5
  • 20
  • 40