1

I'm trying to assign dataframe columns to the row and column of the .loc function. I have a DataFrame df with no set index, with Sites and Visits as my column headers

Index Site Visit
0 101 Visit 1
1 102 Visit 1
2 102 Visit 2

I have another dataframe df2 with Sites as my dataframe index, and Visits as my columns with Cost as my values.

Index Visit 1 Visit 2
101 50 60
102 100 120

I'm trying to use .loc to index the Cost from df2 and add it as a column to df1 like so

Index Site Visit Cost
0 101 Visit 1 50
1 102 Visit 1 60
2 102 Visit 2 120

I tried using the following code to provide a row and column value

df['Cost'] = df2.loc[df['Site'],df['Visit']]

But I got the following error:

KeyError: "Passing list-likes to .loc or [] with any missing labels is no longer supported. 
The following labels were missing: etc. etc. 

Any idea how to use two column values from one DataFrame as the .loc row and column values?

Henry Ecker
  • 34,399
  • 18
  • 41
  • 57
Oliver
  • 281
  • 3
  • 14

2 Answers2

4
  • This is a question about merging, but the dataframes should be transformed, in order to properly merge them.

Transform the dataframes

  • Transform df2 by stacking 'Visit 1' and 'Visit 2' into a single column, with the values in an adjacent column
  • Technically, the columns do not need to be renamed, as is done below. However, I think it makes it easier when creating the .merge()
import pandas as pd

# create df1
df1 = pd.DataFrame({'Index': ['0', '1', '2'], 'Site': ['101', '102', '102'], 'Visit': ['Visit 1', 'Visit 1', 'Visit 2']})

# drop the Index column
df1.drop(columns=['Index'], inplace=True)

# display(df1)
  Site    Visit
0  101  Visit 1
1  102  Visit 1
2  102  Visit 2

# create df2
df2 = pd.DataFrame({'Index': ['101', '102'], 'Visit 1': ['50', '100'], 'Visit 2': ['60', '120']})

# display(df2)
  Index Visit 1 Visit 2
0   101      50      60
1   102     100     120

# stack Visit 1 and Visit 2 into a single column
df2 = df2.set_index('Index').stack().reset_index()

# rename the columns
df2.columns = ['Site', 'Visit', 'Cost']

# display(df2)
  Site    Visit Cost
0  101  Visit 1   50
1  101  Visit 2   60
2  102  Visit 1  100
3  102  Visit 2  120

Merge the dataframes

cost = df1.merge(df2, on=['Site', 'Visit'])

# display(cost)
  Site    Visit Cost
0  101  Visit 1   50
1  102  Visit 1  100
2  102  Visit 2  120
Trenton McKinney
  • 56,955
  • 33
  • 144
  • 158
2

you can also use DataFrame.join with df2.stack's index to achieve the target.

df1 = df1.set_index('Index')
df2 = df2.set_index('Index')
df_reuslt = df1.join(df2.stack().rename('Cost'),
                     on=['Site', 'Visit'])

print(df_reuslt)

#            Site    Visit  Cost
#     Index                     
#     0       101  Visit 1    50
#     1       102  Visit 1   100
#     2       102  Visit 2   120
Ferris
  • 5,325
  • 1
  • 14
  • 23