How to transform and combine two DataFrames

Question

I'm trying to assign dataframe columns to the row and column of the .loc function. I have a DataFrame df with no set index, with Sites and Visits as my column headers

Index	Site	Visit
0	101	Visit 1
1	102	Visit 1
2	102	Visit 2

I have another dataframe df2 with Sites as my dataframe index, and Visits as my columns with Cost as my values.

Index	Visit 1	Visit 2
101	50	60
102	100	120

I'm trying to use .loc to index the Cost from df2 and add it as a column to df1 like so

Index	Site	Visit	Cost
0	101	Visit 1	50
1	102	Visit 1	60
2	102	Visit 2	120

I tried using the following code to provide a row and column value

df['Cost'] = df2.loc[df['Site'],df['Visit']]

But I got the following error:

KeyError: "Passing list-likes to .loc or [] with any missing labels is no longer supported. 
The following labels were missing: etc. etc.

Any idea how to use two column values from one DataFrame as the .loc row and column values?

Trenton McKinney · Accepted Answer · 2021-02-03T19:52:54.530

This is a question about merging, but the dataframes should be transformed, in order to properly merge them.

Transform the dataframes

Transform df2 by stacking 'Visit 1' and 'Visit 2' into a single column, with the values in an adjacent column
Technically, the columns do not need to be renamed, as is done below. However, I think it makes it easier when creating the .merge()

import pandas as pd

# create df1
df1 = pd.DataFrame({'Index': ['0', '1', '2'], 'Site': ['101', '102', '102'], 'Visit': ['Visit 1', 'Visit 1', 'Visit 2']})

# drop the Index column
df1.drop(columns=['Index'], inplace=True)

# display(df1)
  Site    Visit
0  101  Visit 1
1  102  Visit 1
2  102  Visit 2

# create df2
df2 = pd.DataFrame({'Index': ['101', '102'], 'Visit 1': ['50', '100'], 'Visit 2': ['60', '120']})

# display(df2)
  Index Visit 1 Visit 2
0   101      50      60
1   102     100     120

# stack Visit 1 and Visit 2 into a single column
df2 = df2.set_index('Index').stack().reset_index()

# rename the columns
df2.columns = ['Site', 'Visit', 'Cost']

# display(df2)
  Site    Visit Cost
0  101  Visit 1   50
1  101  Visit 2   60
2  102  Visit 1  100
3  102  Visit 2  120

Merge the dataframes

See Pandas Merging 101

cost = df1.merge(df2, on=['Site', 'Visit'])

# display(cost)
  Site    Visit Cost
0  101  Visit 1   50
1  102  Visit 1  100
2  102  Visit 2  120

score 2 · Answer 2 · answered Feb 03 '21 at 03:48

you can also use DataFrame.join with df2.stack's index to achieve the target.

df1 = df1.set_index('Index')
df2 = df2.set_index('Index')
df_reuslt = df1.join(df2.stack().rename('Cost'),
                     on=['Site', 'Visit'])

print(df_reuslt)

#            Site    Visit  Cost
#     Index                     
#     0       101  Visit 1    50
#     1       102  Visit 1   100
#     2       102  Visit 2   120

How to transform and combine two DataFrames

2 Answers2

Transform the dataframes

Merge the dataframes