2

I have a dataframe df1:

df1 = pd.DataFrame({'Name': ['A', 'B', 'C', 'A', 'D'], 
                    'Qty': [1, 3, 5, 6, 1]})

that looks like this

       Name  Qty
    0    A    1
    1    B    3
    2    C    5
    3    A    6
    4    D    1

I need to create a new column 'Rate' in df1 that contains values from df2 such that the match is made both by Qty and Name.

df2 = pd.DataFrame({'Qty': [1, 2, 3, 4, 5, 6],
                    'A': [1.0, 1.2, 1.4, 1.6, 1.8, 2.0],
                    'B': [1.1, 1.2, 1.3, 1.4, 1.5, 1.6],
                    'C': [0.5, 1.0, 1.5, 2.0, 2.2, 3.0],
                    'D': [1.4, 1.5, 1.6, 1.7, 1.8, 1.9]})

that looks like

        Qty  A    B    C    D
    0    1  1.0  1.1  0.5  1.4
    1    2  1.2  1.2  1.0  1.5
    2    3  1.4  1.3  1.5  1.6
    3    4  1.6  1.4  2.0  1.7
    4    5  1.8  1.5  2.2  1.8
    5    6  2.0  1.6  3.0  1.9

Resulting df3 should look like this

df3    
      Name  Qty  Rate
    0    A    1   1.0
    1    B    3   1.3
    2    C    5   2.2
    3    A    6   2.0
    4    D    1   1.4

Baically search df2 with 'coordinates' given in df1 and return the value.

2 Answers2

3

You can use melt() followed by merge() to do this.

Code Example:

df2_reformated = df2.melt(id_vars=['Qty'], var_name='Name', value_name='Rate')
df3 = df1.merge(df2_reformated)

Output:

  Name  Qty  Rate
0    A    1   1.0
1    B    3   1.3
2    C    5   2.2
3    A    6   2.0
4    D    1   1.4

This works by converting the df2 variable from 'wide' to 'long format. For example, the first few rows of the dataframe, after melting, would look like this:

    Qty Name  Rate
0     1    A   1.0
1     2    A   1.2
2     3    A   1.4

Once you have it in this format, you can use merge() to combine it with df1. merge() works by looking at the columns in common between two dataframes, and combining rows which match in those columns.

Nick ODell
  • 15,465
  • 3
  • 32
  • 66
3

Another way:

(i) create a MultiIndex from df1: idx

(ii) set_index of df2 to Qty and unstack. This creates a MultiIndex Series. Using idx filter the relevant rows:

idx = df1.set_index(['Name','Qty']).index
out = df2.set_index('Qty').unstack().loc[idx].reset_index().rename(columns={0:'Rate'})

Output:

  Name  Qty  Rate
0    A    1   1.0
1    B    3   1.3
2    C    5   2.2
3    A    6   2.0
4    D    1   1.4