1

I'm quite new to python and got the following problem.

I have two dataframes, where the first one looks like this:

df1
code  product
10-00  apple
10-10  banana
10-20  grape
10-00  cucumber
20-00  tomato
20-10  onion
20-10  garlic

and the second one looks like:

df2
code  colour
10-00  green
10-10  yellow
10-20  purple
20-00  red
20-10  white

I would like to have a loop that would give the following dataframe

df
10-00  apple  green
10-10  banana  yellow
10-20  grape  purple
10-00  cucumber  green
20-00  tomato  red
20-10  onion  white
20-10  garlic  white

But I really can't figure it out where to start.. Anybody who has experience with such a problem?

jpp
  • 159,742
  • 34
  • 281
  • 339
avibrun
  • 39
  • 1
  • 8

3 Answers3

1

Try this (pd.DataFrame.merge)

df = pd.merge(df1,df2,on=['code'],how='left')

Example:

import pandas as pd

df1 = pd.DataFrame({
    'code': ['10-00','10-10'],
    'product': ['apple','banana']
})

df2 = pd.DataFrame({
    'code': ['10-00','10-10'],
    'colour': ['green','yellow']
})

df = pd.merge(df1,df2,on=['code'],how='left')

print(df)

Returns:

    code product  colour
0  10-00   apple   green
1  10-10  banana  yellow
Anton vBR
  • 18,287
  • 5
  • 40
  • 46
Mohamed Thasin ah
  • 10,754
  • 11
  • 52
  • 111
0

Don't use a loop, index your df2 data-frames by the code column, the use simple assignment!

>>> df1.set_index('code', inplace=True)
>>> df2.set_index('code',inplace=True)
>>> df1
        product
code
10-00     apple
10-10    banana
10-20     grape
10-00  cucumber
20-00    tomato
20-10     onion
20-10    garlic
>>> df2
       colour
code
10-00   green
10-10  yellow
10-20  purple
20-00     red
20-10   white

Then simply:

>>> df1['colour'] = df2['colour']
>>> df1
        product  colour
code
10-00     apple   green
10-10    banana  yellow
10-20     grape  purple
10-00  cucumber   green
20-00    tomato     red
20-10     onion   white
20-10    garlic   white

If you'd rather not index df1 with code (it would give you a duplicate index) you can always use:

>>> df1['colour'] = df2.loc[df1['code']].values
>>> df1
    code   product  colour
0  10-00     apple   green
1  10-10    banana  yellow
2  10-20     grape  purple
3  10-00  cucumber   green
4  20-00    tomato     red
5  20-10     onion   white
6  20-10    garlic   white

As long as df2 is indexed by 'code'

juanpa.arrivillaga
  • 88,713
  • 10
  • 131
  • 172
0

This is possible with set_index and join:

df1.set_index('code').join(df2.set_index('code')).reset_index()

Result

    code   product  colour
0  10-00     apple   green
1  10-00  cucumber   green
2  10-10    banana  yellow
3  10-20     grape  purple
4  20-00    tomato     red
5  20-10     onion   white
6  20-10    garlic   white

Explanation

  • set_index('code') is applied to both df1 and df2 so that we can use it later for joining.
  • join is applied as a "left join" on index by default.
  • reset_index is applied to result in order to retrieve a dataframe with the required columns.
jpp
  • 159,742
  • 34
  • 281
  • 339