Mapping columns from one dataframe to another to create a new column

Question

i have a dataframe

id  store    address
1    100        xyz
2    200        qwe
3    300        asd
4    400        zxc
5    500        bnm

i have another dataframe df2

serialNo    store_code  warehouse
    1          300         Land
    2          500         Sea
    3          100         Land
    4          200         Sea
    5          400         Land

I want my final dataframe to look like:

id  store    address  warehouse
1    100        xyz     Land
2    200        qwe     Sea
3    300        asd     Land
4    400        zxc     Land
5    500        bnm     Sea

i.e map from one dataframe onto another creating new column

score 65 · Accepted Answer · edited Jun 20 '20 at 09:12

`df.merge`

out = (df1.merge(df2, left_on='store', right_on='store_code')
          .reindex(columns=['id', 'store', 'address', 'warehouse']))
print(out)

   id  store address warehouse
0   1    100     xyz      Land
1   2    200     qwe       Sea
2   3    300     asd      Land
3   4    400     zxc      Land
4   5    500     bnm       Sea

`pd.concat` + `df.sort_values`

u = df1.sort_values('store')
v = df2.sort_values('store_code')[['warehouse']].reset_index(drop=1)
out = pd.concat([u, v], 1)

print(out)

   id  store address warehouse
0   1    100     xyz      Land
1   2    200     qwe       Sea
2   3    300     asd      Land
3   4    400     zxc      Land
4   5    500     bnm       Sea

The first sort call is redundant assuming your dataframe is already sorted on store, in which case you may remove it.

`df.replace`/`df.map`

s = df1.store.replace(df2.set_index('store_code')['warehouse'])
print(s) 
0    Land
1     Sea
2    Land
3    Land
4     Sea

df1['warehouse'] = s
print(df1)

   id  store address warehouse
0   1    100     xyz      Land
1   2    200     qwe       Sea
2   3    300     asd      Land
3   4    400     zxc      Land
4   5    500     bnm       Sea

Alternatively, create a mapping explicitly. This works if you want to use it later.

mapping = dict(df2[['store_code', 'warehouse']].values)
df1['warehouse'] = df1.store.map(mapping)
print(df1)

   id  store address warehouse
0   1    100     xyz      Land
1   2    200     qwe       Sea
2   3    300     asd      Land
3   4    400     zxc      Land
4   5    500     bnm       Sea

how is map with large amounts of data, e.g. data frames 5 to 10 million? I wonder if that dict will work efficiently. — DISC-O, Apr 01 '20 at 19:59
@DISC-O it depends on the data, but pandas generally does not work great at such scales of data. Think more along the lines of distributed processing eg dask. — cs95, Apr 02 '20 at 02:58
@Pablo It depends on your data, best is to test it with `%timeit` statements — cs95, Sep 26 '20 at 00:46

jezrael · Answer 2 · 2017-09-05T08:17:22.907

29

Use map or join:

df1['warehouse'] = df1['store'].map(df2.set_index('store_code')['warehouse'])
print (df1)
   id  store address warehouse
0   1    100     xyz      Land
1   2    200     qwe       Sea
2   3    300     asd      Land
3   4    400     zxc      Land
4   5    500     bnm       Sea

df1 = df1.join(df2.set_index('store_code'), on=['store']).drop('serialNo', 1)
print (df1)
   id  store address warehouse
0   1    100     xyz      Land
1   2    200     qwe       Sea
2   3    300     asd      Land
3   4    400     zxc      Land
4   5    500     bnm       Sea

edited Sep 05 '17 at 08:17

answered Sep 05 '17 at 07:55

jezrael

822,522
95
1,334
1,252

i'm getting this error, when running .map code in a similar dataset. `Reindexing only valid with uniquely valued Index objects` – Shubham R Sep 05 '17 at 09:01
1

I think there is problem you have duplicates in `store_code` in `df2`. so need `df1['store'].map(df2.drop_duplicates('store_code').set_index('store_code')['warehouse'])` – jezrael Sep 05 '17 at 09:05
1

correct! Thanks :) – Shubham R Sep 05 '17 at 09:14

Mapping columns from one dataframe to another to create a new column

2 Answers2

`df.merge`

`pd.concat` + `df.sort_values`

`df.replace`/`df.map`

Linked

Related

Mapping columns from one dataframe to another to create a new column

2 Answers2

df.merge

pd.concat + df.sort_values

df.replace/df.map

Linked

Related

`df.merge`

`pd.concat` + `df.sort_values`

`df.replace`/`df.map`