14

i am trying to join two data frames but cannot get my head around the possibilities Python has to offer.

First dataframe:

ID MODEL   REQUESTS ORDERS
1  Golf    123      4
2  Passat  34       5
3  Model 3 500      8
4  M3      5        0

Second dataframe:

MODEL   TYPE  MAKE
Golf    Sedan Volkswagen
M3      Coupe BMW
Model 3 Sedan Tesla

What I want is to add another column in the first dataframe called "make" so that it looks like this:

ID MODEL   MAKE       REQUESTS ORDERS
1  Golf    Volkswagen 123      4
2  Passat  Volkswagen 34       5
3  Model 3 Tesla      500      8
4  M3      BMW        5        0

I already looked at merge, join and map but all examples just appended the required information at the end of the dataframe.

jeff carey
  • 2,313
  • 3
  • 13
  • 17
Christian
  • 241
  • 1
  • 3
  • 8

4 Answers4

17

I think you can use insert with map by Series created with df2 (if some value in column MODEL in df2 is missing get NaN):

df1.insert(2, 'MAKE', df1['MODEL'].map(df2.set_index('MODEL')['MAKE']))
print (df1)
   ID    MODEL        MAKE  REQUESTS  ORDERS
0   1     Golf  Volkswagen       123       4
1   2   Passat         NaN        34       5
2   3  Model 3       Tesla       500       8
3   4       M3         BMW         5       0
jezrael
  • 822,522
  • 95
  • 1,334
  • 1,252
  • Thank you @jezrael. Already thought about insert() and just saw your edit. – Christian Jan 06 '17 at 18:31
  • @jezrael: Thank you very much – Vineesh TP Sep 05 '19 at 20:38
  • 1
    @jezrael: A cool solution. And what should I do if I don't have unique values in df1, but they have be repeated? Because in that case I get an error: "Reindexing only valid with uniquely valued Index objects". – Tomasz Przemski Oct 21 '19 at 08:08
  • 1
    @TomaszPrzemski - Then counter with merge should be used, check [this](https://stackoverflow.com/questions/57406428/pandas-merge-df-many-to-many-without-duplicates/57406537#57406537) – jezrael Oct 21 '19 at 08:10
  • Does it matter if the 'MODEL' column name is different within the two dfs? Can I just use what it is called in df2 or do I have to rename it? – Yogesh Riyat Jul 03 '23 at 19:18
3

Although not in this case, but there might be scenarios where df2 has more than two columns and you would just want to add one out of those to df1 based on a specific column as key. Here is a generic code that you may find useful.

df = pd.merge(df1, df2[['MODEL', 'MAKE']], on = 'MODEL', how = 'left')
Bhagabat Behera
  • 853
  • 7
  • 7
  • Please format code as code. stackoverflow.com/help/formatting either use `, or make indent with 4 spaces, or mark your text and press ctrl+K. Also, ideally you should explain a bit what makes your answer good/ better than the others – tjebo Jun 24 '18 at 12:19
1

The join method acts very similarly to a VLOOKUP. It joins a column in the first dataframe with the index of the second dataframe so you must set MODEL as the index in the second dataframe and only grab the MAKE column.

df.join(df1.set_index('MODEL')['MAKE'], on='MODEL')

Take a look at the documentation for join as it actually uses the word VLOOKUP.

Ted Petrou
  • 59,042
  • 19
  • 131
  • 136
  • Thank you @Ted Petrou. I saw that the documentation actually has vlookup in it. I just missed to use the ().set_index method. – Christian Jan 06 '17 at 18:29
1

I always found merge to be an easy way to do this:

df1.merge(df2[['MODEL', 'MAKE']], how = 'left')

However, I must admit it would not be as short and nice if you wanted to call the new column something else than 'MAKE'.

Yona
  • 571
  • 7
  • 23