Alternative way to merge two dataframes in python

Question

Let's take a simple example. I have this first dataframe:

df = pd.DataFrame(dict(Name=['abc','def','ghi'],NoMatter=['X','X','X']))
df
  Name NoMatter
0  abc        X
1  def        X
2  ghi        X

For some reason, I would like to use a for loop that adds a column Value to df from another dataframe, which changes at each iteration, and then does some further processing:

# structure of the for loop I would like to use:
for i in range(something):
    add the column Value to df from df_value
    other treatment not useful here

# appearance of df_value (which changes at each iteration of the for loop):
  Name  Value
0  abc      1
1  def      2
2  ghi      3

However, I would prefer not to use merging, because that would require deleting the Value column added in the previous iteration before adding the one for the current iteration. Is there a way to add the Value column to df with a plain assignment, starting like this:

df['Value'] = XXX

Expected output :

  Name NoMatter  Value
0  abc        X      1
1  def        X      2
2  ghi        X      3

[EDIT]

I don't want to use merging because, at the fourth iteration of the for loop, df would have the columns:

Name NoMatter Value1 Value2 Value3 Value4

Whereas I just want to have :

Name NoMatter Value4

I could delete the previous column each time, but that does not seem very efficient. This is why I'm looking for a way to assign values to the Value column rather than add a new column each time, like an equivalent of Excel's VLOOKUP function applied to df using the data in df_value.

@Serge Ballesta, I just want to add the Value column to df, without any other treatment (I mentioned the treatments in the post only to justify the use of the for loop). Thanks for your help – Ewdlam, Jul 16 '20 at 09:17
Does this answer your question? [How to join (merge) data frames (inner, outer, left, right)](https://stackoverflow.com/questions/1299871/how-to-join-merge-data-frames-inner-outer-left-right) – David Erickson, Jul 16 '20 at 09:18
If you only want to merge specific columns, then do `df2 = pd.merge(df[['Name','Value']], df1, how='left', on='Name')` – David Erickson, Jul 16 '20 at 09:20
@David Erickson thanks, but no: if I do that, in the second iteration I will have the Value column of the first iteration AND of the second iteration. I would like to replace the Value column. I could merge and then delete the previous Value column, but that is not really efficient – Ewdlam, Jul 16 '20 at 09:20
Not sure what you mean by iteration. A simple merge would get you from input to expected output without having to delete columns. Your question is not clear. – David Erickson, Jul 16 '20 at 09:22

score 0 · Answer 1 · answered Jul 16 '20 at 09:13

3 ways to join dataframes

df1.append(df2) # Adds the rows of df2 to the end of df1 (columns should be identical). Removed in pandas 2.0; use pd.concat([df1, df2]) there

pd.concat([df1, df2],axis=1) # Adds the columns of df2 to the right of df1 (row indexes should be identical)

df1.join(df2,on=col1,how='inner') # SQL-style join: matches the values in column col1 of df1 against the index of df2. how can be one of 'left', 'right', 'outer', 'inner'
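
A minimal sketch of the three operations on toy frames. It assumes a recent pandas, where `pd.concat` stands in for row-wise appending; the frame names (`left`, `more_rows`, `values`) and their contents are made up for illustration.

```python
import pandas as pd

left = pd.DataFrame({"Name": ["abc", "def"], "NoMatter": ["X", "X"]})
more_rows = pd.DataFrame({"Name": ["ghi"], "NoMatter": ["X"]})
values = pd.DataFrame({"Name": ["abc", "def"], "Value": [1, 2]})

# Row-wise: the rows of more_rows are placed after the rows of left
stacked = pd.concat([left, more_rows], ignore_index=True)

# Column-wise: the Value column is placed to the right of left, aligned on the index
side_by_side = pd.concat([left, values[["Value"]]], axis=1)

# Key-based: join matches left's Name column against the index of the other frame
joined = left.join(values.set_index("Name"), on="Name", how="inner")
```

Note that none of these replaces an existing Value column by itself, which is the point raised in the comment below.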

Samuel Lawrence


Thanks for your help but in all of these cases, I will need to delete the previous value column in each iteration – Ewdlam Jul 16 '20 at 09:14

score 0 · Answer 2 · answered Jul 16 '20 at 09:16

Here's the solution for your problem.

import pandas as pd
df = pd.DataFrame(dict(Name=['abc','def','ghi'],NoMatter=['X','X','X']))
df1 = pd.DataFrame(dict(Name=['abc','def','ghi'],Value=[1,2,3]))
new_df=pd.merge(df, df1, on='Name')
new_df

Umer Rana


Thanks for your help but using merging I will need to delete the previous value column in each iteration – Ewdlam Jul 16 '20 at 09:18
I don't understand what you mean by deleting the previous Value column in each iteration. Could you show the problem you face when applying this? – Umer Rana Jul 16 '20 at 09:21
`del df['Value']` deletes the column at the end of the loop, before the next iteration – Umer Rana Jul 16 '20 at 09:34
That is the approach I suggested in my post, but I'm looking for a more efficient way. Merge then delete isn't really efficient – Ewdlam Jul 16 '20 at 09:36
Then there's a way to implement it with an if statement: if the column is already there, replace it with the new one – Umer Rana Jul 16 '20 at 09:37
Such an if statement could indeed be a solution – Ewdlam Jul 16 '20 at 09:50
@Ewdlam Yes...! – Umer Rana Jul 16 '20 at 10:54
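
The idea discussed in these comments (replace the column if it is already there) can be sketched as follows. This is only an illustration: `df_values_per_iteration` is a made-up list standing in for whatever produces `df_value` at each iteration, and `errors='ignore'` makes the drop a no-op on the first pass, so no explicit `if` is needed.

```python
import pandas as pd

df = pd.DataFrame(dict(Name=['abc', 'def', 'ghi'], NoMatter=['X', 'X', 'X']))

# Hypothetical stand-in for the df_value produced at each iteration
df_values_per_iteration = [
    pd.DataFrame(dict(Name=['abc', 'def', 'ghi'], Value=[1, 2, 3])),
    pd.DataFrame(dict(Name=['abc', 'def', 'ghi'], Value=[10, 20, 30])),
]

for df_value in df_values_per_iteration:
    # Drop any Value column left over from the previous iteration, then merge
    df = df.drop(columns='Value', errors='ignore').merge(df_value, on='Name', how='left')
    # ... other treatment on df would go here ...
```

After the loop, df holds a single Value column coming from the last `df_value`.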

Serge Ballesta · Answer 3 · 2020-07-16T09:48:37.133

The correct way is @UmerRana's answer, because iterating over a dataframe performs terribly. If you really have to do it, it is possible to address an individual cell, but don't claim that I advised you to do so:

df = pd.DataFrame(dict(Name=['abc','def','ghi'],NoMatter=['X','X','X']))
df1 = pd.DataFrame(dict(Name=['abc','def','ghi'],Value=[1,2,3]))
df['Value'] = 0    # initialize a new column of integers (hence the 0)
ix = df.columns.get_loc('Value')
for i in range(len(df)):                    # perf is terrible!
    df.iloc[i, ix] = df1['Value'][i]

After seeing your example code, and if you cannot avoid the loop, I think that this would be the least bad way:

import numpy as np

newcol = np.zeros(something, dtype='int')  # preallocate; set the correct dtype
for i in range(something):
    # compute a value
    newcol[i] = value_for_i_iteration
df['Value'] = newcol                       # assign the array to the new column
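
A runnable version of the pattern above, with placeholder arithmetic standing in for the per-iteration computation. The `i * 10` is invented purely so the sketch executes, and the array length is assumed to be the number of rows in df.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(dict(Name=['abc', 'def', 'ghi'], NoMatter=['X', 'X', 'X']))

n = len(df)                         # assumption: one value per row of df
newcol = np.zeros(n, dtype='int')   # preallocate with the intended dtype
for i in range(n):
    newcol[i] = i * 10              # placeholder for the real computation
df['Value'] = newcol                # one assignment; overwrites Value if it already exists
```

The dataframe is touched only once, after the loop, so the per-cell cost of `iloc` assignment is avoided.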

score 0 · Accepted Answer · answered Jul 16 '20 at 11:19

Maybe not the best way, but this solution works and replaces the Value column at each iteration (no need to delete the Value column before each new iteration):

# similar to Excel vlookup function
def vlookup(df,ref,col_ref,col_goal):
    return pd.DataFrame(df[df.apply(lambda x: ref == x[col_ref],axis=1)][col_goal]).iloc[0,0]

df['Value'] = df['Name'].apply(lambda x : vlookup(df_value,x,'Name','Value'))

#Output : 

  Name NoMatter  Value
0  abc        X      1
1  def        X      2
2  ghi        X      3
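
For comparison, the same lookup can be sketched without a row-wise `apply`, using `Series.map` against a Series indexed by Name. This assumes Name is unique in `df_value`; names missing from `df_value` would become NaN rather than raising. The row order and the second `df_value` below are invented to show that the lookup is by key and that re-assigning replaces the column.

```python
import pandas as pd

df = pd.DataFrame(dict(Name=['abc', 'def', 'ghi'], NoMatter=['X', 'X', 'X']))

# First iteration: df_value rows are deliberately in a different order than df
df_value = pd.DataFrame(dict(Name=['ghi', 'abc', 'def'], Value=[3, 1, 2]))
# Build a Name -> Value lookup, then map each Name in df through it
df['Value'] = df['Name'].map(df_value.set_index('Name')['Value'])
first_pass = df['Value'].tolist()

# Second iteration: the same assignment overwrites the existing Value column
df_value = pd.DataFrame(dict(Name=['abc', 'def', 'ghi'], Value=[10, 20, 30]))
df['Value'] = df['Name'].map(df_value.set_index('Name')['Value'])
```

Because `df['Value'] = ...` overwrites a column of the same name, no delete step is needed between iterations.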
