
Let's take a simple example. I have this first dataframe:

df = pd.DataFrame(dict(Name=['abc','def','ghi'],NoMatter=['X','X','X']))
df
  Name NoMatter
0  abc        X
1  def        X
2  ghi        X

For various reasons, I would like to use a for loop that adds a Value column to df and does some other processing. The values come from another dataframe, df_value, which changes at each iteration:

# structure of the for loop I would like to use:
for i in range(something):
    add the column Value to df from df_value
    other processing, not relevant here

# appearance of df_value (which changes at each iteration of the for loop):
  Name  Value
0  abc      1
1  def      2
2  ghi      3

However, I would prefer not to use a merge, because that would require deleting the Value column added in the previous iteration before adding the one for the current iteration. Is there a way to add the Value column to df with a plain assignment that starts like this:

df['Value'] = XXX

Expected output :

  Name NoMatter  Value
0  abc        X      1
1  def        X      2
2  ghi        X      3

[EDIT]

I don't want to use a merge because, at the fourth iteration of the for loop, df would have the columns:

Name NoMatter Value1 Value2 Value3 Value4

Whereas I just want to have :

Name NoMatter Value4

I could delete the previous column each time, but that does not seem very efficient. This is why I'm looking for a way to assign values to the Value column rather than add a new column each time, like an equivalent of Excel's VLOOKUP function applied to df using the data in df_value.

Ewdlam
  • what is your expected output? It is not clear. – David Erickson Jul 16 '20 at 09:11
  • @Serge Ballesta, I just want to add the Value column to df, without any other treatment (I spoke about some treatments in the post to justify the use of the for loop). Thanks for your help – Ewdlam Jul 16 '20 at 09:17
  • Does this answer your question? [How to join (merge) data frames (inner, outer, left, right)](https://stackoverflow.com/questions/1299871/how-to-join-merge-data-frames-inner-outer-left-right) – David Erickson Jul 16 '20 at 09:18
  • if you only want to merge specific columns, then do `df2 = pd.merge(df[['Name','Value']], df1, how='left', on='Name')` – David Erickson Jul 16 '20 at 09:20
  • @David Erickson thanks, but no: if I do that, in the second iteration I will have the Value column of the first iteration AND of the second iteration, whereas I would like to replace the Value column. I could merge and then delete the previous Value column, but that is not really efficient – Ewdlam Jul 16 '20 at 09:20
  • not sure what you mean by iteration. A simple merge would get you from input to expected output without having to delete columns. Your question is not clear. – David Erickson Jul 16 '20 at 09:22
  • Thanks for your feedback, I will edit the post – Ewdlam Jul 16 '20 at 09:24
  • You really should show a [mcve] with the loop you use... – Serge Ballesta Jul 16 '20 at 09:38
  • @Serge Ballesta it's done – Ewdlam Jul 16 '20 at 09:42

4 Answers


3 ways to join dataframes

df1.append(df2) # Adds the rows of df2 to the end of df1 (columns should be identical). Removed in pandas 2.0; use pd.concat([df1, df2]) instead

pd.concat([df1, df2],axis=1) # Adds the columns of df2 next to those of df1 (row indexes should be identical)

df1.join(df2,on=col1,how='inner') # SQL-style join: matches the values in column col1 of df1 against the index of df2. how can be one of 'left', 'right', 'outer', 'inner'
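
As a rough illustration of the three calls, here is a minimal sketch on toy frames. The names `extra_rows`, `stacked`, `side_by_side` and `joined` are made up for the example, and `pd.concat` stands in for `append` on the assumption that a recent pandas version is in use.

```python
import pandas as pd

df1 = pd.DataFrame({"Name": ["abc", "def", "ghi"], "NoMatter": ["X", "X", "X"]})
df2 = pd.DataFrame({"Name": ["abc", "def", "ghi"], "Value": [1, 2, 3]})
extra_rows = pd.DataFrame({"Name": ["jkl"], "NoMatter": ["Y"]})  # hypothetical extra rows

# Row-wise: stack extra_rows under df1 (both frames have the same columns).
stacked = pd.concat([df1, extra_rows], ignore_index=True)

# Column-wise: put df2's Value column next to df1, aligned on the row index.
side_by_side = pd.concat([df1, df2[["Value"]]], axis=1)

# SQL-style: match df1's Name column against df2's index.
joined = df1.join(df2.set_index("Name"), on="Name", how="left")
```

Note that each call returns a new dataframe, so inside a loop the result still has to be assigned back.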

Samuel Lawrence
  • Thanks for your help but in all of these cases, I will need to delete the previous value column in each iteration – Ewdlam Jul 16 '20 at 09:14

Here's the solution for your problem.

import pandas as pd
df = pd.DataFrame(dict(Name=['abc','def','ghi'],NoMatter=['X','X','X']))
df1 = pd.DataFrame(dict(Name=['abc','def','ghi'],Value=[1,2,3]))
new_df=pd.merge(df, df1, on='Name')
new_df
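
If the merge has to run inside the loop from the question, one possible sketch is to merge from the fixed base columns each time, so an earlier Value column is never carried over. The loop bounds and the contents of `df_value` below are invented for the example.

```python
import pandas as pd

df = pd.DataFrame({"Name": ["abc", "def", "ghi"], "NoMatter": ["X", "X", "X"]})

for i in range(1, 4):
    # Hypothetical df_value that changes at each iteration.
    df_value = pd.DataFrame({"Name": ["abc", "def", "ghi"],
                             "Value": [i, 2 * i, 3 * i]})
    # Merge from the base columns only, which drops any previous Value column.
    df = df[["Name", "NoMatter"]].merge(df_value, on="Name", how="left")

print(df)
```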
Umer Rana

The correct way is @UmerRana's answer, because iterating over a dataframe performs terribly. If you really have to do it, it is possible to address an individual cell, but please do not take this as a recommendation:

df = pd.DataFrame(dict(Name=['abc','def','ghi'],NoMatter=['X','X','X']))
df1 = pd.DataFrame(dict(Name=['abc','def','ghi'],Value=[1,2,3]))
df['Value'] = 0    # initialize a new column of integers (hence the 0)
ix = df.columns.get_loc('Value')
for i in range(len(df)):                    # perf is terrible!
    df.iloc[i, ix] = df1['Value'][i]

After seeing your example code: if you cannot avoid the loop, I think this would be the least bad way:

import numpy as np
newcol = np.zeros(something, dtype='int')  # set the correct type
for i in range(something):
    #compute a value
    newcol[i] = value_for_i_iteration
df['Value'] = newcol                       # assign the array to the new column
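
For reference, a runnable variant of the same idea might look like this. It assumes one value per row of df, and the computation `(i + 1) * 10` is only a placeholder for whatever the loop really computes.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"Name": ["abc", "def", "ghi"], "NoMatter": ["X", "X", "X"]})

# Preallocate one slot per row, with the dtype the column should end up with.
newcol = np.zeros(len(df), dtype="int64")
for i in range(len(df)):
    newcol[i] = (i + 1) * 10   # placeholder computation

# A single assignment creates the column, or overwrites it if it already exists.
df["Value"] = newcol
```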
Serge Ballesta

Maybe not the best way, but this solution works and replaces the Value column at each iteration (no need to delete the Value column before each new iteration):

# similar to Excel vlookup function
def vlookup(df,ref,col_ref,col_goal):
    return pd.DataFrame(df[df.apply(lambda x: ref == x[col_ref],axis=1)][col_goal]).iloc[0,0]

df['Value'] = df['Name'].apply(lambda x : vlookup(df_value,x,'Name','Value'))

# Output:

  Name NoMatter  Value
0  abc        X      1
1  def        X      2
2  ghi        X      3
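
A possibly simpler variant of the same lookup, sketched here under the assumption that each Name appears only once in df_value, is to turn df_value into a Series indexed by Name and use `Series.map`. The loop and the contents of `df_value` are made up for the example.

```python
import pandas as pd

df = pd.DataFrame({"Name": ["abc", "def", "ghi"], "NoMatter": ["X", "X", "X"]})

for i in range(1, 5):
    # Hypothetical df_value that changes at each iteration (row order need not match df).
    df_value = pd.DataFrame({"Name": ["ghi", "abc", "def"],
                             "Value": [3 * i, 1 * i, 2 * i]})
    # Series indexed by Name, used as a lookup table (assumes unique names).
    lookup = df_value.set_index("Name")["Value"]
    # Plain assignment: overwrites Value in place at each iteration.
    df["Value"] = df["Name"].map(lookup)

print(df)
```

Names that are missing from df_value would come out as NaN in the Value column.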

Ewdlam