3

I have two dataframes as follows:

df1

Name    Id   c1  c2  c3  c4
---------------------------
asd     101  a   b   c   d
cdf     231  e   ?   1  
zxs     342  f   o      
ygg     521  g   k   p  
mlk     432  h   m       z
abc     343  c   x   q  
xyz     254  1   d   2  
fgg     165  c   z   d   mm
mnd     766  2   d   v  

df2

df2_Name    df2_Id  df2_c2  df2_c4
----------------------------------
asd          101      h      d2
ygg          521      x      cd
fgg          165      o      cm

I want to match the "Name" and "id" from df1 with "df2_Name" and "df2_id" of df2. Wherever a match is found, the values of "c2" and "c4" in df1 are replaced by the values in "df2_c2" and "df2_c4" from df2.

Desired output

Name    Id    c1    c2  c3  c4
-------------------------------
asd     101    a    h   c   d2
cdf     231    e    ?   1   
zxs     342    f    o       
ygg     521    g    x   p   cd
mlk     432    h    m       z
abc     343    c    x   q   
xyz     254    1    d   2   
fgg     165    c    o   d   cm
mnd     766    2    d   v   

Tried solution 1

df1[df1.set_index(['Name', 'id']).index.isin(df2.set_index(['df2_Name','df2_id']).index)].iloc[:,[3,5]].update(df2.iloc[:,[2,3]]) 

Result: Original df1 is returned as it is.

Tried solution 2

df1.loc[df1.set_index(['Name', 'id']).index.isin(df2.set_index(['df2_Name','df2_id']).index), ['c2', 'c4']] = df2[['df2_c2', 'df2_c4']]

Result: NaNs introduced

Name    id   c1 c2  c3  c4
----------------------------
asd     101  a  NaN c   NaN
cdf     231  e  ?   1   
zxs     342  f  o       
ygg     521  g  NaN p   NaN
mlk     432  h  m       z
abc     343  c  x   q   
xyz     254  1  d   2   
fgg     165  c  NaN d   NaN
mnd     766  2  d   v   

Tried solution 3 (for c2 only)

merged = df1.merge(df2, left_on=["id", "Name"], right_on=["df2_id", "df2_Name"])

merged["c2"] = merged.apply(lambda x: x["c2"] if pd.isnull(x["df2_c2"]) else x["df2_c2"], axis=1)

Result:

Name    id    c1 c2 c3  c4  df2_Name    df2_id  df2_c2  df2_c4
--------------------------------------------------------------
asd     101   a   h c   d   asd         101      h       d2
ygg     521   g   x p       ygg         521      x       cd
fgg     165   c   o d   mm  fgg         165      o       cm

This solution 3 replaces the values of selected columns, however it returns the merged dataframe and not the entire df1 with updates.

Can anyone help me with this problem?

Note:

This question is being asked after trying the solutions in the following, however there is no success:

  1. update-a-pandas-dataframe-with-data-from-another-dataframe
  2. replace-column-values-based-on-another-dataframe-python-pandas-better-way
Hanif
  • 377
  • 4
  • 19
  • can you include your desired output to corroborate the correctness of our answers?, also provide sample data, not images – Yuca Aug 12 '19 at 13:25
  • added desired output now. – Hanif Aug 12 '19 at 13:32
  • 1
    Welcome to SO. Please review [ask] and create a [mcve]. If you had read the material recommended to you before you posted, you would notice that this explicitly states ***Do not include images of code***. That includes your sample Dataframes. – user3483203 Aug 12 '19 at 13:34
  • 1
    Possible duplicate of [pandas: merge (join) two data frames on multiple columns](https://stackoverflow.com/questions/41815079/pandas-merge-join-two-data-frames-on-multiple-columns) – Yuca Aug 12 '19 at 13:36
  • Yuca, this is not a simple merge operation. Its a partial replace operation where the tricky part is not to merge on columns but to replace some of the columns values with new values. – Hanif Aug 12 '19 at 13:56
  • yes, and to know how to update you merge first and then replace. Final step is to drop undesired columns – Yuca Aug 12 '19 at 14:07
  • Yuca, same was suggested in the answer below and i tried it but the resulting dataframe is a merged version which is a subset of df1 and not df1 with all the rows and columns. That's why i think merge first is not a better solution. – Hanif Aug 12 '19 at 14:15

2 Answers2

2

I would use merge to join the two dataframes. Then you got colums with your old values and a column with new values and nan values. Afterwards use apply to join these columns:

merged = df1.merge(df2, how='outer', left_on=["id", "name"], right_on=["df2_id", "df2_name"])
merged["c2"] = merged.apply(lambda x: x["c2"] if pd.isnull(x["df2_c2"]) else x["df2_c2"], axis=1)
# Same for c4
# Drop df2_c2 and df2_c4

I currently cant test it so let me know if this works for you.

Hanif
  • 377
  • 4
  • 19
  • tried your solution. It can replace the values of C2 in df1 using the df2_c2 values, however the resulting merge dataframe is not df1 but a resulting dataframe after merging df1 and df2. I would like to get back df1 after the replacement has been done. – Hanif Aug 12 '19 at 14:02
  • Im not sure if i get the problem. Does dropping the not needed row solve your issue? – ArsenieBoca Aug 12 '19 at 14:36
  • please compare my desired result (above) and your solution (solution3 above). I am interested in getting the whole df1 with updated as well as other rows and columns. Your solution results in producing a merge dataframe of df1 and df2 with the updated columns only. – Hanif Aug 12 '19 at 14:51
  • ArsenieBoca, I added the option of doing an outer merge to your suggested solution and then it works! – Hanif Aug 12 '19 at 15:01
  • 1
    Great, i updated the answer. Thats what happens when you dont test your solutions ^^ – ArsenieBoca Aug 12 '19 at 17:52
0
# Excel file name df1_df2.xlsx with 2 sheets name df1 & df2
# In df2 the column names are 'Name' 'Id' 'c_2' 'c_4'
# In df1 the column names are 'Name' 'Id' 'c1'  'c2' 'c3' 'c4'
import pandas as pd
import openpyxl
import xlsxwriter
url = "df1_df2.xlsx"
df = pd.ExcelFile(url) 
df1 = df.parse('df1')
df2 = df.parse('df2')
merged = pd.merge(df1,df2, how='outer', on=['Id'])
merged["c2"] = merged.apply(lambda x: x["c2"] if pd.isnull(x["c_2"])   
else x["c_2"], axis=1)
merged.reindex(['Name','Id','c1','c2','c3','c4'], axis=1)
  • Your answer could be improved with additional supporting information. Please [edit] to add further details, such as citations or documentation, so that others can confirm that your answer is correct. You can find more information on how to write good answers [in the help center](/help/how-to-answer). – Community May 24 '22 at 17:27