1

I have 4 csv files with different correlations (and different number of rows) and I am trying to create a new one with all correlations:

DF1: |Var1|Var2|Corr1|

DF2: |Var1|Var2|Corr2|

Objective:

|Var1|Var2|Corr1|Corr2|

(EDIT:

DF1:                                DF2:
Var1  Var2  Corr                    Var1   Var2  Corr2 
a     b      2                      a      b     3
a     s      1                      a      z     4
c     d      0                      c      d     2   

Spected Out:

DF1:                                
Var1  Var2  Corr  Corr2                  
a     b      2    3                   
a     s      1    NaN                   
c     d      0    2  
a     z      NaN  4                      

Finish edit)

The condition is if Var1 and Var2 are the same, add Corr2 value in new column (It would be also good to add new rows if they are new correlations in DF2)

I have this but is not working:

corr=[]
for row in DF1['Var1']:
        if DF1['Var2'] == DF2['Var2'] :    corr.append(DF2["Corr2"])
        
        ##else: add DF2["Var1]["Var2"]["Corr2"]

DF1["Corr2"]=corr
print(DF1)

Any idea to create this condition? Thank you

SERGIO
  • 92
  • 6
  • 1
    This would be easier to answer with an [mre] with an example input and output. I think `pd.merge(df1, df2, on=['Var1', 'Var2'])` might be along the lines of what you're looking for though. – Kraigolas Feb 03 '22 at 14:42
  • it sounds like this : https://stackoverflow.com/questions/33086881/merge-two-python-pandas-data-frames-of-different-length-but-keep-all-rows-in-out – Anynamer Feb 03 '22 at 14:42
  • Thank you I edited the question with a reproducible example Kraigolas and than you and Anynamer, that was the clue I was looking for – SERGIO Feb 03 '22 at 15:27

1 Answers1

1

Thank you for your ideas, merge works! Your clue make it more easier.

I must edit your code to get what I asked:

DF1:                                DF2:
Var1  Var2  Corr                    Var1   Var2  Corr2 
a     b      2                      a      b     3
a     s      1                      a      z     4
c     d      0                      c      d     2   

This only merge if ["Var1"] and ["Var2"] are the same and the result is only a Df with the ones in common:

pd.merge(df1, df2, on=['Var1', 'Var2']) 

Out:

                        
Var1  Var2  Corr  Corr2                   
a     b      2     3                                    
c     d      0     2                   

This merge the ones in common and keep DF1 rows and adds DF2 rows not in common:

result = DF1.merge(DF2, on=['Var1', 'Var2'], how= "outer") 

Out:

DF1:                                
Var1  Var2  Corr  Corr2                  
a     b      2    3                   
a     s      1    NaN                   
c     d      0    2  
a     z      NaN  4                      

SERGIO
  • 92
  • 6