0

I have 2 CSV file like this , want to update the df1 columns (LL,UL) base on the df2(LL,UL) by matching columns (test ,cond) in the both dataframes
df1:

test Cond day  mode LL UL  
a    T1   Tue  7  
b    T2   mon  7  
c    T2   sun  6  
d    T3   fri  3  
c    T2   sat  6  
d    T3   wed  3  

df2:-

test Cond LL   UL  
a    T1   15   23  
b    T2   -3   -3.5  
c    T2   -19  -11  
d    T3   6.5  14.5  

my expected output should be:-

enter image description here

def SpecsLL(cond1,test1):
if ((cond1==spec['Cond'] ) & (test1==spec['test'])):
    return df2['LL']
df1['LL'] = df1.apply(lambda x: SpecsLL(x['Cond'],x['test']),axis=1)

i have tried above code but not working.
any ideas on how to do it??

Adept
  • 522
  • 3
  • 16
jagan k
  • 33
  • 7
  • Does this answer your question? [Pandas Merging 101](https://stackoverflow.com/questions/53645882/pandas-merging-101) – Anurag Dabas Jul 29 '21 at 09:35

2 Answers2

0

Method 1: combine_first

index_cols = ['test', 'Cond']
(
    df1
    .set_index(index_cols)
    .combine_first(
        df2.set_index(index_cols)
    ).reset_index()
)

Explanation:

  • set_index moves the specified columns to the index, indicating that each row should be identified by its test and Cond columns.
  • foo.combine_first(bar) will identify matching index + column labels between foo and bar, and fill in values from bar wherever foo is NaN or has a column/row missing. In this case, thanks to the set_index, the two dataframes will have their rows matched where test and Cond are the same, and then the UL and LL values from df2 will be filled in to the corresponding columns of the output.
  • reset_index simply reverses the set_index call, so that test and Cond become regular columns again.

Note that this operation might mangle the order of your columns, so if that is important to you then you can call .reindex(df1.columns, axis=1) at the very end, which will reorder the columns to original order in df1.

Method 2: merge

Alternatively you can use the merge method, which allows you to operate on the columns directly without using set_index, but will require some other preprocessing:

index_cols = ['test', 'Cond']
(
    df1
    .drop(['LL', 'UL'], axis=1)
    .merge(
       df2,
       on=index_cols
    )
)

The .drop call is necessary because otherwise merge will include the UL and LL columns from both DataFrames in the output:

  test Cond  day  mode  LL_x  UL_x  LL_y  UL_y
0    a   T1  Tue     7   NaN   NaN  15.0  23.0
1    b   T2  mon     7   NaN   NaN  -3.0  -3.5
2    c   T2  sun     6   NaN   NaN -19.0 -11.0
3    c   T2  sat     6   NaN   NaN -19.0 -11.0
4    d   T3  fri     3   NaN   NaN   6.5  14.5
5    d   T3  wed     3   NaN   NaN   6.5  14.5

Which to use?

With the data that you have provided, merge seems like the more natural operation - if you never expect UL and LL to have any data in df1, then if possible I'd recommend simply removing those column headers entirely from the input CSV, so that df1 doesn't have those columns at all. In that case, the drop call would no longer be necessary and the required merge call is very expressive.

However, if you expect that df1 would sometimes have real values for UL or LL, and you want to include those values in the output, then the combine_first solution is what you want. Note that if both df1 and df2 have different non-null values for a particular row/column, then the df1.combine_first(df2) will select the value from df1 and ignore the df2 value. If you instead wanted to prioritise the values from df2 then you want to call it the other way round, i.e. df2.combine_first(df1).

Dom
  • 180
  • 4
0

Simply use merge functionalities of pandas

df1.merge(df2)
Deven Ramani
  • 751
  • 4
  • 10