3

Let df1, df2, and df3 are pandas.DataFrame's having the same structure but different numerical values. I want to perform:

res=if df1>1.0: (df2-df3)/(df1-1) else df3

res should have the same structure as df1, df2, and df3 have.

numpy.where() generates result as a flat array.

Edit 1:

res should have the same indices as df1, df2, and df3 have.

For example, I can access df2 as df2["instanceA"]["parameter1"]["paramter2"]. I want to access the new calculated DataFrame/Series res as res["instanceA"]["parameter1"]["paramter2"].

Kadir
  • 1,345
  • 3
  • 15
  • 25

3 Answers3

2

Actually numpy.where should work fine there. Output here is 4x2 (same as df1, df2, df3).

df1 = pd.DataFrame( np.random.randn(4,2), columns=list('xy') )
df2 = pd.DataFrame( np.random.randn(4,2), columns=list('xy') )
df3 = pd.DataFrame( np.random.randn(4,2), columns=list('xy') )

res = df3.copy()
res[:] = np.where( df1 > 1, (df2-df3)/(df1-1), df3 )

          x         y
0 -0.671787 -0.445276
1 -0.609351 -0.881987
2  0.324390  1.222632
3 -0.138606  0.955993

Note that this should work on both series and dataframes. The [:] is slicing syntax that preserves the index and columns. Without that res will come out as an array rather than series or dataframe.

Alternatively, for a series you could write as @Kadir does in his answer:

res = pd.Series(np.where( df1>1, (df2-df3)/(df1-1), df3 ), index=df1.index)

Or similarly for a dataframe you could write:

res = pd.DataFrame(np.where( df1>1, (df2-df3)/(df1-1), df3 ), index=df1.index,
                                                              columns=df1.columns)
JohnE
  • 29,156
  • 8
  • 79
  • 109
1

Integrating the idea in this question into JohnE's answer, I have come up with this solution:

res = pd.Series(np.where( df1 > 1, (df2-df3)/(df1-1), df3 ), index=df1.index)

A better answer using DataFrames will be appreciated.

Community
  • 1
  • 1
Kadir
  • 1,345
  • 3
  • 15
  • 25
0

Say df is your initial dataframe and res is the new column. Use a combination of setting values and boolean indexing.

Set res to be a copy of df3:

 df['res'] = df['df3']

Then adjust values for your condition.

df[df['df1']>1.0]['res'] = (df['df2'] - df['df3'])/(df['df1']-1)
AZhao
  • 13,617
  • 7
  • 31
  • 54