I have a dataframe that looks like this:
idx a b c d e f g h i j
1 0 17 17 83 17 0 21 16 21 4
2 -9 31 31 74 40 0 39 39 39 9
3 -27 0 -27 92 27 -37 3 -37 40 16
4 -4 0 -4 81 4 -1 5 5 6 9
I'd like to apply:
where x>0: functionA(x)
where x<0: functionB(x)
What I've tried independently:
df[df>0] = np.log(df)
and
df[df<0] = -np.log(-df)
Each of these kind of seems to work on its own. Running the two operations sequentially will not work, though: the dataframe is converted from int to float after the first operation, which makes the original values indistinguishable from the log values. For example, is a given 0 an original 0 or log(1) = 0?
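One possible way around the ordering problem, as a minimal sketch: compute both boolean masks from the untouched frame first, then write into a separate float copy. The small frame below only reuses the first three columns of the sample data, and the names `pos`, `neg` and `out` are illustrative assumptions.

```python
import numpy as np
import pandas as pd

# First three columns of the sample frame, rebuilt here for illustration.
df = pd.DataFrame({
    "a": [0, -9, -27, -4],
    "b": [17, 31, 0, 0],
    "c": [17, 31, -27, -4],
})

# Both masks come from the original integers, so the first assignment
# cannot change which cells the second one selects.
pos = df > 0
neg = df < 0

# Write into a float copy; zeros are left as 0.0.
out = df.astype("float64")
out[pos] = np.log(df[pos])      # df[pos] is NaN outside the mask, and log(NaN) is NaN
out[neg] = -np.log(-df[neg])    # same idea for the negative cells
```

The result still cannot tell an original 0 from log(1) = 0 by looking at the values alone, but the computation no longer depends on that distinction because `df` itself is never modified.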
I'm also concerned about these errors:
Divide by zero
/usr/local/anaconda3/envs/ds/lib/python3.6/site-packages/ipykernel_launcher.py:1: RuntimeWarning: divide by zero encountered in log
"""Entry point for launching an IPython kernel.
Invalid value
/usr/local/anaconda3/envs/ds/lib/python3.6/site-packages/ipykernel_launcher.py:1: RuntimeWarning: invalid value encountered in log
"""Entry point for launching an IPython kernel.
These shouldn't occur, because there are no NaN values and I'm explicitly selecting non-zero values:
df.isnull().values.any()
False
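A likely source of both warnings, sketched here on a plain NumPy array rather than the dataframe (an assumption made to keep the example small): the right-hand side `np.log(df)` is evaluated for every element before the mask on the left-hand side is applied, so zeros and negative values still reach `np.log` even though their results are never assigned. The helper `log_warnings` is hypothetical and only exists to capture the messages.

```python
import warnings

import numpy as np

values = np.array([0.0, -9.0, 17.0])  # illustrative values, not the real data


def log_warnings(arr):
    """Return the warning messages emitted while computing np.log(arr)."""
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always")
        np.log(arr)
    return [str(w.message) for w in caught]


zero_msgs = log_warnings(values[values == 0])      # log(0) gives -inf
negative_msgs = log_warnings(values[values < 0])   # log of a negative gives nan
positive_msgs = log_warnings(values[values > 0])   # selecting first avoids both
```

Under this reading the absence of NaN in the input does not matter: the warnings come from the zeros and negatives that are passed to `np.log` and then discarded by the mask.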
The final issue is how to do this efficiently, as I'm working with billions of rows.
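For scale, one direction that might be worth trying is an in-place, chunked pass over a float64 NumPy array, so that temporaries stay bounded by the chunk size. This is only a sketch: the function name `signed_log_inplace` and the `chunk_rows` default are hypothetical, and it assumes the data can be held (or memory-mapped) as a single float64 array.

```python
import numpy as np


def signed_log_inplace(values, chunk_rows=1_000_000):
    """Replace x with sign(x) * log(|x|) in place, leaving zeros as 0.

    `values` must be a float64 ndarray; rows are processed in blocks so
    the temporaries (sign array and mask) stay small.
    """
    for start in range(0, values.shape[0], chunk_rows):
        block = values[start:start + chunk_rows]  # a view, not a copy
        sign = np.sign(block)                     # -1, 0 or 1 per element
        np.abs(block, out=block)
        # `where` skips the zeros, so log(0) is never evaluated and those
        # cells keep their current value of 0.
        np.log(block, out=block, where=block != 0)
        block *= sign
    return values


# Usage on a tiny array with a deliberately small chunk size.
arr = np.array([[0.0, 17.0, -27.0], [-4.0, 1.0, 83.0]])
signed_log_inplace(arr, chunk_rows=1)
```

Because each value is transformed from its own original sign and magnitude in a single pass, there is no second masking step that could confuse an original 0 with log(1) = 0.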