logic element-wise operations in pandas time-series dataframe

Question

I have a pandas DataFrame with a timestamp datetime index and values that correspond to each date. E.g., df = pd.DataFrame(['0.11', '0.07', '0.04', '-0.11', '-0.04', '0.08', '0.1'], index=['2017-01-01', '2017-01-02', '2017-01-03', '2017-01-04', '2017-01-05', '2017-01-06', '2017-01-07'], columns=['values']).

I would like to create an additional column (let's call it 'new_value') based on the present and historical values of the above dataframe.

The logic should be:

if a value is greater than or equal to 0.1, 'new_value' should be set to -1,
once 'new_value' is set to -1, it should remain -1 until a value less than or equal to 0.05 is registered,
if a value is less than or equal to -0.1, 'new_value' should be set to +1,
once 'new_value' is set to +1, it should remain +1 until a value greater than or equal to -0.05 is registered,
otherwise 'new_value' is equal to 0

I have tried multiple solutions, but can't seem to solve this problem. E.g.,

new_frame = pd.DataFrame(np.zeros(len(df.index),index=df.index,columns=['new_value'])
for date in df.index:
    if df['value'][date.strftime('%Y-%m-%d')] > 0.1:
        new_frame.set_value(date.strftime("%Y-%m-%d"),'new_value',-1)

But I receive the error: 'ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().'

If I then change the third line to:

if df['value'][date.strftime('%Y-%m-%d').item() > 0.1:

I receive the error: 'ValueError: can only convert an array of size 1 to a Python scalar'

piRSquared · Answer 1 · 2017-04-14T06:57:15.020

`numpy.searchsorted`

s = df['values'].astype(float)
al = np.array([-.1, -.05])
bl = np.array([1, np.nan, 0])
ar = np.array([.05, .1])
br = np.array([0, np.nan, -1])
l = bl[al.searchsorted(s.values)]
r = br[ar.searchsorted(s.values, side='right')]

df.assign(new_values=pd.Series(l + r, s.index).ffill())

           values  new_values
2017-01-01   0.11        -1.0
2017-01-02   0.07        -1.0
2017-01-03   0.04         0.0
2017-01-04  -0.11         1.0
2017-01-05  -0.04         0.0
2017-01-06   0.08         0.0
2017-01-07    0.1        -1.0

how it works

need to get just a series of floats, name it s
set up left side breakpoints al
set up left side mapped values bl
set up right side breakpoints ar
set up right side mapped values br
searchsorted will find the index that values should be placed before
use the indices from searchsorted to identify mapped values
when looking for the right side values, I use side='right'
map values again.
add left and right results. nan + value will be nan
ffill to propagate values forward.

setup
assuming dataframe df given by OP

df = pd.DataFrame(
    ['0.11', '0.07', '0.04', '-0.11',
     '-0.04', '0.08', '0.1'],
    ['2017-01-01', '2017-01-02',
     '2017-01-03', '2017-01-04',
     '2017-01-05', '2017-01-06',
     '2017-01-07'],
    ['values']
)

           values
2017-01-01   0.11
2017-01-02   0.07
2017-01-03   0.04
2017-01-04  -0.11
2017-01-05  -0.04
2017-01-06   0.08
2017-01-07    0.1

This is a great answer, thank you. In terms of building my understanding, is anyone able to explain why my original approach would not work? — Panda_User, Apr 14 '17 at 21:28
@Panda_User see [***this answer***](http://stackoverflow.com/a/43222675/2336654) Feel free to up vote it if its helpful :-) — piRSquared, Apr 14 '17 at 21:30
@Panda_User I see that you got stuck at the initial stages... Yeah, what I gave you is a vectorized solution. Your were going down the road of using loops. Now you don't have to. A number of people fall into the trap of evaluating series as booleans when executing a loop. — piRSquared, Apr 14 '17 at 21:37

logic element-wise operations in pandas time-series dataframe

1 Answers1

numpy.searchsorted

`numpy.searchsorted`