1

I have a pandas DataFrame with a timestamp datetime index and values that correspond to each date. E.g., df = pd.DataFrame(['0.11', '0.07', '0.04', '-0.11', '-0.04', '0.08', '0.1'], index=['2017-01-01', '2017-01-02', '2017-01-03', '2017-01-04', '2017-01-05', '2017-01-06', '2017-01-07'], columns=['values']).

I would like to create an additional column (let's call it 'new_value') based on the present and historical values of the above dataframe.

The logic should be:

  1. if a value is greater than or equal to 0.1, 'new_value' should be set to -1,
  2. once 'new_value' is set to -1, it should remain -1 until a value less than or equal to 0.05 is registered,
  3. if a value is less than or equal to -0.1, 'new_value' should be set to +1,
  4. once 'new_value' is set to +1, it should remain +1 until a value greater than or equal to -0.05 is registered,
  5. otherwise 'new_value' is equal to 0

I have tried multiple solutions, but can't seem to solve this problem. E.g.,

new_frame = pd.DataFrame(np.zeros(len(df.index),index=df.index,columns=['new_value'])
for date in df.index:
    if df['value'][date.strftime('%Y-%m-%d')] > 0.1:
        new_frame.set_value(date.strftime("%Y-%m-%d"),'new_value',-1)

But I receive the error: 'ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().'

If I then change the third line to:

if df['value'][date.strftime('%Y-%m-%d').item() > 0.1:

I receive the error: 'ValueError: can only convert an array of size 1 to a Python scalar'

Panda_User
  • 169
  • 1
  • 3
  • 12

1 Answers1

2

numpy.searchsorted

s = df['values'].astype(float)
al = np.array([-.1, -.05])
bl = np.array([1, np.nan, 0])
ar = np.array([.05, .1])
br = np.array([0, np.nan, -1])
l = bl[al.searchsorted(s.values)]
r = br[ar.searchsorted(s.values, side='right')]

df.assign(new_values=pd.Series(l + r, s.index).ffill())

           values  new_values
2017-01-01   0.11        -1.0
2017-01-02   0.07        -1.0
2017-01-03   0.04         0.0
2017-01-04  -0.11         1.0
2017-01-05  -0.04         0.0
2017-01-06   0.08         0.0
2017-01-07    0.1        -1.0

how it works

  • need to get just a series of floats, name it s
  • set up left side breakpoints al
  • set up left side mapped values bl
  • set up right side breakpoints ar
  • set up right side mapped values br
  • searchsorted will find the index that values should be placed before
  • use the indices from searchsorted to identify mapped values
  • when looking for the right side values, I use side='right'
  • map values again.
  • add left and right results. nan + value will be nan
  • ffill to propagate values forward.

setup
assuming dataframe df given by OP

df = pd.DataFrame(
    ['0.11', '0.07', '0.04', '-0.11',
     '-0.04', '0.08', '0.1'],
    ['2017-01-01', '2017-01-02',
     '2017-01-03', '2017-01-04',
     '2017-01-05', '2017-01-06',
     '2017-01-07'],
    ['values']
)

           values
2017-01-01   0.11
2017-01-02   0.07
2017-01-03   0.04
2017-01-04  -0.11
2017-01-05  -0.04
2017-01-06   0.08
2017-01-07    0.1
piRSquared
  • 285,575
  • 57
  • 475
  • 624
  • This is a great answer, thank you. In terms of building my understanding, is anyone able to explain why my original approach would not work? – Panda_User Apr 14 '17 at 21:28
  • 1
    @Panda_User see [***this answer***](http://stackoverflow.com/a/43222675/2336654) Feel free to up vote it if its helpful :-) – piRSquared Apr 14 '17 at 21:30
  • @Panda_User I see that you got stuck at the initial stages... Yeah, what I gave you is a vectorized solution. Your were going down the road of using loops. Now you don't have to. A number of people fall into the trap of evaluating series as booleans when executing a loop. – piRSquared Apr 14 '17 at 21:37