2

Let's assume a dataframe using datetimes as index, where we have a column named 'Score', initialy set to 10:

            score
2016-01-01  10
2016-01-02  10
2016-01-03  10
2016-01-04  10
2016-01-05  10
2016-01-06  10
2016-01-07  10
2016-01-08  10

I want to substract a fixed value (let's say 1) from the score, but only when the index is between certain dates (for example between the 3rd and the 6th):

            score
2016-01-01  10
2016-01-02  10
2016-01-03  9
2016-01-04  9
2016-01-05  9
2016-01-06  9
2016-01-07  10
2016-01-08  10

Since my real dataframe is big, and I will be doing this for different dateranges and different fixed values N for each one of them, I'd like to achieve this without requiring to create a new column set to -N for each case.

Something like numpy's where function, but for a certain range, and allowing me to sum/substract to current value if the condition is met, and do nothing otherwise. Is there something like that?

Roman Rdgz
  • 12,836
  • 41
  • 131
  • 207
  • check this answer https://stackoverflow.com/questions/29370057/select-dataframe-rows-between-two-dates and also check https://pandas.pydata.org/pandas-docs/version/0.20.3/generated/pandas.DataFrame.between_time.html – Yuca Aug 21 '19 at 12:19
  • @Yuca the problem is not getting the range, but editing the dataframe only within that range – Roman Rdgz Aug 21 '19 at 12:28
  • there's no distinction in what you're saying. Editing the dataframe requires getting the range, so what's the problem then? – Yuca Aug 21 '19 at 12:34

3 Answers3

5

Use index slicing:

df.loc['2016-01-03':'2016-01-06', 'score'] -= 1
Code Different
  • 90,614
  • 16
  • 144
  • 163
2

Assuming dates are datetime dtype:

#if date is own column:    
df.loc[df['date'].dt.day.between(3,6), 'score'] = df['score'] - 1

#if date is index:    
df.loc[df.index.day.isin(range(3,7)), 'score'] = df['score'] - 1
manwithfewneeds
  • 1,137
  • 1
  • 7
  • 10
2

I would do something like that using query :

import pandas as pd

df = pd.DataFrame({"score":pd.np.random.randint(1,10,100)}, 
    index=pd.date_range(start="2018-01-01", periods=100))

start = "2018-01-05"
stop = "2018-04-08"

df.query('@start <= index <= @stop ') - 1  

Edit : note that something using eval which goes to boolean, can be used but in a different manner because pandas where acts on the False values.

df.where(~df.eval('@start <= index <=  @stop '), 
         df['score'] - 1, axis=0, inplace=True)

See how I inverted the comparison operators (with ~), in order to get what I wanted. It's efficient but not really clear. Of course, you can also use pd.np.where and all is good in the world.

Nathan Furnal
  • 2,236
  • 3
  • 12
  • 25