2

I have a dataframe in pandas which has an error in the index: each entry between 23:00:00 and 23:59:59 has a wrong date. I would need to subtract one day (i.e. 24 hours) to each entry between those two times.

I know that I can obtain the entries between those two times as df[df.hour == 23], where df is my dataframe. However, can I modify the day only for those specific entries of the dataframe index?

Resetting would take me more time, since my dataframe index is not evenly spaced as you can see from the figure below (the step between two consecutive entries is once 15 minutes and once 30 minutes). Note also from the figure the wrong date in the last three entries: it should be 2018-02-05 and not 2018-02-06.

I tried to do this

df[df.index.hour == 23].index.day = df[df.index.hour == 23].index.day - 1

but I get AttributeError: can't set attribute

Sample data:

2018-02-05 22:00:00    271.8000
2018-02-05 22:30:00    271.5600
2018-02-05 22:45:00    271.4400
2018-02-06 23:15:00    271.3750
2018-02-06 23:30:00    271.3425
2018-02-06 00:00:00    271.2700
2018-02-06 00:15:00    271.2300
2018-02-06 00:45:00    271.1500
2018-02-06 01:00:00    271.1475
2018-02-06 01:30:00    271.1425
2018-02-06 01:45:00    271.1400

Expected output:

2018-02-05 22:00:00    271.8000
2018-02-05 22:30:00    271.5600
2018-02-05 22:45:00    271.4400
2018-02-05 23:15:00    271.3750
2018-02-05 23:30:00    271.3425
2018-02-06 00:00:00    271.2700
2018-02-06 00:15:00    271.2300
2018-02-06 00:45:00    271.1500
2018-02-06 01:00:00    271.1475
2018-02-06 01:30:00    271.1425
2018-02-06 01:45:00    271.1400

enter image description here

cholo14
  • 594
  • 1
  • 3
  • 22

2 Answers2

0

You could try TimeDeltas.

If your dataframe has a datetime index, you should be able to subtract from it directly.

df[df.hour == 23] - pd.Timedelta('1 days')

If df.index type is string, then you should first change types, then subtract: df.index = pd.to_datetime(df.index)

df.index - pd.Timedelta('1 days')

Victor Valente
  • 761
  • 9
  • 24
  • This will not work, since ``df`` doesn't have the attribute ``hour``, and it doesn't even work by doing ``df[df.index.hour == 23] - pd.Timedelta('1 days')`` – cholo14 Feb 27 '19 at 08:54
0

I solved the issue myself by using this answer. This is my code:

as_list = df.index.tolist()
new_index = []
for idx,entry in enumerate(as_list):
    if entry.hour == 23:
        if entry.day != 1:            
            new_index.append(as_list[idx].replace(day = as_list[idx].day - 1))
        else:
            new_day = calendar.monthrange(as_list[idx].year, as_list[idx].month -1)[1]
            new_index.append(as_list[idx].replace(day = new_day, month = entry.month -1))
    else:
        new_index.append(entry)
df.index = new_index
cholo14
  • 594
  • 1
  • 3
  • 22