1

I have a table of this format :

                 date
0 2017-03-30 09:59:00
1 2017-03-30 09:59:30
2 2017-03-30 10:00:00
3 2017-03-30 10:00:30
4 2017-03-30 10:01:00
5 2017-03-30 10:01:30
6 2017-03-30 10:02:00

What I'd like to do is to count the days in my table (starting by 1) but rather than starting at midnight, I'd like my days to start at 10:00:00. So for the table above, the result would be :

                 date  day
0 2017-03-30 09:59:00    0
1 2017-03-30 09:59:30    0
2 2017-03-30 10:00:00    1
3 2017-03-30 10:00:30    1
4 2017-03-30 10:01:00    1
5 2017-03-30 10:01:30    1
6 2017-03-30 10:02:00    1

So far, I managed to produce the new column with the following code :

d=1
hour_before = 10
col_days = []
for row in df.itertuples(): 
    if row.date.hour == 10 and hour_before != 10 : 
            d+= 1
    hour_before = row.date.hour 
    cols_days.append(d)
df['day'] = col_days

But I'd love to have a more elegant (and more importantly, quicker) way to do this operation.

Any suggestions ?

cs95
  • 379,657
  • 97
  • 704
  • 746
mlx
  • 504
  • 1
  • 4
  • 15
  • 2
    I'm sure people are getting tired of seeing this link in the comments, but it's like a knee jerk reaction of mine every time I see `iterrows` inside code. [DO NOT USE ITERROWS!](https://stackoverflow.com/a/55557758) – cs95 Jun 26 '19 at 04:57
  • I for one have never seen this link and definitely thank you for the tip, I'll save it. – mlx Jun 26 '19 at 05:24

1 Answers1

3

What if you shift your data back by 10 hours and then group rows by days? You can then use pd.factorize on the dates, this should be pretty fast.

df['day'] = pd.factorize((df['date'] - pd.Timedelta(hours=10)).dt.date)[0]
df
                 date  day
0 2017-03-30 09:59:00    0
1 2017-03-30 09:59:30    0
2 2017-03-30 10:00:00    1
3 2017-03-30 10:00:30    1
4 2017-03-30 10:01:00    1
5 2017-03-30 10:01:30    1
6 2017-03-30 10:02:00    1

GroupBy.ngroup is another alternative:

df['day'] = df.groupby((df['date'] - pd.Timedelta(hours=10)).dt.date).ngroup()
df
                 date  day
0 2017-03-30 09:59:00    0
1 2017-03-30 09:59:30    0
2 2017-03-30 10:00:00    1
3 2017-03-30 10:00:30    1
4 2017-03-30 10:01:00    1
5 2017-03-30 10:01:30    1
6 2017-03-30 10:02:00    1
cs95
  • 379,657
  • 97
  • 704
  • 746
  • @Chris indeed, `ngroup` is for enumerating groups. You can also [enumerate in descending order.](https://stackoverflow.com/a/56720838/4909087) – cs95 Jun 26 '19 at 04:56