2

I have dataframe with measurments for differnt dates and hours:

>>>date       hour   value
0 29-11-2020   8     0.231
1 29-11-2020   9     0.245
2 29-11-2020   10    0.475
3 30-11-2020   8     0.214
4 30-11-2020   9     0.239
5 30-11-2020   10    0.312
...

The table described measured values for every day every hour. the measurments are taken from the third day of an experiment to day 15. I would like to create new column that will have the ordinal day of the experiment base don the dates, e.g:

>>>date       hour   value   count
0 29-11-2020   8     0.231    3
1 29-11-2020   9     0.245    3
2 29-11-2020   10    0.475    3
3 30-11-2020   8     0.214    4
4 30-11-2020   9     0.239    4
5 30-11-2020   10    0.312    4
5 01-12-2020   8     0.312    5
5 01-12-2020   9     0.312    5
5 01-12-2020   10    0.521    5
...

(29-11 is the third day, 30-11 is the 2nd day...)

I have seen this post but it doesn't solve the same problem (they give repetative "codes" while I want to "count" unique dates from the 3rd day and give each date the ordinal number. I have also tried using "tooordinal" but it jsut changed the date values without count it.

Reut
  • 1,555
  • 4
  • 23
  • 55

1 Answers1

3

Cast the 'date' column to datetime, subtract the first day, take the day of the resulting timedelta Series and add the offset days:

experiment_day_start = 3

df['date'] = pd.to_datetime(df['date'], dayfirst=True)
df['exp_day'] = (df['date']-df['date'].min()).dt.days + experiment_day_start

df
        date  hour  value  count  exp_day
0 2020-11-29     8  0.231      3        3
1 2020-11-29     9  0.245      3        3
2 2020-11-29    10  0.475      3        3
3 2020-11-30     8  0.214      4        4
4 2020-11-30     9  0.239      4        4
5 2020-11-30    10  0.312      4        4
6 2020-12-01     8  0.312      5        5
7 2020-12-01     9  0.312      5        5
8 2020-12-01    10  0.521      5        5
FObersteiner
  • 22,500
  • 8
  • 42
  • 72
  • thankyou, that worked! can you just explain what the dayfirst=True paramter does? read about it but i'm not sure how it contributes – Reut Feb 08 '21 at 08:54
  • 2
    @Reut: your dates obviously start with the day, but pandas to_datetime by default assumes that the month comes first - unless you set this keyword to True. – FObersteiner Feb 08 '21 at 08:57
  • 2
    Alternatively you can look into the [relativedelta](https://dateutil.readthedocs.io/en/stable/relativedelta.html) library that will do the delta automatically for you given the startdate, enddate and the unit (day, month, year etc) – AutomatedChaos Feb 08 '21 at 09:03
  • 1
    There also is another option btw., which somebody posted and then deleted: using Series.rank, e.g. as `(df['date'].rank(method='dense')-1)+experiment_day_start`. I'd prefer the timedelta way though, since I think this is just more readable. – FObersteiner Feb 08 '21 at 09:04
  • 2
    To add to MrFuppes' comment, pandas' to_datetime assumes that month comes first by default and tries this _per record_, without warnings. so `['30-11-2020', '01-12-2020']` will be interpreted as [30 nov, 12 jan]. – Swier Feb 08 '21 at 09:04
  • 1
    @Swier: this is a very good point, since this can lead to pretty strange results... – FObersteiner Feb 08 '21 at 09:07