0

I need a function to count the total number of days in the 'days' column between a start date of 1st Jan 1995 and an end date of 31st Dec 2019 in a dataframe taking Leapyears into account as well

For example:

  • 1st Jan 1995 - Day 1
  • 1st Feb 1995 - Day 32
  • 2nd Feb 1995 - Day 33...

And so on all the way to 31st Dec 2019.

Screenshot of the 1st ten rows of the dataset

This is the function I created initially but it doesn't work. prices is the name of the data frame and 'days' is the column where the number of days is to reflect.

def date_difference(self):

    for i in range(prices.shape[0] - 1):
        prices['days'][i+1] = (prices['days'][i+1] - prices['days'][i])
wjandrea
  • 28,235
  • 9
  • 60
  • 81
  • Simplest way would be to parse the dates into datatimes using `pd.to_datetime`, then subtract the the first element from the column. That will leave you with a timedelta, which you can convert to seconds, then divide by the number of seconds in a day. – Nick ODell Nov 07 '22 at 18:55
  • I can make a more specific answer if you make a [reproducible example.](https://stackoverflow.com/questions/20109391/how-to-make-good-reproducible-pandas-examples) (I don't want to re-type your data.) – Nick ODell Nov 07 '22 at 18:55
  • Welcome to Stack Overflow! [Please don't post pictures of text](https://meta.stackoverflow.com/q/285551/4518341) and please show us the output you want. See [How to make good reproducible pandas examples](/q/20109391/4518341). You can [edit] it into your post, and use the formatting tools like [code formatting](/editing-help#code). BTW, for more tips, see [mre] and [ask]. – wjandrea Nov 07 '22 at 18:57
  • @Nick No need to use seconds; timedeltas have a `days` attribute. – wjandrea Nov 07 '22 at 19:07

2 Answers2

1

Convert types

First of all, make sure that the days column is the proper type. Use df.days.dtype and it should be datetime64. If you get object type that means you have a string containing a date and you need to convert the type using

df.days = pd.to_datetime(df.days)

Calculate difference

df['days_diff'] = (df.days - pd.Timestamp('1995-01-01')).dt.days

Also, I would recommend changing the name of the column to date before it contains dates. Later you can assign the days to a column called so. It's just for clarity of your code and future maintaining it.

wjandrea
  • 28,235
  • 9
  • 60
  • 81
Mateusz Dorobek
  • 702
  • 1
  • 6
  • 22
-1

I finally got it to work by doing this:

def date_difference(last_day):
    last_day = pd.to_datetime(last_day, dayfirst = True)
    first_day = pd.to_datetime("01/01/1995", dayfirst = True)
    diff = last_day - first_day

prices['days'] = prices['days'].apply(date_difference)
  • `diff` is never used. It looks like you meant to return it. But why use this instead of Mateusz's answer, which is more idiomatic for Pandas? – wjandrea Nov 07 '22 at 21:58
  • Beside the point, but why are you using `dayfirst=True` when your data seems to be all in ISO 8601 format? – wjandrea Nov 07 '22 at 21:59
  • As it’s currently written, your answer is unclear. Please [edit] to add additional details that will help others understand how this addresses the question asked. You can find more information on how to write good answers [in the help center](/help/how-to-answer). – Community Nov 10 '22 at 12:00