
I have a DataFrame df_holiday and I want to select the row that contains the next upcoming date after today's date, in order to extract the holiday name.

+----------+---------------------+
|   date   |        name         |
+----------+---------------------+
| 01012019 | Neujahr             |
| 06012019 | Heilige Drei Könige |
| 19042019 | Karfreitag          |
| 22042019 | Ostermontag         |
| 01052019 | Tag der Arbeit      |
| 10062019 | Pfingstmontag       |
+----------+---------------------+

If I do type(df_holiday['date'][0]), it outputs str.

Now I want to convert the column into datetime format using this:

import datetime
df_holiday['date'] = df_holiday['date'].apply(lambda x: datetime.datetime.strptime(x, '%d%m%Y'))

First question: the docs say strptime returns a datetime, but in my case I get a Timestamp. Why?

type(df_holiday['date'][0])
pandas._libs.tslibs.timestamps.Timestamp

df_holiday['date'][0]
Timestamp('2019-01-01 00:00:00')
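A possible explanation, sketched on two rows of the table above: when a column is filled with datetime objects, pandas stores it with a datetime64 dtype, and indexing a single element returns a pandas Timestamp. Timestamp is a subclass of datetime.datetime, so it still behaves like a datetime; to_pydatetime() and date() convert it back to plain Python objects if needed.

```python
import datetime
import pandas as pd

df_holiday = pd.DataFrame({"date": ["01012019", "06012019"],
                           "name": ["Neujahr", "Heilige Drei Könige"]})
df_holiday["date"] = df_holiday["date"].apply(
    lambda x: datetime.datetime.strptime(x, "%d%m%Y"))

# the column is coerced to a datetime64 dtype, so elements come back as Timestamp
print(df_holiday["date"].dtype)
ts = df_holiday["date"][0]
print(type(ts).__name__)                  # Timestamp
# Timestamp subclasses datetime.datetime
print(isinstance(ts, datetime.datetime))  # True
# convert back to a plain datetime.datetime or datetime.date if needed
py_dt = ts.to_pydatetime()
print(type(py_dt) is datetime.datetime)   # True
print(ts.date())                          # 2019-01-01
```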

I found this post from dawg who suggested the following routine:

min([d for d in df_holiday['date'] if d> datetime.date.today()], key=lambda s: 
          datetime.datetime.strptime(s, "%d%m%Y").date()-datetime.date.today())

Since the values in the date column are Timestamps, I got:

TypeError: Cannot compare type 'Timestamp' with type 'date'

I think this would work if it were possible to convert the date column to a proper datetime format instead of Timestamps. How can I achieve this? Or is there a better solution for my case?
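The TypeError comes from comparing a Timestamp with a datetime.date. One way around it, given here as a sketch rather than the canonical approach, is to compare against a Timestamp instead and let pandas pick the earliest future row. The helper name next_holiday is hypothetical, and the reference date is a parameter so the result does not depend on when the code runs.

```python
import pandas as pd

df_holiday = pd.DataFrame({
    "date": ["01012019", "06012019", "19042019", "22042019", "01052019", "10062019"],
    "name": ["Neujahr", "Heilige Drei Könige", "Karfreitag",
             "Ostermontag", "Tag der Arbeit", "Pfingstmontag"],
})
df_holiday["date"] = pd.to_datetime(df_holiday["date"], format="%d%m%Y")


def next_holiday(df, today=None):
    """Return the name of the first holiday strictly after `today` (hypothetical helper)."""
    # compare Timestamp with Timestamp; normalize() drops the time of day
    today = pd.Timestamp.today().normalize() if today is None else pd.Timestamp(today)
    upcoming = df[df["date"] > today]
    if upcoming.empty:
        return None
    # idxmin finds the earliest upcoming date even if the frame is unsorted
    return upcoming.loc[upcoming["date"].idxmin(), "name"]


print(next_holiday(df_holiday, "2019-05-21"))  # Pfingstmontag
```

Calling next_holiday(df_holiday) with no second argument uses today's date.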

adama

3 Answers


Here you go: this uses today's date to filter the holiday dates, takes the next one and returns its name. I have tried to make it as pythonic as possible. If you have any questions, please ask.

import pandas as pd
from io import StringIO  # pandas.compat.StringIO no longer exists in current pandas
from datetime import datetime
import dateutil.parser

RawData = """
date|name
01012019|Neujahr
06012019|Heilige Drei Könige
19042019|Karfreitag
22042019|Ostermontag
01052019|Tag der Arbeit
10062019|Pfingstmontag
14092019|Internationale Feiertage
"""
# read the dates as strings so the leading zeros are kept
holidays = pd.read_csv(StringIO(RawData), sep="|", dtype={'date': object})
# reorder DDMMYYYY into YYYYMMDD before parsing
holidays['date'] = holidays['date'].astype(str).apply(lambda x: dateutil.parser.parse(x[4:8] + x[2:4] + x[0:2]))
# compare with today's date and take the next one (the rows are already in date order)
holidays[holidays.date > datetime.now()].head(1)['name']

Result (when run in May 2019):

5    Pfingstmontag  
MEdwin
  • thank you so much, this is exactly what I need. But can you briefly explain the parser function? I do not understand the parse(x[4:8]+x[2:4]+x[0:2]) part – adama May 21 '19 at 10:53
    Okay: the date is an 8-character string, for example the first one, 01012019. 2019 is the year, so that is characters 4 to 8 (four characters), i.e. x[4:8]. – MEdwin May 21 '19 at 10:58
    x[4:8]#year x[2:4]#month x[0:2]#day – MEdwin May 21 '19 at 10:59
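The slice reordering discussed in the comments can be checked on its own: it turns the DDMMYYYY string into YYYYMMDD, which dateutil parses unambiguously. For comparison, datetime.strptime with an explicit '%d%m%Y' format parses the original string directly without any reordering.

```python
from datetime import datetime
import dateutil.parser

x = "01052019"  # DDMMYYYY, as stored in the date column
reordered = x[4:8] + x[2:4] + x[0:2]  # year + month + day
print(reordered)                          # 20190501
print(dateutil.parser.parse(reordered))   # 2019-05-01 00:00:00
# same result without slicing, using an explicit format string
print(datetime.strptime(x, "%d%m%Y"))     # 2019-05-01 00:00:00
```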

You can convert a Unix timestamp to a datetime like so:

import datetime
readable = datetime.datetime.fromtimestamp(1558272180)
print(readable)
2019-05-19 13:23:00   (when the local time zone is UTC)

Here is a link with several other ways to do it.
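Note that fromtimestamp interprets the number as seconds since the Unix epoch and, without a tz argument, returns local time, so the printed value depends on the machine. A sketch that pins the zone to UTC and shows the difference between the datetime object and its isoformat() string (the string is what the earlier version of this answer returned):

```python
from datetime import datetime, timezone

# passing tz makes the result independent of the machine's local time zone
readable = datetime.fromtimestamp(1558272180, tz=timezone.utc)
print(type(readable).__name__)              # datetime
print(readable)                             # 2019-05-19 13:23:00+00:00
# isoformat() returns a str, not a datetime
print(type(readable.isoformat()).__name__)  # str
print(readable.isoformat())                 # 2019-05-19T13:23:00+00:00
```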

vlemaistre
  • thanks, but this doesn't solve the problem. If I print the first row of my date column it returns Timestamp('2019-01-01 00:00:00'); furthermore, your suggestion returns readable as a str and not as a date – adama May 21 '19 at 08:21
  • My bad, you have to remove the isoformat() call to get a datetime object. I edited my answer – vlemaistre May 21 '19 at 08:39

Since your date column contains strings, convert it to datetime objects:

import pandas as pd
df['date'] = pd.to_datetime(df['date'], format='%d%m%Y')

This converts the date column to a datetime type.

Next, you could try the nearest-date algorithm stated above; this won't convert to timestamp format.
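As the comment below points out, pd.to_datetime still yields Timestamp elements. If plain datetime.date objects are wanted, so that the comparison with a datetime.date works, the .dt.date accessor produces them; dawg's min() routine then runs once the strptime call is dropped, since the values are already parsed. A sketch using a fixed stand-in for datetime.date.today() so the output is reproducible:

```python
import datetime
import pandas as pd

df = pd.DataFrame({
    "date": ["19042019", "22042019", "01052019", "10062019"],
    "name": ["Karfreitag", "Ostermontag", "Tag der Arbeit", "Pfingstmontag"],
})
df["date"] = pd.to_datetime(df["date"], format="%d%m%Y")
print(type(df["date"][0]).__name__)  # Timestamp

# .dt.date gives an object column of plain datetime.date values
dates = df["date"].dt.date
print(type(dates[0]).__name__)       # date

today = datetime.date(2019, 5, 21)   # stand-in for datetime.date.today()
# dawg's routine without strptime: the values are already dates
nearest = min((d for d in dates if d > today), key=lambda d: d - today)
print(nearest)                                   # 2019-06-10
print(df.loc[dates == nearest, "name"].iloc[0])  # Pfingstmontag
```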

  • this also returns a timestamp: df_holiday['date'] = pd.to_datetime(df_holiday['date'], format='%d%m%Y') --- type(df_holiday['date'][0]) --- pandas._libs.tslibs.timestamps.Timestamp – adama May 21 '19 at 08:37