0

I have a list of dates, like below:

date_list = ['1. Okt 2021', '2. Okt 2021', '3. Okt 2021', '4. Okt 2021', '5. Okt 2021', '6. Okt 2021', '24. Sep 2021', '25. Sep 2021', '26. Sep 2021']

I want to convert them to datetime objects:

dates = [datetime.strptime(x,"%d %b %Y") for x in date_list]

The output is:

Traceback (most recent call last):
  File "c:/Users/Benutzt/Desktop/web_scraping/main.py", line 27, in <module>
    dates = [datetime.strptime(x,"%d %M %Y") for x in date_list]
  File "c:/Users/Benutzt/Desktop/web_scraping/main.py", line 27, in <listcomp>
    dates = [datetime.strptime(x,"%d %M %Y") for x in date_list]
  File "C:\Users\Benutzt\anaconda3\lib\_strptime.py", line 568, in _strptime_datetime
    tt, fraction, gmtoff_fraction = _strptime(data_string, format)
  File "C:\Users\Benutzt\anaconda3\lib\_strptime.py", line 349, in _strptime
    raise ValueError("time data %r does not match format %r" %
ValueError: time data '1. Okt 2021' does not match format '%d %b %Y'
jezrael
mika

3 Answers

2

For language-specific month (or day) names, you can set the locale, e.g. to German:

import locale
locale.setlocale(locale.LC_TIME, 'de_de')  # the locale name (2nd argument) is platform-specific, e.g. 'de_DE.UTF-8' on many Linux systems

With a list of valid date inputs, this gives, for example:

from datetime import datetime
date_list = ['1. Okt 2021', '2. Okt 2021', '3. Okt 2021', '4. Okt 2021', '5. Okt 2021', '6. Okt 2021', '30. Sep 2021']
dates = [datetime.strptime(x, "%d. %b %Y") for x in date_list]

print(dates)
[datetime.datetime(2021, 10, 1, 0, 0), datetime.datetime(2021, 10, 2, 0, 0), datetime.datetime(2021, 10, 3, 0, 0), datetime.datetime(2021, 10, 4, 0, 0), datetime.datetime(2021, 10, 5, 0, 0), datetime.datetime(2021, 10, 6, 0, 0), datetime.datetime(2021, 9, 30, 0, 0)]
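
Since the locale name differs between platforms, one option is to try a few spellings and keep the first one the operating system accepts. The following is only a sketch: the helper name `set_first_available_locale` and the list of candidate names are assumptions for illustration, and the German locale still has to be installed on the machine for any of them to succeed.

```python
import locale


def set_first_available_locale(candidates, category=locale.LC_TIME):
    """Try each locale name in turn; return the first accepted one, else None."""
    for name in candidates:
        try:
            locale.setlocale(category, name)
            return name
        except locale.Error:
            # This spelling is not known or not installed on this platform.
            continue
    return None


# Assumed candidate spellings: the first two are typical on Linux/macOS,
# the last two on Windows. Adjust them to your system.
chosen = set_first_available_locale(["de_DE.UTF-8", "de_DE", "de_de", "German"])
print(chosen)  # the accepted name, or None if no German locale is available
```

If `chosen` is `None`, `%b` keeps using the previous locale, so `Okt` will still fail to parse.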

Side-note: The locale setting also makes it work in pandas:

import pandas as pd
df = pd.DataFrame({'dates': date_list})
df['dates'] = pd.to_datetime(df['dates'], format="%d. %b %Y")

df['dates']
0   2021-10-01
1   2021-10-02
2   2021-10-03
3   2021-10-04
4   2021-10-05
5   2021-10-06
6   2021-09-30
Name: dates, dtype: datetime64[ns]
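
Note that `locale.setlocale` changes the setting for the whole process, which is why pandas picks it up as well. If that global effect is unwanted, the change can be scoped to a block and undone afterwards. A minimal sketch, assuming a hypothetical helper named `temporary_locale`; the usage below passes `"C"` only so that it runs on any machine, and you would pass your platform's German locale name instead.

```python
import locale
from contextlib import contextmanager
from datetime import datetime


@contextmanager
def temporary_locale(name, category=locale.LC_TIME):
    """Switch the given locale category to `name`, then restore the old value."""
    previous = locale.setlocale(category)  # query only, changes nothing
    locale.setlocale(category, name)       # raises locale.Error if unavailable
    try:
        yield
    finally:
        locale.setlocale(category, previous)


# "C" is used here so the sketch runs everywhere; with a German locale
# the same call would accept strings such as '1. Okt 2021'.
with temporary_locale("C"):
    d = datetime.strptime("30. Sep 2021", "%d. %b %Y")

print(d)  # 2021-09-30 00:00:00
```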
FObersteiner
  • related: [How can I list all available windows locales](https://stackoverflow.com/q/19709026/10197418), [How do I find all available locales in Python](https://stackoverflow.com/q/53320311/10197418) – FObersteiner Oct 01 '21 at 11:44
  • TypeError: strptime() argument 1 must be str, not float – mika Oct 01 '21 at 11:48
  • @mika cannot reproduce. the example works fine for me... please check the state of your variable "date_list" or run my example in a clean namespace to get the idea how this works. – FObersteiner Oct 01 '21 at 11:49
  • yea i see, thank you! can i upload the .csv file ? where I got the data – mika Oct 01 '21 at 11:53
  • @mika yes I can have a look if you upload it – FObersteiner Oct 01 '21 at 11:58
  • okay i dont understand why but i convert into a DataFrame and used `data["Date Time"] = pd.to_datetime(data["Date Time"], format="%d. %b %Y") `. – mika Oct 01 '21 at 11:59
  • solution is to import locale big thanks!!! – mika Oct 01 '21 at 12:00
1

You can use the dateparser package:

# Python env: pip install dateparser
# Anaconda env: conda install dateparser
import pandas as pd
from dateparser import parse

df = pd.DataFrame({'Date': ['1. Okt 2021', '2. Okt 2021', '3. Okt 2021',
                            '4. Okt 2021', '5. Okt 2021', '6. Okt 2021',
                            '24. Sep 2021', '25. Sep 2021', '26. Sep 2021']})

df['Date'] = df['Date'].apply(parse, languages=['de'])
print(df['Date'])

# Output:
0   2021-10-01
1   2021-10-02
2   2021-10-03
3   2021-10-04
4   2021-10-05
5   2021-10-06
6   2021-09-24
7   2021-09-25
8   2021-09-26
Name: Date, dtype: datetime64[ns]

For a list:

date_list = ['1. Okt 2021', '2. Okt 2021', '3. Okt 2021',
             '4. Okt 2021', '5. Okt 2021', '6. Okt 2021',
             '24. Sep 2021', '25. Sep 2021', '26. Sep 2021']

dates = [parse(d, languages=['de']) for d in date_list]
print(dates)

# Output:
[datetime.datetime(2021, 10, 1, 0, 0),
 datetime.datetime(2021, 10, 2, 0, 0),
 datetime.datetime(2021, 10, 3, 0, 0),
 datetime.datetime(2021, 10, 4, 0, 0),
 datetime.datetime(2021, 10, 5, 0, 0),
 datetime.datetime(2021, 10, 6, 0, 0),
 datetime.datetime(2021, 9, 24, 0, 0),
 datetime.datetime(2021, 9, 25, 0, 0),
 datetime.datetime(2021, 9, 26, 0, 0)]
Corralien
  • @mika. Even if you already chose the right answer for you, can you check my solution. `dateparser` is a very useful module. – Corralien Oct 01 '21 at 12:02
  • convenient... gave it to %timeit and this runs about 25x slower than pd.to_datetime with format specified, so traded for efficiency I guess. – FObersteiner Oct 01 '21 at 12:11
0

It looks like the first part of your date is an ID, e.g. the position of the item in a list. If so, you'll need to remove it before converting the dates. Also, under an English locale, Okt will not match the %b format, so you'll need to convert it to Oct.

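# Drop the leading "N." part and map the German "Okt" to "Oct" so that %b matches
dates = [datetime.strptime(x.split(".")[-1].strip().replace("Okt", "Oct"), "%b %Y") for x in date_list]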
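
If the leading number turns out to be the day after all, the month names can also be converted without touching the locale by looking them up in a small table. This is a sketch under assumptions: the names `GERMAN_MONTHS` and `parse_german_date` are made up for illustration, and the abbreviations in the table are assumed spellings that may need extending for your data (e.g. `Mrz` for March).

```python
import re
from datetime import datetime

# Assumed German month abbreviations; extend this table if your data differs.
GERMAN_MONTHS = {
    "Jan": 1, "Feb": 2, "Mär": 3, "Apr": 4, "Mai": 5, "Jun": 6,
    "Jul": 7, "Aug": 8, "Sep": 9, "Okt": 10, "Nov": 11, "Dez": 12,
}

# Matches strings such as '1. Okt 2021': day, dot, month abbreviation, year.
DATE_PATTERN = re.compile(r"^\s*(\d{1,2})\.\s*(\w+)\.?\s+(\d{4})\s*$")


def parse_german_date(text):
    """Parse 'D. Mon YYYY' with a German month abbreviation into a datetime."""
    match = DATE_PATTERN.match(text)
    if match is None:
        raise ValueError(f"unrecognized date string: {text!r}")
    day, month_name, year = match.groups()
    if month_name not in GERMAN_MONTHS:
        raise ValueError(f"unknown month abbreviation: {month_name!r}")
    # datetime() itself raises ValueError for impossible days such as 32.
    return datetime(int(year), GERMAN_MONTHS[month_name], int(day))


date_list = ['1. Okt 2021', '6. Okt 2021', '24. Sep 2021']
dates = [parse_german_date(x) for x in date_list]
print(dates)
```

Because invalid days raise a `ValueError`, entries like `32. Sep 2021` would be reported rather than silently accepted.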

ogdenkev
  • that just ignores the day, no? – FObersteiner Oct 01 '21 at 11:45
  • @MrFuppes, if you look at the edits to the question, you will see that the original example included the following two date strings: `31. Sep 2021` and `32. Sep 2021`. As these are not valid days, I assumed that the first part of the strings were not dates, but were in fact from a list of some type. Perhaps they copied from another document, which could have entries like `88. Dec 2021` and `245. Oct 2022`. – ogdenkev Oct 01 '21 at 12:34