
I have a Finnish representation of a date (tiistaina, 27. lokakuuta 2015) that I need to convert to a datetime object. However, the datetime library in Python does not recognise the day and month names.

I would expect something like the following to work:

import locale
from datetime import datetime

locale.setlocale(locale.LC_TIME, 'fi_FI')
the_date = datetime.strptime('tiistaina, 27. lokakuuta 2015', '%A, %d. %B %Y')

However, this results in:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/Library/Frameworks/Python.framework/Versions/3.4/lib/python3.4/_strptime.py", line 500, in _strptime_datetime
    tt, fraction = _strptime(data_string, format)
  File "/Library/Frameworks/Python.framework/Versions/3.4/lib/python3.4/_strptime.py", line 337, in _strptime
    (data_string, format))
ValueError: time data 'tiistaina, 27. lokakuuta 2015' does not match format '%A, %d. %B %Y'

I think this is because Python expects the day to be tiistai instead of tiistaina, and the month to be lokakuu instead of lokakuuta.

http://people.uta.fi/~km56049/finnish/timexp.html seems to suggest that there are, depending on the context, different ways to represent a day or month in the Finnish language.

How can I convert the string tiistaina, 27. lokakuuta 2015 to a datetime object?


2 Answers

'%A, %d. %B %Y' produces a different time string on my system too:

#!/usr/bin/env python
import locale
from datetime import datetime

#NOTE: locale name is platform-dependent
locale.setlocale(locale.LC_TIME, 'fi_FI.UTF-8') 
print(datetime(2015, 10, 27).strftime('%A, %d. %B %Y'))
# -> tiistai, 27. lokakuu 2015

You could use PyICU to parse a localized date/time string in a given format:

#!/usr/bin/env python
# -*- coding: utf-8 -*-
from datetime import datetime
import icu # PyICU

tz = icu.ICUtzinfo.getDefault() # any ICU timezone will do here
df = icu.SimpleDateFormat('EEEE, dd. MMMM yyyy', icu.Locale('fi_FI'))
df.setTimeZone(tz.timezone)

ts = df.parse(u'tiistaina, 27. lokakuuta 2015')
print(datetime.fromtimestamp(ts, tz).date())
# -> 2015-10-27

Related: Python parsing date and find the correct locale_setting

It works, but PyICU is a big dependency, and for most things you have to read the C++ docs.


There is also the dateparser module, which should work if you add Finnish data to a simple YAML config, similar to how it is done for other languages. Here's a working example for Dutch:

#!/usr/bin/env python
import dateparser # $ pip install dateparser

print(dateparser.parse(u'dinsdag, 27. oktober 2015',
                       date_formats=['%A, %d. %B %Y'],
                       languages=['nl']).date())
# -> 2015-10-27

Related: Parse French date in python


%A and %B substitute the day-of-week and month names in the nominative case; however, your date format has the day of week in the essive case and the month in the partitive. Declension in Finnish is quite complicated in the general case, but here you can simply suffix the day-of-week name with na to get the essive, and the month name with ta to get the partitive.
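A minimal sketch of that suffix rule, assuming the lowercase nominative names that a fi_FI locale typically returns for %A and %B (the hard-coded lists are an assumption, not read from the locale):

```python
# Nominative Finnish names (assumption: standard lowercase spellings,
# matching what %A and %B give with a fi_FI locale).
WEEKDAYS = ["maanantai", "tiistai", "keskiviikko", "torstai",
            "perjantai", "lauantai", "sunnuntai"]
MONTHS = ["tammikuu", "helmikuu", "maaliskuu", "huhtikuu", "toukokuu",
          "kesäkuu", "heinäkuu", "elokuu", "syyskuu", "lokakuu",
          "marraskuu", "joulukuu"]

# Essive ("on Tuesday"): append -na to the weekday name.
ESSIVE_WEEKDAYS = [name + "na" for name in WEEKDAYS]
# Partitive (as in "27th of October"): append -ta to the month name.
PARTITIVE_MONTHS = [name + "ta" for name in MONTHS]

print(ESSIVE_WEEKDAYS[1])   # -> tiistaina
print(PARTITIVE_MONTHS[9])  # -> lokakuuta
```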

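Thus, with the fi_FI locale, the strptime format '%Ana, %d. %Bta %Y' works for all your dates, since every weekday name forms its essive with na and every month name (they all end in -kuu) forms its partitive with ta: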

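>>> import datetime, locale
>>> locale.setlocale(locale.LC_TIME, 'fi_FI.UTF-8')  # locale name is platform-dependent
'fi_FI.UTF-8'
>>> datetime.datetime.strptime('tiistaina, 27. lokakuuta 2015', '%Ana, %d. %Bta %Y')
datetime.datetime(2015, 10, 27, 0, 0)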
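If the fi_FI locale is not installed (locale names and availability vary by platform), the same suffix rule can be applied without any locale. This is a minimal sketch, not part of the answer above: `parse_finnish_date` is a hypothetical helper, the name lists are hard-coded assumptions, and it uses a regular expression instead of strptime. It also uses the redundant weekday as a consistency check:

```python
import re
from datetime import datetime

# Hard-coded nominative names (assumption: standard lowercase Finnish spellings).
WEEKDAYS = ["maanantai", "tiistai", "keskiviikko", "torstai",
            "perjantai", "lauantai", "sunnuntai"]
MONTHS = ["tammikuu", "helmikuu", "maaliskuu", "huhtikuu", "toukokuu",
          "kesäkuu", "heinäkuu", "elokuu", "syyskuu", "lokakuu",
          "marraskuu", "joulukuu"]

# "<weekday>na, <day>. <month>ta <year>": the literal "na"/"ta" strip the
# essive and partitive endings, mirroring '%Ana, %d. %Bta %Y'.
_PATTERN = re.compile(
    r"(?P<weekday>\w+)na, (?P<day>\d{1,2})\. (?P<month>\w+)ta (?P<year>\d{4})"
)


def parse_finnish_date(text):
    """Hypothetical helper: parse e.g. 'tiistaina, 27. lokakuuta 2015'."""
    match = _PATTERN.fullmatch(text.strip().lower())
    if match is None:
        raise ValueError(f"unrecognised date string: {text!r}")
    if match["month"] not in MONTHS:
        raise ValueError(f"unknown month name: {match['month']!r}")
    month = MONTHS.index(match["month"]) + 1
    result = datetime(int(match["year"]), month, int(match["day"]))
    # datetime.weekday() is 0 for Monday, matching the order of WEEKDAYS.
    if WEEKDAYS[result.weekday()] != match["weekday"]:
        raise ValueError(f"weekday does not match the date in {text!r}")
    return result


print(parse_finnish_date("tiistaina, 27. lokakuuta 2015"))
# -> 2015-10-27 00:00:00
```

Hard-coding the twelve month names and seven weekday names trades locale independence for having to maintain the lists yourself; the strptime format above is shorter when the locale is available.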