It would be interesting to see whether the approach of using map with datetools.parse scales as well as the standard approach given here, here and here. Let's make a really big Series of string-represented dates to find out:
In [9]: import numpy as np
In [10]: import pandas as pd
In [11]: import datetime as dt
In [12]: format = '%d/%m/%Y %H:%M:%S'  # shadows the built-in format(), kept to match the timing call below
In [13]: def random_date():
   ....:     rand_num = np.random.uniform(0, 2e9)  # seconds since the epoch, up to ~2033
   ....:     return dt.datetime.fromtimestamp(rand_num).strftime(format)
In [14]: dates = pd.Series([random_date() for i in range(100000)])
In [15]: dates.head()  # Some random dates (as strings)
Out[15]:
0    30/11/1988 15:11:08
1    08/05/2025 10:29:02
2    05/09/2017 02:24:46
3    18/03/2016 14:55:20
4    22/04/1984 04:58:06
dtype: object
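As an aside, note that some of these strings are ambiguous: row 1's 08/05/2025 means the 8th of May under the %d/%m/%Y format, but dateutil's parser (which, as far as I know, is what pd.datetools.parse wraps in older pandas) assumes month-first by default. The two approaches only agree on such dates if dayfirst is passed; a quick illustration (the prompt numbers and Out lines here are mine):

In [16]: from dateutil.parser import parse
In [17]: parse('08/05/2025')
Out[17]: datetime.datetime(2025, 8, 5, 0, 0)
In [18]: parse('08/05/2025', dayfirst=True)
Out[18]: datetime.datetime(2025, 5, 8, 0, 0)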
Now let's time the two approaches:
In [33]: %timeit dates.map(lambda x: pd.datetools.parse(x))
1 loops, best of 3: 6.98 s per loop
In [2]: %timeit pd.to_datetime(dates, format=format)
1 loops, best of 3: 525 ms per loop
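For anyone who wants to reproduce this outside IPython, here is a self-contained sketch using the standard timeit module. It swaps in dateutil.parser.parse directly in place of pd.datetools.parse (to the best of my knowledge they are the same function in older pandas), and the FORMAT name and the dayfirst=True flag are my own additions; absolute timings will of course vary by machine.

import datetime as dt
import timeit

import numpy as np
import pandas as pd
from dateutil.parser import parse

FORMAT = '%d/%m/%Y %H:%M:%S'

def random_date():
    # Uniform timestamp between the epoch and ~2033.
    rand_num = np.random.uniform(0, 2e9)
    return dt.datetime.fromtimestamp(rand_num).strftime(FORMAT)

dates = pd.Series([random_date() for i in range(100000)])

# One Python-level dateutil call per row; dayfirst=True matches the
# explicit %d/%m/%Y format so both approaches parse the same dates.
t_map = timeit.timeit(lambda: dates.map(lambda s: parse(s, dayfirst=True)), number=1)

# Vectorised parsing with an explicit format string.
t_vec = timeit.timeit(lambda: pd.to_datetime(dates, format=FORMAT), number=1)

print('map + parse : %.2f s' % t_map)
print('to_datetime : %.2f s' % t_vec)

Timing a single pass with number=1 mirrors what %timeit reports for slow statements; increase number for more stable figures.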
So there we have it: at 100,000 rows, the unorthodox map approach of @maxymoo is roughly 13x slower than the accepted pd.to_datetime approach. Supplying an explicit format lets pandas parse the whole array in compiled code rather than calling a Python-level parser once per element, and that advantage matters more and more as the Series gets longer!