
I have the following array of type <class 'numpy.ndarray'>:

array([20181010, 20181031, 20181116, 20181012, 20181005, 20181008,
       20181130, 20181011, 20181005, 20181116])

How can I convert its constituents from their current type <class 'numpy.int64'> to datetime in numpy? I want a fast way, and my understanding is that using a loop or a list comprehension, or converting this numpy array to pandas or to a list, will be slower.

Please correct me if I am wrong.

P.S. This question may have been answered somewhere, but I could not find a single solution which works.

Newskooler
  • @Nixon, the answers there aren't ideal, though. Surely there's a better target? – jpp Dec 10 '18 at 17:44
  • 1
  • I think @Sebastian's answer is superior to the duplicate, https://stackoverflow.com/questions/27103044/converting-datetime-string-to-datetime-in-numpy-python, even though both use `pd.to_datetime`. – hpaulj Dec 10 '18 at 20:31
  • @hpaulj I tried the duplicate and I tried Sebastian's answer, and I share your view. That is why I upvoted and marked it as correct (and why I could not get a full answer from what was marked as the duplicate when asking the question). – Newskooler Dec 10 '18 at 20:33

1 Answer


pandas has a better concept of what can be considered a date:

import numpy as np
import pandas as pd
arr = np.array([20181010, 20181031, 20181116, 20181012, 20181005, 
                20181008, 20181130, 20181011, 20181005, 20181116])
pd.to_datetime(arr.astype(str)).values
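
This returns a `DatetimeIndex`; `.values` hands it back as a numpy `datetime64[ns]` array. A quick check (the commented output is what I would expect on a current numpy/pandas; treat it as illustrative):

import numpy as np
import pandas as pd

arr = np.array([20181010, 20181031, 20181116])
out = pd.to_datetime(arr.astype(str)).values

# .values gives a plain numpy array with nanosecond resolution
print(out.dtype)                    # datetime64[ns]

# if day resolution is enough, a further cast keeps it a numpy array
print(out.astype('datetime64[D]'))  # ['2018-10-10' '2018-10-31' '2018-11-16']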

Running this over a set of 10,000,000 entries:

%%prun
import numpy as np
import pandas as pd
# build 10,000,000 yyyymmdd integers, cast to strings, then parse
lst = [20181010, 20181031, 20181116, 20181012, 20181005,
       20181008, 20181130, 20181011, 20181005, 20181116] * 1000000
arr = np.array(lst)
arr_str = arr.astype(str)
pd.to_datetime(arr_str).values

produces the following `prun` output:

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        1    8.977    8.977    8.977    8.977 {method 'astype' of 'numpy.ndarray' objects}
        1    4.394    4.394    4.394    4.394 {built-in method pandas._libs.tslib.array_to_datetime}
        2    2.344    1.172    2.344    1.172 {built-in method pandas._libs.algos.ensure_object}
        4    0.918    0.229    0.918    0.229 {built-in method numpy.core.multiarray.array}
        1    0.313    0.313    7.053    7.053 datetimes.py:106(to_datetime)
...

It's efficient enough; note that most of the time goes into the `astype(str)` cast rather than the date parsing itself.
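
If the `astype(str)` cast ever becomes the bottleneck, one alternative (a sketch only, not something the profile above measures) is to stay entirely in numpy and split the yyyymmdd integers arithmetically; the variable names below are mine:

import numpy as np

arr = np.array([20181010, 20181031, 20181116, 20181012, 20181005,
                20181008, 20181130, 20181011, 20181005, 20181116])

# split yyyymmdd into year / month / day with integer arithmetic
years = arr // 10000
months = (arr // 100) % 100
days = arr % 100

# assemble datetime64[D]: years since the 1970 epoch, plus month and day offsets
dates = ((years - 1970).astype('datetime64[Y]')
         + (months - 1).astype('timedelta64[M]')
         + (days - 1).astype('timedelta64[D]'))
print(dates.dtype)  # datetime64[D]

Unlike `pd.to_datetime`, this does no validation of the input values.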

Sebastian Mendez
  • In terms of time, how does the conversion to pandas play out? Also, I want to have `np.array` as a final result. Will this be costly? – Newskooler Dec 10 '18 at 17:44
  • No, pandas is also vectorized like numpy; see my edit. – Sebastian Mendez Dec 10 '18 at 17:46
  • `pandas` does a lot of stuff with Python iterations. `vectorized` is a slippery term. To get speed the underlying code needs to be compiled, without a lot of repetitive calls to Python classes and objects. – hpaulj Dec 10 '18 at 17:51
  • In the `prun` I don't see any function which is called 10,000,000 times, as one would expect from creating 10,000,000 datetime objects. This leads me to believe that pandas goes through some level of compilation. – Sebastian Mendez Dec 10 '18 at 17:54
  • This actually produces a `datetime64[ns]` dtype array. That may be what you want. But if you actually want `datetime.date` objects you'll need an added step: `.astype('datetime64[D]').tolist()` – hpaulj Dec 10 '18 at 20:03
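
A minimal sketch of the extra step mentioned in the last comment, assuming Python `datetime.date` objects are really what's wanted:

import numpy as np
import pandas as pd

arr = np.array([20181010, 20181031, 20181116])
ns_array = pd.to_datetime(arr.astype(str)).values   # datetime64[ns]

# downcast to day precision, then .tolist() yields datetime.date objects
dates = ns_array.astype('datetime64[D]').tolist()
print(dates[0], type(dates[0]))  # 2018-10-10 <class 'datetime.date'>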