1

I am quite new to python and trying some features. I searched a lot in here for a possible solution with no success. This is my issue:

Goal:

  • read a column with date and time from an .xlsx file --> works

  • get this column in a numpy array --> works

  • The array item is read from the .xlsx as an Object

The array output looks like this:

['2020-11-09 20:30:59' '2020-11-08 09:22:54' '2020-11-04 02:24:17' ...
 '1900-01-27 02:30:00' '1900-01-24 03:00:00' '1900-01-18 15:30:00']
  • convert the date and time to a float --> does not work

I tried the following:

x[:, 0] = [np.datetime64(x[:, 0])]

--> this doesn't work; so I tried to convert to String before.

x[:, 0] = [np.datetime64(str(x[:, 0]))] 

--> This does also not work with this error message:

ValueError: Error parsing datetime string "['2020-11-09 20:30:59' '2020-11-08 09:22:54' '2020-11-04 02:24:17' ...
 '1900-01-27 02:30:00' '1900-01-24 03:00:00' '1900-01-18 15:30:00']" at position 0

So where is the issue?

Thanks a lot for the answer! It seems to work with the datetime64 conversion. I did change .

np.array(alist, 'datetime64[ms]')

to

np.array(alist, 'datetime64[ns]')

By that I get the datetime64 as an integer. Nevertheless, how comes that I get negative values for the early datesas seen in the output?

-----------------------------------------------------
--> Before conversion: ['2020-11-09 20:30:59' '2020-11-08 09:22:54' '2020-11-04 02:24:17' ...
 '1900-01-27 02:30:00' '1900-01-24 03:00:00' '1900-01-18 15:30:00']
-----------------------------------------------------
--> After conversion to dt64: [1604953859000000000 1604827374000000000 1604456657000000000 ...      
 -2206733400000000000 -2206990800000000000 -2207464200000000000]
-----------------------------------------------------
rookieOC
  • 47
  • 5
  • 1
    If you wanna change the dtype to datetime you can use `x.astype('datetime64[ms]’)` – Nk03 Jun 19 '21 at 19:21
  • The first "doesn't work" errors lack details - actual code and error message. The error in the last case should be clear. `str(x[:,0])` is everything between `""`, the brackets, spaces and '' strings. – hpaulj Jun 19 '21 at 20:11
  • I want to use the date and time as float in a machine learning model, so I need to convert to float and later on cpnvert back to date and time. – rookieOC Jun 19 '21 at 22:16
  • 1
    the point is: if you convert to integer, you get [Unix time](https://en.wikipedia.org/wiki/Unix_time) which is *relative* time to 1970-01-01 (the "Unix time epoch"), so values before that date will be negative. – FObersteiner Jun 21 '21 at 07:02

2 Answers2

1

I was going to say that datatime64 can be picky when converting strings. It can't handle every kind of delimiter. But

In [219]: alist = ['2020-11-09 20:30:59', '2020-11-08 09:22:54', '2020-11-04 02:24:17',
     ...:  '1900-01-27 02:30:00', '1900-01-24 03:00:00', '1900-01-18 15:30:00']
In [220]: alist
Out[220]: 
['2020-11-09 20:30:59',
 '2020-11-08 09:22:54',
 '2020-11-04 02:24:17',
 '1900-01-27 02:30:00',
 '1900-01-24 03:00:00',
 '1900-01-18 15:30:00']

it handles this format just fine:

In [221]: np.array(alist, 'datetime64[ms]')
Out[221]: 
array(['2020-11-09T20:30:59.000', '2020-11-08T09:22:54.000',
       '2020-11-04T02:24:17.000', '1900-01-27T02:30:00.000',
       '1900-01-24T03:00:00.000', '1900-01-18T15:30:00.000'],
      dtype='datetime64[ms]')

This display has a 'T' between date and time, but its lack is not a problem.

converted back to a list, it produces datatime objects:

In [222]: _.tolist()
Out[222]: 
[datetime.datetime(2020, 11, 9, 20, 30, 59),
 datetime.datetime(2020, 11, 8, 9, 22, 54),
 datetime.datetime(2020, 11, 4, 2, 24, 17),
 datetime.datetime(1900, 1, 27, 2, 30),
 datetime.datetime(1900, 1, 24, 3, 0),
 datetime.datetime(1900, 1, 18, 15, 30)]

Or just displaying seconds

In [223]: np.array(alist, 'datetime64[s]')
Out[223]: 
array(['2020-11-09T20:30:59', '2020-11-08T09:22:54',
       '2020-11-04T02:24:17', '1900-01-27T02:30:00',
       '1900-01-24T03:00:00', '1900-01-18T15:30:00'],
      dtype='datetime64[s]')

Parsing a single datatime string

In [235]: alist[0]
Out[235]: '2020-11-09 20:30:59'
In [236]: np.datetime64(alist[0])
Out[236]: numpy.datetime64('2020-11-09T20:30:59')

If it's an object dtype array (such as one might get from a dataframe):

In [254]: arr = np.array(alist, object)
In [255]: arr
Out[255]: 
array(['2020-11-09 20:30:59', '2020-11-08 09:22:54',
       '2020-11-04 02:24:17', '1900-01-27 02:30:00',
       '1900-01-24 03:00:00', '1900-01-18 15:30:00'], dtype=object)
In [256]: arr1 = arr.astype('datetime64[s]')
In [257]: arr1
Out[257]: 
array(['2020-11-09T20:30:59', '2020-11-08T09:22:54',
       '2020-11-04T02:24:17', '1900-01-27T02:30:00',
       '1900-01-24T03:00:00', '1900-01-18T15:30:00'],
      dtype='datetime64[s]')
hpaulj
  • 221,503
  • 14
  • 230
  • 353
  • Thanks a lot! It seems to work. I did change . Nevertheless, how comes that I get negative values for the early dates? – rookieOC Jun 20 '21 at 10:02
1

TLDR:

use astype for conversion to appropriate types:

import numpy as np

# array of dtype object (string)
arr = np.array(['2020-11-09 20:30:59', '2020-11-08 09:22:54', '2020-11-04 02:24:17',
                '1900-01-27 02:30:00', '1900-01-24 03:00:00', '1900-01-18 15:30:00'])

# to numpy's datetime, uses 'datetime64[s]' automatically
dtarr = arr.astype(np.datetime64)
# dtarr.dtype
# dtype('<M8[s]')

# to Unix time, using 1970-01-01 as epoch - values before the epoch are negative
dtarr_unix = dtarr.astype(np.int64) # datetime64[s] will give seconds since the epoch...
# array([ 1604953859,  1604827374,  1604456657, -2206733400, -2206990800,
#         -2207464200], dtype=int64)

# or using 1900-01-01 as epoch:
dtarr_1900 = (dtarr - np.datetime64('1900-01-01')).astype(np.int64)
# array([3813942659, 3813816174, 3813445457,    2255400,    1998000,
#           1524600], dtype=int64)

related: How to get unix timestamp from numpy.datetime64

FObersteiner
  • 22,500
  • 8
  • 42
  • 72