
In the following example, elements with dtype np.datetime64 are automatically converted to datetime.datetime when they are appended to another NumPy array of dtype object.

How can I avoid this autoconversion?

import numpy as np
a = np.array([['2018-04-01T15:30:00'],
       ['2018-04-01T15:31:00'],
       ['2018-04-01T15:32:00'],
       ['2018-04-01T15:33:00'],
       ['2018-04-01T15:34:00']], dtype='datetime64[s]')
c = np.array([0,1,2,3,4]).reshape(-1,1)
c = c.astype("object")
d = np.append(c,a,axis=1)
d


array([[0, datetime.datetime(2018, 4, 1, 15, 30)],
       [1, datetime.datetime(2018, 4, 1, 15, 31)],
       [2, datetime.datetime(2018, 4, 1, 15, 32)],
       [3, datetime.datetime(2018, 4, 1, 15, 33)],
       [4, datetime.datetime(2018, 4, 1, 15, 34)]], dtype=object)
user7468395
  • What is `c` before the append? `a.tolist()` produces `datetime` objects. `astype(object)` might be doing the same thing. – hpaulj Jan 20 '19 at 18:38
  • Why are you trying to join these arrays? What are you going to do with the result? However you do it, it is not a standard numpy array. – hpaulj Jan 20 '19 at 18:54
  • I just checked. `x.astype(object)` 'unboxes' everything. `np.float64` numbers become `float`. – hpaulj Jan 20 '19 at 19:37
  • Regarding "Why are you trying to join these arrays?": it was meant as a layman's alternative to pandas for situations where I hit the performance limits of pandas. Now I have learned that I have to do it differently with numpy :-) – user7468395 Jan 21 '19 at 02:04

2 Answers


Sometimes we have to make a 'blank' object array, and fill it piece by piece.

In [57]: d = np.empty((5,2), object)
In [58]: d
Out[58]: 
array([[None, None],
       [None, None],
       [None, None],
       [None, None],
       [None, None]], dtype=object)

We can fill it column by column, but the result is the same as with concatenation (use np.concatenate rather than np.append):

In [59]: d[:,0] = c.ravel()
In [60]: d[:,1] = a.ravel()
In [61]: d
Out[61]: 
array([[0, datetime.datetime(2018, 4, 1, 15, 30)],
       [1, datetime.datetime(2018, 4, 1, 15, 31)],
       [2, datetime.datetime(2018, 4, 1, 15, 32)],
       [3, datetime.datetime(2018, 4, 1, 15, 33)],
       [4, datetime.datetime(2018, 4, 1, 15, 34)]], dtype=object)

As with a.astype(object), the column assignment has 'unboxed' the dates into datetime.datetime objects.
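A small check of that unboxing behavior, as a sketch only: the variable names are illustrative, and the result depends on the datetime unit (second resolution, as in the question, maps onto datetime.datetime).

```python
import datetime
import numpy as np

a = np.array(['2018-04-01T15:30:00', '2018-04-01T15:31:00'],
             dtype='datetime64[s]')

# Indexing the typed array yields a NumPy scalar.
typed_elem = a[0]
# Bulk conversion to object dtype yields a plain Python object instead.
boxed_elem = a.astype(object)[0]
# tolist() performs the same conversion.
list_elem = a.tolist()[0]
# The same happens for floats: np.float64 becomes float.
float_elem = np.array([1.5]).astype(object)[0]

print(type(typed_elem), type(boxed_elem), type(list_elem), type(float_elem))
```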

But if I assign the elements one by one, the numpy.datetime64 scalars are kept as they are:

In [62]: for i in range(5):
    ...:     d[i,1]=a[i,0]
    ...:     
In [63]: d
Out[63]: 
array([[0, numpy.datetime64('2018-04-01T15:30:00')],
       [1, numpy.datetime64('2018-04-01T15:31:00')],
       [2, numpy.datetime64('2018-04-01T15:32:00')],
       [3, numpy.datetime64('2018-04-01T15:33:00')],
       [4, numpy.datetime64('2018-04-01T15:34:00')]], dtype=object)

But what's the value of such an array?

I can add a timedelta to the original time array:

In [67]: a + np.array(10, 'timedelta64[m]')
Out[67]: 
array([['2018-04-01T15:40:00'],
       ['2018-04-01T15:41:00'],
       ['2018-04-01T15:42:00'],
       ['2018-04-01T15:43:00'],
       ['2018-04-01T15:44:00']], dtype='datetime64[s]')

but I can't do the same thing to the object array column:

In [68]: d[:,1] + np.array(10, 'timedelta64[m]')
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-68-f82827d3d355> in <module>()
----> 1 d[:,1] + np.array(10, 'timedelta64[m]')

TypeError: ufunc add cannot use operands with types dtype('O') and dtype('<m8[m]')

I have to iterate over the objects explicitly:

In [70]: for i in range(5):
    ...:     d[i,1] += np.array(i*10, 'timedelta64[m]')
    ...:     
In [71]: d
Out[71]: 
array([[0, numpy.datetime64('2018-04-01T15:30:00')],
       [1, numpy.datetime64('2018-04-01T15:41:00')],
       [2, numpy.datetime64('2018-04-01T15:52:00')],
       [3, numpy.datetime64('2018-04-01T16:03:00')],
       [4, numpy.datetime64('2018-04-01T16:14:00')]], dtype=object)
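One possible way around the explicit loop, sketched under the assumption that every entry in the column is a datetime64 (or datetime) value: cast the object column back to a typed array and do the arithmetic there. The small two-row array below is illustrative, not the data from the session above.

```python
import numpy as np

a = np.array(['2018-04-01T15:30:00', '2018-04-01T15:31:00'],
             dtype='datetime64[s]')

# Build the mixed object array element by element, as above.
d = np.empty((2, 2), dtype=object)
for i in range(2):
    d[i, 0] = i
    d[i, 1] = a[i]

# Cast the object column to a real datetime64 array;
# arithmetic on the result is vectorized again.
dates = d[:, 1].astype('datetime64[s]')
shifted = dates + np.timedelta64(10, 'm')
print(shifted)
```

The cast copies the column, so this only pays off if the typed array is what you keep working with; writing it back into d as a whole column would unbox the values again.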
hpaulj

Use a record array instead of dtype=object

Fix this by constructing an array that can properly handle columns of different types. The simplest way to do this is to make a record array, like so:

rarr = np.rec.fromarrays([a, c], names=('date', 'val'))

print(rarr)
# output
#     rec.array([[('2018-04-01T15:30:00', 0)],
#                [('2018-04-01T15:31:00', 1)],
#                [('2018-04-01T15:32:00', 2)],
#                [('2018-04-01T15:33:00', 3)],
#                [('2018-04-01T15:34:00', 4)]],
#               dtype=[('date', '<M8[s]'), ('val', '<i8')])

print(rarr.date)
# output
#     array([['2018-04-01T15:30:00'],
#            ['2018-04-01T15:31:00'],
#            ['2018-04-01T15:32:00'],
#            ['2018-04-01T15:33:00'],
#            ['2018-04-01T15:34:00']], dtype='datetime64[s]')

As hpaulj points out, you can't do vectorized arithmetic on (or otherwise easily manipulate) a datetime64 column in an array of dtype=object; you have to loop over the elements. With a record array, however, this is easy:

print(rarr.date + np.array(10, 'timedelta64[m]'))
# output
#     array([['2018-04-01T15:40:00'],
#            ['2018-04-01T15:41:00'],
#            ['2018-04-01T15:42:00'],
#            ['2018-04-01T15:43:00'],
#            ['2018-04-01T15:44:00']], dtype='datetime64[s]')
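A self-contained variant of the same idea, as a sketch: it assumes 1-D inputs and an integer value column (the question's c was cast to object, which would make the val field object-typed as well). The names dates and vals are illustrative.

```python
import numpy as np

dates = np.array(['2018-04-01T15:30:00',
                  '2018-04-01T15:31:00',
                  '2018-04-01T15:32:00'], dtype='datetime64[s]')
vals = np.arange(3)  # integer dtype, not object

# One field per input array; each field keeps its own dtype.
rarr = np.rec.fromarrays([dates, vals], names='date,val')

# Datetime arithmetic on the field stays vectorized.
shifted = rarr.date + np.timedelta64(10, 'm')

# Elements of the date field are still NumPy datetime64 scalars.
first_date = rarr.date[0]
print(rarr.dtype, shifted, first_date)
```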
tel