1

I want to build a custom function that supports broadcasting.

In particular, I have two arrays, one of dates and another of times, and I want to merge the two, as in datetime.datetime.combine.

I would like to have something like this (these are the values I have, but the problem is more general):

x = array([datetime.date(2019, 1, 21), datetime.date(2019, 1, 21),
           datetime.date(2019, 1, 21)])
y = array([datetime.time(0, 0), datetime.time(0, 15), datetime.time(0, 30)])

And I would like to do something like this:

datetime.combine(out[:,0], out[:,1])

to get the same result as:

np.asarray([datetime.combine(i,j) for i,j in zip(x,y)])

More generally:

Suppose I have a function f(a,b), and I have two numpy arrays x,y. Is there a way to apply broadcasting rules and obtain f(x,y)?

Giacomo Sachs
  • Are you trying to vectorize a python function across a numpy array? – Mad Physicist Apr 12 '19 at 08:18
  • Also, I don't think broadcast means what you think it means... – Mad Physicist Apr 12 '19 at 08:18
  • To answer your last question: it depends on `f`. – Mad Physicist Apr 12 '19 at 08:19
  • @MadPhysicist as far as the first question is concerned, yes, and I would like to avoid np.vectorize. The more general part is a generalization of the first; I worded it badly. My example is a vectorization problem and, if I understand broadcasting correctly, in the latter part I assume that the two arrays might be of different sizes (but eligible for broadcasting). – Giacomo Sachs Apr 12 '19 at 08:32
  • np.vectorize is the straightforward solution, or np.frompyfunc if an object dtype is OK. – hpaulj Apr 12 '19 at 11:07
  • Unless you work with numpy types, you won't get much mileage from numpy arrays that contain references to full-blown Python objects – Mad Physicist Apr 12 '19 at 12:03

2 Answers

0

If you are looking for something more than numpy.vectorize, you may want to check out numpy ufuncs:

https://docs.scipy.org/doc/numpy-1.16.1/reference/ufuncs.html

and you may try to create your own custom ufunc https://docs.scipy.org/doc/numpy/user/c-info.ufunc-tutorial.html
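For comparison, here is a minimal sketch of the numpy.vectorize baseline mentioned above, applied to the question's data. The variable names are illustrative, and `otypes=[object]` is an assumption made so that the results stay plain Python `datetime` objects.

```python
import datetime as dt
import numpy as np

# Sample data modelled on the question (object-dtype arrays of Python objects).
x = np.array([dt.date(2019, 1, 21)] * 3, dtype=object)
y = np.array([dt.time(0, 0), dt.time(0, 15), dt.time(0, 30)], dtype=object)

# otypes=[object] fixes the output dtype up front, so np.vectorize does not
# need a trial call on the first element to guess it.
combine = np.vectorize(dt.datetime.combine, otypes=[object])

result = combine(x, y)
print(result.shape)  # (3,)
print(result[1])     # 2019-01-21 00:15:00
```

np.vectorize is a convenience wrapper around a Python-level loop, so it follows broadcasting rules but should not be expected to run at compiled speed.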

0

A custom ufunc is fine if you want to dig into C code. But your illustrative case works with datetime objects, and np.frompyfunc can be quite useful for that. With object dtype arrays, numpy has to iterate at a (near) Python level, running Python code on each of the objects. If you call a ufunc on an object array, it delegates the task to a corresponding method of each object (and fails if such a method does not exist).

Let's construct your date arrays:
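The delegation described above can be seen with two built-in ufuncs. This is only a sketch on made-up sample values: `np.add` falls back to each element's `+` operator, while `np.sqrt` looks for a `sqrt` method that `datetime.date` does not have. The exact exception type has varied between numpy versions, so both likely candidates are caught.

```python
import datetime as dt
import numpy as np

dates = np.array([dt.date(2019, 1, 21), dt.date(2020, 1, 21)], dtype=object)
steps = np.array([dt.timedelta(days=1), dt.timedelta(days=2)], dtype=object)

# On object arrays np.add applies Python's + to each pair of elements,
# so date + timedelta is evaluated element by element.
shifted = np.add(dates, steps)
print(shifted)

# np.sqrt needs a .sqrt() method on each element; dates have none, so it raises.
try:
    np.sqrt(dates)
    sqrt_failed = False
except (TypeError, AttributeError) as exc:
    sqrt_failed = True
    print(type(exc).__name__)
```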

In [19]: import numpy as np
In [20]: from datetime import datetime

In [35]: alist = [datetime(2019,1,21,0,0), datetime(2019,1,21,0,10),datetime(2020,1,21,0,0)]                                                           
In [36]: x = np.array([a.date() for a in alist])                                
In [37]: y = np.array([a.time() for a in alist])                                
In [38]: x                                                                      
Out[38]: 
array([datetime.date(2019, 1, 21), datetime.date(2019, 1, 21),
       datetime.date(2020, 1, 21)], dtype=object)
In [39]: y                                                                      
Out[39]: 
array([datetime.time(0, 0), datetime.time(0, 10), datetime.time(0, 0)],
      dtype=object)

And do the combine with a list comprehension:

In [41]: np.array([datetime.combine(i,j) for i, j in zip(x,y)])                 
Out[41]: 
array([datetime.datetime(2019, 1, 21, 0, 0),
       datetime.datetime(2019, 1, 21, 0, 10),
       datetime.datetime(2020, 1, 21, 0, 0)], dtype=object)

and with frompyfunc:

In [43]: np.frompyfunc(datetime.combine, 2,1)(x,y)                              
Out[43]: 
array([datetime.datetime(2019, 1, 21, 0, 0),
       datetime.datetime(2019, 1, 21, 0, 10),
       datetime.datetime(2020, 1, 21, 0, 0)], dtype=object)

With frompyfunc we can apply broadcasting, here pairing every date in x with every time in y:

In [44]: np.frompyfunc(datetime.combine, 2,1)(x,y[:,None])                      
Out[44]: 
array([[datetime.datetime(2019, 1, 21, 0, 0),
        datetime.datetime(2019, 1, 21, 0, 0),
        datetime.datetime(2020, 1, 21, 0, 0)],
       [datetime.datetime(2019, 1, 21, 0, 10),
        datetime.datetime(2019, 1, 21, 0, 10),
        datetime.datetime(2020, 1, 21, 0, 10)],
       [datetime.datetime(2019, 1, 21, 0, 0),
        datetime.datetime(2019, 1, 21, 0, 0),
        datetime.datetime(2020, 1, 21, 0, 0)]], dtype=object)
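For the general `f(a, b)` case in the question, the same idea can be wrapped in a small helper. The name `broadcast_apply` is hypothetical and not part of numpy; this is a sketch that assumes `func` returns a single value and that an object-dtype result is acceptable.

```python
import datetime as dt
import numpy as np

def broadcast_apply(func, *arrays):
    """Apply a Python callable elementwise, following numpy broadcasting rules.

    Hypothetical helper: builds a ufunc with one input per array and one output.
    The result always has dtype=object.
    """
    ufunc = np.frompyfunc(func, len(arrays), 1)
    return ufunc(*arrays)

x = np.array([dt.date(2019, 1, 21), dt.date(2020, 1, 21)], dtype=object)     # shape (2,)
y = np.array([dt.time(0, 0), dt.time(0, 15), dt.time(0, 30)], dtype=object)  # shape (3,)

# (2,) against (3, 1) broadcasts to (3, 2): one row per time, one column per date.
out = broadcast_apply(dt.datetime.combine, x, y[:, None])
print(out.shape)  # (3, 2)
print(out[2, 1])  # 2020-01-21 00:30:00
```

Shapes that cannot be broadcast together, such as `(2,)` against `(3,)`, raise a `ValueError` just as they would with a built-in ufunc.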

x could have been constructed with frompyfunc:

In [46]: np.frompyfunc(lambda a: a.date(),1,1)(alist)                           
Out[46]: 
array([datetime.date(2019, 1, 21), datetime.date(2019, 1, 21),
       datetime.date(2020, 1, 21)], dtype=object)

The frompyfunc version of combine is about twice as fast in this small test:

In [47]: timeit np.frompyfunc(datetime.combine, 2,1)(x,y)                       
5.39 µs ± 181 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
In [48]: timeit np.array([datetime.combine(i,j) for i, j in zip(x,y)])          
11.8 µs ± 66.2 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)

though a good chunk of the [48] time comes from converting the result with np.array, as the bare list comprehension shows:

In [51]: timeit [datetime.combine(i,j) for i, j in zip(x,y)]                    
3.91 µs ± 41.2 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)

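Combining from list versions of x and y appears even faster, but treat the number below with caution: in Python 3, zip returns a one-shot iterator, so xy is exhausted after the first loop and the remaining loops iterate over nothing.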

In [52]: %%timeit xy=zip(x.tolist(),y.tolist()) 
    ...: [datetime.combine(i,j) for i,j in xy] 
190 ns ± 0.579 ns per loop (mean ± std. dev. of 7 runs, 10000000 loops each)
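A hedged sketch of a fairer version of this last measurement uses the standard `timeit` module and rebuilds the `zip` inside the timed function, so every call does the full work. The names `xl`, `yl`, and `combine_lists` are illustrative, the data is made up, and no timing figure is claimed here since it depends on the machine.

```python
import datetime as dt
import timeit
import numpy as np

x = np.array([dt.date(2019, 1, 21)] * 3, dtype=object)
y = np.array([dt.time(0, 0), dt.time(0, 15), dt.time(0, 30)], dtype=object)

# Convert once, outside the timed code; plain lists can be re-iterated freely.
xl, yl = x.tolist(), y.tolist()

def combine_lists():
    # zip() is created afresh on every call, so it is never exhausted.
    return [dt.datetime.combine(i, j) for i, j in zip(xl, yl)]

n = 100_000
seconds = timeit.timeit(combine_lists, number=n)
print(f"{seconds / n * 1e9:.0f} ns per call")
```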
hpaulj