1

I naively assumed that assigning a value via ellipsis [...], e.g.

a = np.empty(N, dtype=np.object)
a[...] = 0.0

is basically a faster version of the following naive loop:

def slow_assign_1d(a, value):
    for i in range(len(a)):
        a[i] = value

However this seems not to be the case. Here is an example for different behavior:

>>> a=np.empty(2, dtype=np.object)
>>> a[...] = 0.0
>>> a[0] is a[1]
False

the object 0.0 seems to be cloned. Yet when I use the naive slow version:

>>> a=np.empty(2, dtype=np.object)
>>> slow_assign_1d(a, 0.0)
>>> a[0] is a[1]
True

all elements are the "same".

Funnily enough, the desired behavior with ellipsis can be observed, for example, with a custom class:

>>> class A:
...     pass
>>> a[...] = A()
>>> a[0] is a[1]
True

Why do floats get this "special" treatment, and is there a way to initialize quickly with a float value without producing copies?

NB: np.full(...) and a[:] display the same behavior as a[...]: the object 0.0 is cloned/copies of it are created.
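For instance, a quick check of both variants (a minimal sketch; the expected outputs reflect the behavior described here, i.e. the NumPy version this question is about):

>>> b = np.full(2, 0.0, dtype=np.object)
>>> b[0] is b[1]
False
>>> c = np.empty(2, dtype=np.object)
>>> c[:] = 0.0
>>> c[0] is c[1]
False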


Edit: As @Till Hoffmann pointed out, the desired behavior for strings and integers only occurs for small integers (-5...256) and short strings (one char), because they come from a cache/pool and there is never more than one object of this kind.

>>> a[...] = 1         # or 'a'
>>> a[0] is a[1]
True
>>> a[...] = 1000      # or 'aa'
>>> a[0] is a[1]
False

It seems as if the "desired behavior" only occurs for types numpy cannot downcast to something else, for example:

>>> class A(float):  # can be down-cast to a float
...     pass
>>> a[...] = A()
>>> a[0] is a[1]
False

What's more, a[0] is no longer of type A but of type float.

ead
  • I may not fully understand why you would want to use ellipsis, doesn't a[:] = 0.0 do the trick after initializing the empty numpy object? – Koen Aug 08 '18 at 14:21
  • @Koen `a[:]=0.0` has the same behavior as `a[...]` - object `0.0` is cloned/copied. I used `[...]` only because in my actual code the array is multi-dimensional. – ead Aug 08 '18 at 14:26

2 Answers

3

This is actually an issue with the integers rather than the floats. In particular, "small" integers are cached in Python so that they all refer to the same object in memory, have the same id, and are therefore identical when compared with the is operator. The same is not true for floats. See "is" operator behaves unexpectedly with integers for a more in-depth discussion, and https://docs.python.org/3/c-api/long.html#c.PyLong_FromLong for the official definition of "small".
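A minimal sketch of that caching (a CPython implementation detail, not a language guarantee; the values are built via int()/float() calls so no literal folding gets in the way):

x, y = int("100"), int("100")      # inside the cached range (-5 ... 256)
print(x is y)                      # True - both names refer to the single cached object
x, y = int("1000"), int("1000")    # outside the cached range
print(x is y)                      # False - two distinct int objects
x, y = float("0.0"), float("0.0")
print(x is y)                      # False - floats are not cached this way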


Regarding the particular example of A inheriting from float, the numpy documentation states that

Note that assignments may result in changes if assigning higher types to lower types [...]

One might argue that, in the example case provided above, no assignment of a higher type to a lower type occurs, because np.object should be the most general type. However, inspecting the type of the array elements, it becomes clear that the value is down-cast to a float when assigning with [...].

import numpy as np

a = np.empty(2, np.object)

class A(float):
    pass

a[0] = a[1] = A()
print(type(a[0]))  # <class '__main__.A'>
a[...] = A()
print(type(a[0]))  # <class 'float'>

As an aside: you probably won't be able to save much memory by storing a reference to the object of interest unless the individual objects are very large. E.g. storing a single-precision floating point number is cheaper than storing a pointer to it (on a 64-bit system). If your objects are indeed very large, they are (probably) not down-castable to a primitive type, so the problem is unlikely to arise in the first place.
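To put rough numbers on that (a sketch; the exact values are CPython/NumPy implementation details on a 64-bit build):

import sys
import numpy as np

obj_arr = np.empty(3, dtype=object)       # each cell holds an 8-byte pointer
f64_arr = np.empty(3, dtype=np.float64)   # each cell holds the 8-byte value itself
print(obj_arr.itemsize)    # 8  - just the pointer
print(f64_arr.itemsize)    # 8  - the double itself
print(sys.getsizeof(0.0))  # 24 - the Python float object a pointer would reference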

Till Hoffmann
  • You are right, I forgot about it and tested only with 0 and strings that consisted of a char, which are also cached in a pool. This explains the difference I have seen... – ead Aug 08 '18 at 14:55
  • I edited my question, ints and strings were bad examples, making it kind of chameleon question, sorry for that. I hope you have also insight into the second part of the question - how to avoid the copying. – ead Aug 08 '18 at 15:13
  • Provided a bit more context. – Till Hoffmann Aug 09 '18 at 13:19
0

This behavior is a numpy bug: https://github.com/numpy/numpy/issues/11701

So one probably has to use a workaround until the bug is fixed. I ended up using the naive slow version, implemented/compiled with Cython; here, for example, a one-dimensional replacement for np.full:

%%cython
cimport numpy as np
import numpy as np
def cy_full(Py_ssize_t n, object obj):
    cdef np.ndarray[dtype=object] res = np.empty(n, dtype=object)
    cdef Py_ssize_t i
    for i in range(n):
        res[i] = obj   # item assignment stores the reference, no copy is made
    return res

a = cy_full(5, np.nan)

a[0] is a[4]  # True as expected!

There is also no performance disadvantage compared to np.full:

%timeit cy_full(1000, np.nan)
# 8.22 µs ± 39.9 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)

%timeit np.full(1000, np.nan, dtype=np.object)
# 22.3 µs ± 129 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
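Since my actual arrays are multi-dimensional (see the comments under the question), a plain-Python sketch of an n-dimensional variant could look like this (full_object is just a hypothetical helper name, not a NumPy function):

import numpy as np

def full_object(shape, obj):
    # fill a flat object array element by element (references are stored
    # unchanged), then reshape to the requested shape
    res = np.empty(int(np.prod(shape)), dtype=object)
    for i in range(res.size):
        res[i] = obj
    return res.reshape(shape)

m = full_object((2, 3), np.nan)
m[0, 0] is m[1, 2]  # True - all cells reference the same object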
ead