Force numpy to create array of objects

Question

I have an array:

x = np.array([[1, 2, 3], [4, 5, 6]])

and I want to create another array of shape=(1, 1) and dtype=object whose only element is x.

I've tried this code:

a = np.array([[x]], dtype=object)

but it produces an array of shape (1, 1, 2, 3).

Of course I can do:

a = np.zeros(shape=(1, 1), dtype=object)
a[0, 0] = x

but I want the solution to scale easily to larger shapes of a, like:

[[x, x], [x, x]]

without having to run for loops over all the indices.

Any suggestions how this could be achieved?

UPD1

The arrays may be different, as in:

x = np.array([[1, 2, 3], [4, 5, 6]])
y = np.array([[7, 8, 9], [0, 1, 2]])
u = np.array([[3, 4, 5], [6, 7, 8]])
v = np.array([[9, 0, 1], [2, 3, 4]])
[[x, y], [u, v]]

They may also be of different shapes, but for that case a simple np.array([[x, y], [u, v]]) constructor works fine.
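A minimal check of the constructor route, assuming NumPy 1.24 or later: since that release, ragged nested input needs an explicit dtype=object, and without it NumPy raises a ValueError. The sketch uses the mixed-shape case `[[x, y], [u, v.T]]`, where the arrays already differ in their first dimension, so NumPy stops at the outer (2, 2) level and stores each array as one object. When all shapes are equal, dtype=object alone does not stop NumPy from recursing into the arrays; and when inner shapes agree only in their leading dimension, e.g. (2, 3) and (2, 4), the constructor can fail with a broadcast error instead.

```python
import numpy as np

x = np.array([[1, 2, 3], [4, 5, 6]])
y = np.array([[7, 8, 9], [0, 1, 2]])
u = np.array([[3, 4, 5], [6, 7, 8]])
v = np.array([[9, 0, 1], [2, 3, 4]])

# v.T has shape (3, 2) while the others are (2, 3): NumPy detects the
# mismatch at the first inner dimension and stops at the outer (2, 2) level.
mixed = np.array([[x, y], [u, v.T]], dtype=object)

# With all shapes equal, dtype=object does NOT stop the recursion:
# the result is a (2, 2, 2, 3) array of Python ints.
same = np.array([[x, y], [u, v]], dtype=object)
```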

UPD2

I really want a solution that works with arbitrary x, y, u, v shapes, not necessarily all the same.

score 6 · Answer 1 · answered Mar 02 '18 at 07:38


a = np.empty(shape=(2, 2), dtype=object)
a.fill(x)


wim


Thanks for this one. Sorry, I used the same-x array example for the sake of brevity, but in fact those can be different: `[[x, y], [u, v]]`. The original problem for me was that the result depended on whether all the input arrays have the same shape or not. – SiLiKhon Mar 02 '18 at 07:50
This `fill` puts the same pointer to `x` in all 4 slots. It carries the same danger as replicating a list with `[mutable_object]*4`. – hpaulj Mar 05 '18 at 22:07
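A small sketch of the danger described in the last comment: after `fill`, every cell refers to the very same array object, so an in-place change made through one cell is visible through all of them (and in `x` itself). Assigning a separate copy into each cell, shown at the end, is one way to get independent cells; the names `all_same`, `seen_elsewhere`, and `unaffected` are only for illustration.

```python
import numpy as np

x = np.array([[1, 2, 3], [4, 5, 6]])

a = np.empty((2, 2), dtype=object)
a.fill(x)  # every cell now holds a reference to the same array object x

# All four cells are the identical object, not copies.
all_same = all(a[idx] is x for idx in np.ndindex(a.shape))

# An in-place change made through one cell shows up in every cell (and in x).
a[0, 0][0, 0] = 100
seen_elsewhere = a[1, 1][0, 0]

# If independent cells are needed, assign a separate copy into each one.
b = np.empty((2, 2), dtype=object)
for idx in np.ndindex(b.shape):
    b[idx] = x.copy()
b[0, 0][0, 0] = -1
unaffected = b[1, 1][0, 0]  # still 100: each cell of b has its own copy
```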

score 4 · Answer 2 · answered Mar 02 '18 at 07:37


Found a solution myself:

a = np.zeros(shape=(2, 2), dtype=object)
a[:] = [[x, x], [x, x]]
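As far as I can tell, the slice assignment works because the target is already a 2-d object array: NumPy unpacks the right-hand side only as deep as the target's own two dimensions, so each inner array lands whole in one cell. A sketch applying it to the different arrays from the question's updates, including the transposed `v.T` used in the accepted answer; `np.empty` replaces `np.zeros` here since every cell is overwritten anyway.

```python
import numpy as np

x = np.array([[1, 2, 3], [4, 5, 6]])
y = np.array([[7, 8, 9], [0, 1, 2]])
u = np.array([[3, 4, 5], [6, 7, 8]])
v = np.array([[9, 0, 1], [2, 3, 4]])

# Preallocate the outer (2, 2) object array; its contents are overwritten below.
a = np.empty((2, 2), dtype=object)

# The nested list is unpacked only down to a.ndim == 2 levels,
# so each inner array is stored as a single element.
a[:] = [[x, y], [u, v.T]]
```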


SiLiKhon


Paul Panzer · Accepted Answer · 2018-03-02T10:05:29.050

Here is a pretty general method. It works with nested lists and lists of lists of arrays, regardless of whether the arrays have equal or different shapes. It also works when the data come clumped together in one single array, which is in fact the trickiest case. (The other methods posted so far will not work in this case.)

Let's start with the difficult case, one big array:

# create example
# pick outer shape and inner shape
>>> osh, ish = (2, 3), (2, 5)
# total shape
>>> tsh = (*osh, *ish)
# make data
>>> data = np.arange(np.prod(tsh)).reshape(tsh)
>>>
# recalculate inner shape to cater for different inner shapes
# this will return the consensus bit of all inner shapes
>>> ish = np.shape(data)[len(osh):]
>>> 
# block them
>>> data_blocked = np.frompyfunc(np.reshape(data, (-1, *ish)).__getitem__, 1, 1)(range(np.prod(osh))).reshape(osh)
>>> 
# admire
>>> data_blocked
array([[array([[0, 1, 2, 3, 4],
       [5, 6, 7, 8, 9]]),
        array([[10, 11, 12, 13, 14],
       [15, 16, 17, 18, 19]]),
        array([[20, 21, 22, 23, 24],
       [25, 26, 27, 28, 29]])],
       [array([[30, 31, 32, 33, 34],
       [35, 36, 37, 38, 39]]),
        array([[40, 41, 42, 43, 44],
       [45, 46, 47, 48, 49]]),
        array([[50, 51, 52, 53, 54],
       [55, 56, 57, 58, 59]])]], dtype=object)

Using the OP's example, which is a list of lists of arrays:

>>> x = np.array([[1, 2, 3], [4, 5, 6]])
>>> y = np.array([[7, 8, 9], [0, 1, 2]])
>>> u = np.array([[3, 4, 5], [6, 7, 8]])
>>> v = np.array([[9, 0, 1], [2, 3, 4]])
>>> data = [[x, y], [u, v]]
>>> 
>>> osh = (2,2)
>>> ish = np.shape(data)[len(osh):]
>>> 
>>> data_blocked = np.frompyfunc(np.reshape(data, (-1, *ish)).__getitem__, 1, 1)(range(np.prod(osh))).reshape(osh)
>>> data_blocked
array([[array([[1, 2, 3],
       [4, 5, 6]]),
        array([[7, 8, 9],
       [0, 1, 2]])],
       [array([[3, 4, 5],
       [6, 7, 8]]),
        array([[9, 0, 1],
       [2, 3, 4]])]], dtype=object)

And an example with subarrays of different shapes (note the v.T):

>>> data = [[x, y], [u, v.T]]
>>> 
>>> osh = (2,2)
>>> ish = np.shape(data)[len(osh):]
>>> data_blocked = np.frompyfunc(np.reshape(data, (-1, *ish)).__getitem__, 1, 1)(range(np.prod(osh))).reshape(osh)
>>> data_blocked
array([[array([[1, 2, 3],
       [4, 5, 6]]),
        array([[7, 8, 9],
       [0, 1, 2]])],
       [array([[3, 4, 5],
       [6, 7, 8]]),
        array([[9, 2],
       [0, 3],
       [1, 4]])]], dtype=object)
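On NumPy 1.24 and later, the last example no longer runs as written: `np.shape(data)` and `np.reshape(data, ...)` first convert the ragged nested list with a plain array conversion, which now raises a ValueError. A minimal adaptation, assuming that NumPy version, converts the list explicitly with `dtype=object` before computing `ish`. Here the ragged input already comes out as the (2, 2) object array, so `ish` is empty and the `frompyfunc` step just re-packs the same cells.

```python
import numpy as np

x = np.array([[1, 2, 3], [4, 5, 6]])
y = np.array([[7, 8, 9], [0, 1, 2]])
u = np.array([[3, 4, 5], [6, 7, 8]])
v = np.array([[9, 0, 1], [2, 3, 4]])
data = [[x, y], [u, v.T]]

osh = (2, 2)
# Explicit dtype=object: NumPy >= 1.24 refuses to build ragged arrays otherwise.
arr = np.array(data, dtype=object)
ish = arr.shape[len(osh):]  # () here, because the inner shapes disagree

# Same frompyfunc step as above, applied to the object array instead of the list.
flat = arr.reshape(-1, *ish)
data_blocked = np.frompyfunc(flat.__getitem__, 1, 1)(range(np.prod(osh))).reshape(osh)
```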

Thanks for the answer, but it's rather important for me that the solution works for arbitrary `x, y, u, v` shapes, not necessarily all the same. Apologies for not stating it clearly in the OP. — SiLiKhon, Mar 02 '18 at 09:10
I've written an alternative that uses `ndindex` instead. I think it's a little easier to understand. But what really matters is whether one is more general than the other. — hpaulj, Mar 05 '18 at 06:17
Another object array case: https://stackoverflow.com/a/49226113/901925, complicated by the fact that the user wants a 2d array of tuples. Our methods produce an array of arrays (because they first turn the nested list into an array). — hpaulj, Mar 11 '18 at 23:22

hpaulj · Answer 4 · 2018-03-05T06:15:22.870

@PaulPanzer's use of np.frompyfunc is clever, but all that reshaping and the use of __getitem__ make it hard to understand.

Separating the function creation from application might help:

func = np.frompyfunc(np.reshape(data, (-1, *ish)).__getitem__, 1, 1)
newarr = func(range(np.prod(osh))).reshape(osh)

This highlights the separation between the ish dimensions and the osh ones.

I also suspect a lambda function could substitute for the __getitem__.
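A sketch of that substitution, assuming the first example's data (osh=(2, 3), ish=(2, 5)): the lambda simply indexes the flattened array, which makes the role of the bound `__getitem__` method explicit. `np.arange` stands in for `range` here; both hand `frompyfunc` a 1-d sequence of flat indices.

```python
import numpy as np

osh, ish = (2, 3), (2, 5)
data = np.arange(np.prod((*osh, *ish))).reshape(*osh, *ish)

# One row per outer cell: shape (6, 2, 5).
flat = data.reshape(-1, *ish)

# frompyfunc(func, nin=1, nout=1) always produces object output, so each
# returned (2, 5) block is stored as a single element.
pick = np.frompyfunc(lambda i: flat[i], 1, 1)
data_blocked = pick(np.arange(np.prod(osh))).reshape(osh)
```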

This works because frompyfunc returns an object dtype array. np.vectorize also uses frompyfunc but lets us specify a different otype. Both, however, pass a scalar to the function, which is why Paul's approach uses a flattened range and __getitem__. np.vectorize with a signature lets us pass an array to the function, but it iterates with ndindex instead of using frompyfunc.
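An illustrative sketch of the signature variant (not code from the original answer), assuming the first example's data: with `signature='(m,n)->()'` each call receives one whole (2, 5) block, the loop runs over the remaining outer (2, 3) dimensions, and `otypes=[object]` makes each scalar output slot an object cell. Because np.vectorize with a signature loops in Python, this is for clarity rather than speed.

```python
import numpy as np

osh, ish = (2, 3), (2, 5)
data = np.arange(np.prod((*osh, *ish))).reshape(*osh, *ish)

# Each call receives one (2, 5) block and returns it unchanged; the '()'
# output core signature means each returned block fills one object cell.
as_cell = np.vectorize(lambda block: block, signature='(m,n)->()', otypes=[object])
data_blocked = as_cell(data)
```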

Inspired by that, here's an np.empty-plus-fill method, but with ndindex as the iterator:

In [385]: osh, ish = (2, 3), (2, 5)
     ...: tsh = (*osh, *ish)
     ...: data = np.arange(np.prod(tsh)).reshape(tsh)
     ...: ish = np.shape(data)[len(osh):]
     ...: 
In [386]: tsh
Out[386]: (2, 3, 2, 5)
In [387]: ish
Out[387]: (2, 5)
In [388]: osh
Out[388]: (2, 3)
In [389]: res = np.empty(osh, object)
In [390]: for idx in np.ndindex(osh):
     ...:     res[idx] = data[idx]
     ...:     
In [391]: res
Out[391]: 
array([[array([[0, 1, 2, 3, 4],
       [5, 6, 7, 8, 9]]),
       ....
       [55, 56, 57, 58, 59]])]], dtype=object)

For the second example:

In [399]: arr = np.array(data)
In [400]: arr.shape
Out[400]: (2, 2, 2, 3)
In [401]: res = np.empty(osh, object)
In [402]: for idx in np.ndindex(osh):
     ...:     res[idx] = arr[idx]

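In the third case, np.array(data) already creates the desired (2,2) object dtype array (on NumPy 1.24 and later this needs an explicit dtype=object, since ragged input otherwise raises a ValueError). Creating and filling res still works, even though it produces the same result.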

Speed isn't very different (though this example is small):

In [415]: timeit data_blocked = np.frompyfunc(np.reshape(data, (-1, *ish)).__getitem__, 1, 1)(range(np.prod(osh))).reshape(osh)
49.8 µs ± 172 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
In [416]: %%timeit
     ...: arr = np.array(data)
     ...: res = np.empty(osh, object)
     ...: for idx in np.ndindex(osh): res[idx] = arr[idx]
     ...: 
54.7 µs ± 68.7 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)

Note that when data is a (nested) list, np.reshape(data, (-1, *ish)) is, effectively, np.array(data).reshape(-1, *ish): the list has to be turned into an array first.

Besides speed, it would be interesting to see whether one approach is more general than the other. Are there cases that one handles but the other can't?

Performance-wise, the old stick-a-None-in-the-first-cell method looks rather good: `tmp = list(np.reshape(data, (-1, *ish))); swap = tmp[0]; tmp[0] = None; result = np.array(tmp); result[0] = swap; result = result.reshape(osh)` is more than twice as fast as `frompyfunc` on the first example. — Paul Panzer, Mar 05 '18 at 07:20
[Here](https://stackoverflow.com/q/49117632/7207392) is one that probably works with yours but not with mine. (It works in principle but not with the stuff I did to make it general.) — Paul Panzer, Mar 05 '18 at 19:22
@PaulPanzer, mine fails on that ((10,3),(10,8)) case because it can't make an `ndarray` from the list. But with a simple list we don't need `ndindex` to iterate; `enumerate` is sufficient. — hpaulj, Mar 05 '18 at 21:58
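A sketch of the two alternatives raised in these comments, assuming NumPy 1.24 or later. The None trick needs an explicit `dtype=object` there: a list mixing None and arrays is ragged, and NumPy now only builds ragged arrays as object arrays on request (the comment's one-liner predates that change). The loop idea avoids building an ndarray from the list at all, so shapes like (10, 3) and (10, 8) pose no problem; `pack_nested` and the array `w` are hypothetical names added for illustration.

```python
import numpy as np

# --- None trick (dtype=object added for NumPy >= 1.24) ---
osh, ish = (2, 3), (2, 5)
data = np.arange(np.prod((*osh, *ish))).reshape(*osh, *ish)

tmp = list(np.reshape(data, (-1, *ish)))  # six (2, 5) blocks
swap = tmp[0]
tmp[0] = None                             # a scalar in one slot stops NumPy from recursing
result = np.array(tmp, dtype=object)      # shape (6,)
result[0] = swap                          # put the real first block back
result = result.reshape(osh)

# --- loop idea for a plain list: enumerate is enough ---
arrays = [np.zeros((10, 3)), np.ones((10, 8))]  # shapes from the comment
flat = np.empty(len(arrays), dtype=object)
for i, arr in enumerate(arrays):
    flat[i] = arr  # a single integer index stores the array as one element

# --- the same loop for a nested list, walked with np.ndindex index tuples ---
def pack_nested(nested, outer_shape):
    """Hypothetical helper: object array of shape outer_shape from a nested list."""
    res = np.empty(outer_shape, dtype=object)
    for idx in np.ndindex(*outer_shape):
        item = nested
        for i in idx:      # descend one list level per index
            item = item[i]
        res[idx] = item    # a full index tuple stores the leaf whole
    return res

x = np.array([[1, 2, 3], [4, 5, 6]])
w = np.arange(8).reshape(2, 4)  # hypothetical array sharing x's first dimension
grid = pack_nested([[x, w], [w.T, x.T]], (2, 2))
```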
