
So for a matrix, we have methods like `np.ndarray.flatten()`

np.array([[1,2,3],[4,5,6],[7,8,9]]).flatten()

gives `array([1, 2, 3, 4, 5, 6, 7, 8, 9])`

what if I wanted to get from np.array([[1,2,3],[4,5,6],7]) to [1,2,3,4,5,6,7]?
Is there an existing function that performs something like that?

AsheKetchum
  • Why do you even have `np.array([[1,2,3],[4,5,6],7])`? NumPy is not at all designed for that kind of thing. – user2357112 Apr 26 '17 at 17:55
  • what are you referring to when you say 'that kind of thing'? array of lists of unequal length? – AsheKetchum Apr 26 '17 at 17:57
  • It's not just lists of unequal length but a mix of lists and scalar values. You might want to look at [this](http://stackoverflow.com/questions/2158395/flatten-an-irregular-list-of-lists). – Psidom Apr 26 '17 at 17:58
  • @user2357112 why exactly? I'm working on something that includes a square matrix and another vector (I guess) that holds some meta data about that array, in a single `np.array()`. Can I not return a view on the square matrix based on some values in my vector without the overhead of converting a list to an array and bookkeeping the meta data in a separate list? – roganjosh Apr 26 '17 at 18:01
  • Use [existing solutions to flatten a list](http://stackoverflow.com/questions/10823877/what-is-the-fastest-way-to-flatten-arbitrarily-nested-lists-in-python), replacing `list` with `np.ndarray` in lines that check the type of each element (like `isinstance(i, (list,tuple))`) (unless there's a faster numpy solution) – Stuart Apr 26 '17 at 18:01
  • @user2357112 nm, already addressed perfectly in the posted answer. – roganjosh Apr 26 '17 at 18:05
  • The core concept of a NumPy array is a rigid multidimensional grid of numbers. Almost everything in NumPy is built around that concept. If you try to make something that isn't a rigid multidimensional grid, such as the array shown here, NumPy has to make an object array instead. Those are incompatible with all sorts of things you'd usually take for granted in NumPy; for the most part, they're worse than just using lists. – user2357112 Apr 26 '17 at 18:13
  • @user2357112 so I should try to avoid creating object arrays like that? – AsheKetchum Apr 26 '17 at 18:15
  • Sometimes object arrays are useful (there are plenty of SO examples), but a plain list is usually better than a 1d object array. – hpaulj Apr 26 '17 at 18:18
  • @user2357112 so my approach is not lost, I just need to pad my meta data with zeros to get the dimensions consistent. I need to look at this in more detail; is the limitation on the `numpy` side or `C` itself here? It's not immediately clear to me why `numpy` should default to `object` here but I really only know Python and I've approached my problem from that mindset. I'm just looking for a starting basis for my research into why this is :) – roganjosh Apr 26 '17 at 18:21
  • @roganjosh: I'd just use two arrays. As for why this happens, NumPy depends on a consistent shape and strides to be able to represent an array's data efficiently in memory, to be able to take views of an array, and to be able to index it without incurring tons of extra indirection and type checking. Without that, it's reduced to storing opaque object pointers like a list would, and all the things that depended on the regular representation break. – user2357112 Apr 26 '17 at 18:36
  • An object array, like a list, contains pointers to objects elsewhere in memory. So both can hold 'anything', including lists of varying size. Operations like `reshape` and `transpose` work with object arrays, but the application of math is hit-or-miss and slower when it does work. – hpaulj Apr 26 '17 at 22:07

3 Answers


With uneven lists, the array has object dtype (and is 1d, so `flatten` doesn't change it):

In [96]: arr=np.array([[1,2,3],[4,5,6],7])
In [97]: arr
Out[97]: array([[1, 2, 3], [4, 5, 6], 7], dtype=object)
In [98]: arr.sum()
...
TypeError: can only concatenate list (not "int") to list

The 7 element is giving problems. If I change that to a list:

In [99]: arr=np.array([[1,2,3],[4,5,6],[7]])
In [100]: arr.sum()
Out[100]: [1, 2, 3, 4, 5, 6, 7]

I'm using a trick here. The elements of the array are lists, and for lists `[1,2,3] + [4,5]` is concatenation.

The basic point is that an object array is not a 2d array. It is, in many ways, more like a list of lists.
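A quick sketch of that difference (the explicit `dtype=object` is needed in newer NumPy, which refuses to guess for ragged input):

```python
import numpy as np

# Ragged nesting -> a 1-d object array whose elements are list references
ragged = np.array([[1, 2, 3], [4, 5, 6], [7]], dtype=object)

# Regular nesting -> a true 2-d numeric array
square = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])

print(ragged.shape)  # (3,)  -- one axis; the inner lists are opaque objects
print(square.shape)  # (3, 3)
```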

chain

The best list flattener is `itertools.chain`

In [103]: import itertools
In [104]: list(itertools.chain(*arr))
Out[104]: [1, 2, 3, 4, 5, 6, 7]

though it too will choke on the integer 7 version.
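One way around the scalar is to promote every element with `np.atleast_1d` before chaining, so the bare `7` becomes a length-1 array (a sketch, not the only option):

```python
import itertools
import numpy as np

arr = np.array([[1, 2, 3], [4, 5, 6], 7], dtype=object)

# wrap each element so the lone scalar becomes iterable, then chain
flat = list(itertools.chain.from_iterable(np.atleast_1d(e) for e in arr))
# flat compares equal to [1, 2, 3, 4, 5, 6, 7]
```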

concatenate and hstack

If the array is a list of lists (not the original mix of lists and scalar) then np.concatenate works. It iterates on the object just as though it were a list.

With the mixed original list, `concatenate` does not work, but `hstack` does:

In [178]: arr=np.array([[1,2,3],[4,5,6],7])
In [179]: np.concatenate(arr)
...
ValueError: all the input arrays must have same number of dimensions
In [180]: np.hstack(arr)
Out[180]: array([1, 2, 3, 4, 5, 6, 7])

That's because `hstack` first iterates through the array and passes each element through `np.atleast_1d`. This extra iteration makes it more robust, but at a cost in processing speed.
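Roughly what that amounts to (a sketch, not `hstack`'s actual implementation):

```python
import numpy as np

arr = np.array([[1, 2, 3], [4, 5, 6], 7], dtype=object)

# promote every element to at least 1-d, then concatenate the pieces
flat = np.concatenate([np.atleast_1d(e) for e in arr])
print(flat)  # [1 2 3 4 5 6 7]
```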

time tests

In [170]: big1=arr.repeat(1000)
In [171]: timeit big1.sum()
10 loops, best of 3: 31.6 ms per loop
In [172]: timeit list(itertools.chain(*big1))
1000 loops, best of 3: 433 µs per loop
In [173]: timeit np.concatenate(big1)
100 loops, best of 3: 5.05 ms per loop

double the size

In [174]: big1=arr.repeat(2000)
In [175]: timeit big1.sum()
10 loops, best of 3: 128 ms per loop
In [176]: timeit list(itertools.chain(*big1))
1000 loops, best of 3: 803 µs per loop
In [177]: timeit np.concatenate(big1)
100 loops, best of 3: 9.93 ms per loop
In [182]: timeit np.hstack(big1)    # the extra iteration hurts hstack speed
10 loops, best of 3: 43.1 ms per loop

The `sum` is quadratic in the total size, because it effectively does

res = []
for e in bigarr: 
   res = res + e

`res` grows with each step, and each `res + e` copies everything accumulated so far, so every iteration is more expensive than the last.
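The scaling difference can be seen in pure Python (a small sketch with a list of lists standing in for the object array):

```python
import itertools

chunks = [[1, 2, 3]] * 4  # stand-in for the object array of lists

# what the object-dtype sum effectively does: each `res + e` builds a
# brand-new list, copying everything accumulated so far -> quadratic
res = []
for e in chunks:
    res = res + e

# chain walks each element exactly once -> linear
flat = list(itertools.chain(*chunks))

assert res == flat  # same result, very different scaling
```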

`chain` has the best times.

hpaulj
  • This looks like straight hacks. Is this `sum` defined to work this way in `numpy`? – AsheKetchum Apr 26 '17 at 18:03
  • Watch out - if the sublists do happen to have the same length, the `array` call will produce a real 2D array, and the `sum` call will produce an integer sum instead of concatenating lists. (Also, this is a quadratic-time way to do concatenation even when it works.) – user2357112 Apr 26 '17 at 18:05
  • With object dtype, `numpy` is asking each element to do its own version of `+`. For numbers that is the usual summation, but for lists that's defined as concatenation. An object array of strings would also do concatenation. But with mixed object types this approach runs into problems (numbers and lists don't do `+` in the same way). – hpaulj Apr 26 '17 at 18:15
  • `numpy` has a special case for non-empty `object` arrays in `ufunc.reduce`, which means that unlike the usual `np.add.reduce(arr)`, which computes `0 + arr[0] + arr[1] + ...`, the `0` is omitted. This strikes me as not necessarily a good thing - so I would argue that this _is_ straight hacks. – Eric Apr 26 '17 at 20:24
  • As a result, you're in for a surprise when your object array of lists is in fact empty, so you try to flatten `a = np.array([], object)` using `sum()` and get `False` instead of `[]` – Eric Apr 26 '17 at 20:26
  • This doesn't solve the problem... This solves a related problem – Girardi Jun 02 '22 at 05:46
  • @Girardi, that's for the OP to decide :) Looking at this fresh after 5 years, my first though was use `np.hstack`. – hpaulj Jun 02 '22 at 05:57

You can write a custom `flatten` function using `yield`:

def flatten(arr):
    for i in arr:
        try:
            yield from flatten(i)
        except TypeError:
            yield i

Usage example:

>>> myarr = np.array([[1,2,3],[4,5,6],7])
>>> newarr = list(flatten(myarr))
>>> newarr
[1, 2, 3, 4, 5, 6, 7]
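Since the generator only relies on iterability, it also handles deeper nesting than the example above (one caveat worth hedging: a bare string would recurse forever, because each character is itself iterable, so this sketch assumes numeric data):

```python
def flatten(arr):
    for i in arr:
        try:
            yield from flatten(i)   # recurse while `i` is iterable
        except TypeError:
            yield i                 # scalar: emit it as-is

deep = [[1, [2, 3]], [4, [5, [6]]], 7]
print(list(flatten(deep)))  # [1, 2, 3, 4, 5, 6, 7]
```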
Scratch'N'Purr

You can use `apply_along_axis` here:

>>> arr = np.array([[1,2,3],[4,5,6],[7]])
>>> np.apply_along_axis(np.concatenate, 0, arr)
array([1, 2, 3, 4, 5, 6, 7])

As a bonus, this is not quadratic in the number of lists either.
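As noted in the comments, for the all-lists version a plain `np.concatenate` call is enough on its own (a sketch; `dtype=object` is needed in newer NumPy for ragged input):

```python
import numpy as np

arr = np.array([[1, 2, 3], [4, 5, 6], [7]], dtype=object)

# concatenate iterates over the object array as if it were a list of lists
flat = np.concatenate(arr)
print(flat)  # [1 2 3 4 5 6 7]
```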

Eric
  • If you are going the `concatenate` route you don't need `apply_along_axis` (even though it's your 'baby':) `np.concatenate(arr)` works; it iterates on the argument, whether it's a list or array. I don't get the `quadratic` business. – hpaulj Apr 26 '17 at 21:34
  • Indeed, a straight `concatenate` would do the trick for the 1d case. `sum` is quadratic, because each time it adds the ith element, it allocates an array of length proportional to i. This is the same reason that python advises you use `"".join(...)` not `sum(..., "")` – Eric Apr 27 '17 at 09:32