
It appears that I have data in the form of a list of 2D NumPy arrays (each element's type() is numpy.ndarray):

[array([[ 0.00353654]]), array([[ 0.00353654]]), array([[ 0.00353654]]), 
array([[ 0.00353654]]), array([[ 0.00353654]]), array([[ 0.00353654]]), 
array([[ 0.00353654]]), array([[ 0.00353654]]), array([[ 0.00353654]]), 
array([[ 0.00353654]]), array([[ 0.00353654]]), array([[ 0.00353654]]),
array([[ 0.00353654]])]

I am trying to put this into a polyfit function:

m1 = np.polyfit(x, y, deg=2)

However, it returns the error: TypeError: expected 1D vector for x

I assume I need to flatten my data into something like:

[0.00353654, 0.00353654, 0.00353654, 0.00353654, 0.00353654, 0.00353654 ...]

I have tried a list comprehension, which usually works on lists of lists, but as expected it has not worked here (each element is a 2D array, so iterating over it yields rows, which are themselves arrays, rather than scalars):

[val for sublist in risks for val in sublist]

What would be the best way to do this?

– Jerry Zhang

5 Answers


You could use numpy.concatenate, which, as the name suggests, joins all the arrays in the input list into a single NumPy array, like so -

import numpy as np
out = np.concatenate(input_list).ravel()

If you wish the final output to be a list, you can extend the solution, like so -

out = np.concatenate(input_list).ravel().tolist()

Sample run -

In [24]: input_list
Out[24]: 
[array([[ 0.00353654]]),
 array([[ 0.00353654]]),
 array([[ 0.00353654]]),
 array([[ 0.00353654]]),
 array([[ 0.00353654]]),
 array([[ 0.00353654]]),
 array([[ 0.00353654]]),
 array([[ 0.00353654]]),
 array([[ 0.00353654]]),
 array([[ 0.00353654]]),
 array([[ 0.00353654]]),
 array([[ 0.00353654]]),
 array([[ 0.00353654]])]

In [25]: np.concatenate(input_list).ravel()
Out[25]: 
array([ 0.00353654,  0.00353654,  0.00353654,  0.00353654,  0.00353654,
        0.00353654,  0.00353654,  0.00353654,  0.00353654,  0.00353654,
        0.00353654,  0.00353654,  0.00353654])

Convert to list -

In [26]: np.concatenate(input_list).ravel().tolist()
Out[26]: 
[0.00353654,
 0.00353654,
 0.00353654,
 0.00353654,
 0.00353654,
 0.00353654,
 0.00353654,
 0.00353654,
 0.00353654,
 0.00353654,
 0.00353654,
 0.00353654,
 0.00353654]
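
To tie this back to the question, the flattened result can go straight into polyfit. A minimal sketch (the x values here are hypothetical, since the question does not show them):

import numpy as np

# flatten the list of (1, 1) arrays into a 1D array, as above
y = np.concatenate(input_list).ravel()
x = np.arange(len(y))         # hypothetical x values of matching length
m1 = np.polyfit(x, y, deg=2)  # both inputs are now 1D, so no TypeError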
– Divakar

  • by doing so, I get `ValueError: all the input array dimensions except for the concatenation axis must match exactly` – Athena Feb 11 '18 at 12:09
  • @Athena Post a new question please. It's not clear what exactly is the data format. – Divakar Feb 11 '18 at 12:36
  • @Athena I think I had the same issue: it's because the arrays in the list have different shapes. I was able to get a flattened array using: `np.concatenate(input_list, axis=None).ravel()` – user2561747 Apr 15 '22 at 05:23
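
(As a quick illustration of that last comment: with axis=None, numpy.concatenate flattens each input before joining, so mismatched shapes are fine. A small sketch with made-up ragged data:)

import numpy as np

# shapes (1, 1) and (1, 2) would break a plain np.concatenate call
ragged = [np.array([[0.1]]), np.array([[0.2, 0.3]])]
out = np.concatenate(ragged, axis=None)  # -> array([0.1, 0.2, 0.3])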

This can also be done with

np.array(list_of_arrays).flatten().tolist()

resulting in

[0.00353654, 0.00353654, 0.00353654, 0.00353654, 0.00353654, 0.00353654, 0.00353654, 0.00353654, 0.00353654, 0.00353654, 0.00353654, 0.00353654, 0.00353654]

Update

As @aydow points out in the comments, using numpy.ndarray.ravel can be faster if one doesn't care whether the result is a copy or a view

np.array(list_of_arrays).ravel()

Although, according to the docs,

When a view is desired in as many cases as possible, arr.reshape(-1) may be preferable.

In other words

np.array(list_of_arrays).reshape(-1)

My initial suggestion was to use numpy.ndarray.flatten, which returns a copy every time, and that affects performance.
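
The copy/view behaviour is easy to verify directly; a small sketch using numpy.shares_memory:

import numpy as np

a = np.array([np.array([[0.00353654]])] * 13)  # shape (13, 1, 1)
print(np.shares_memory(a, a.ravel()))          # True: a view when possible
print(np.shares_memory(a, a.reshape(-1)))      # True: a view when possible
print(np.shares_memory(a, a.flatten()))        # False: always a copy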

Let's now see how the time complexity of the above-listed solutions compares, using the perfplot package for a setup similar to the OP's:

import numpy as np
import perfplot

perfplot.show(
    setup=lambda n: np.random.rand(n, 2),
    kernels=[lambda a: a.ravel(),
             lambda a: a.flatten(),
             lambda a: a.reshape(-1)],
    labels=['ravel', 'flatten', 'reshape'],
    n_range=[2**k for k in range(16)],
    xlabel='N')

[perfplot results: runtime vs. N for ravel, flatten and reshape]

Here flatten demonstrates piecewise-linear complexity, which can reasonably be explained by it making a copy of the initial array, compared to the constant complexity of ravel and reshape, which return a view.

It's also worth noting that, quite predictably, converting the outputs with .tolist() evens out the performance of all three, making them equally linear.

– ayorgo
  • `np.flatten` works, but it's worth noting that it's significantly slower than `np.ravel`. this difference gets worse as the `array` length increases – aydow Jun 12 '19 at 00:27
  • @aydow hmm, how so? `np.flatten` is indeed slower but not significantly. I just `%%timeit` both on `list(map(np.array, np.random.rand(1_000_000, 10)))` and `np.concatenate(list_of_arrays).ravel()` takes `290 ms ± 2.49 ms` against `np.array(list_of_arrays).flatten()`'s `446 ms ± 26.5 ms` with both performing seemingly instantaneously without `%%timeit` on my laptop. – ayorgo Jun 12 '19 at 11:34
  • hi @ayorgo, i'm deviating slightly from the OP question. i'm assuming an `np.array` of `np.array`s (which pertained to my own question) rather than a `list` of `np.array`s. using just `np.ravel` takes `249 ns ± 8.43 ns` while using just `np.flatten` takes `25.4 ms ± 244 µs`!! adding `np.concatenate` and `np.array` slows it down to the numbers you've mentioned. apologies for not specifying this in my initial comment – aydow Jun 13 '19 at 00:47
  • @aydow haha, indeed! What I believe makes such a difference in performance is that `np.flatten` always returns a copy unlike `np.ravel` (https://stackoverflow.com/a/28930580/4755520). The interesting thing also is that the accepted answer doesn't need to use `np.concatenate`. Simply converting to `np.array` and `.ravel()` would suffice. – ayorgo Jun 13 '19 at 06:32

Another simple approach is to use numpy.hstack() and then remove the singleton dimension with squeeze(), as in:

In [61]: np.hstack(list_of_arrs).squeeze()
Out[61]: 
array([0.00353654, 0.00353654, 0.00353654, 0.00353654, 0.00353654,
       0.00353654, 0.00353654, 0.00353654, 0.00353654, 0.00353654,
       0.00353654, 0.00353654, 0.00353654])
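
For clarity, here is why squeeze() is needed: hstack joins the (1, 1) arrays along their last axis, so a leading singleton dimension survives. A short sketch recreating the question's data:

import numpy as np

list_of_arrs = [np.array([[0.00353654]])] * 13
stacked = np.hstack(list_of_arrs)  # shape (1, 13): the row dimension remains
flat = stacked.squeeze()           # shape (13,): singleton dimension removed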
– kmario23

Another way is to use itertools to flatten the array:

import itertools
import numpy as np

# Recreate the data from the question: 13 arrays of shape (1, 1)
a = [np.array([[0.00353654]])] * 13

# chain.from_iterable strips one level of nesting per application; the
# arrays here are 2D, so apply it twice to get down to the scalar values
flattened = list(itertools.chain.from_iterable(
    itertools.chain.from_iterable(a)))

This solution should be very fast; see https://stackoverflow.com/a/408281/5993892 for more explanation.

If the resulting data structure should be a numpy array instead, use numpy.fromiter() to exhaust the iterator into an array:

# Make an iterator to yield the scalars of the flattened list and create a numpy array from it
flattened_array = np.fromiter(
    itertools.chain.from_iterable(itertools.chain.from_iterable(a)), float)

Docs for itertools.chain.from_iterable(): https://docs.python.org/3/library/itertools.html#itertools.chain.from_iterable

Docs for numpy.fromiter(): https://docs.scipy.org/doc/numpy/reference/generated/numpy.fromiter.html
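
When the total number of elements is known up front, numpy.fromiter can preallocate via its count argument. A sketch; count=len(a) holds here only because each (1, 1) array contributes exactly one value:

flattened_array = np.fromiter(
    itertools.chain.from_iterable(itertools.chain.from_iterable(a)),
    dtype=float,
    count=len(a))  # valid only because each array holds a single value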

– Tim Skov Jacobsen

I came across this same issue and found a solution that also handles NumPy arrays of variable length:

np.column_stack(input_list).ravel()

See numpy.column_stack for more info.

Example with variable-length arrays, built from your example data:

In [135]: input_list
Out[135]: 
[array([[ 0.00353654,  0.00353654]]),
 array([[ 0.00353654]]),
 array([[ 0.00353654]]),
 array([[ 0.00353654,  0.00353654,  0.00353654]])]

In [136]: [i.size for i in input_list]    # variable size arrays
Out[136]: [2, 1, 1, 3]

In [137]: np.column_stack(input_list).ravel()
Out[137]: 
array([ 0.00353654,  0.00353654,  0.00353654,  0.00353654,  0.00353654,
        0.00353654,  0.00353654])

Note: Only tested on Python 2.7.12

– zsatter14
  • I tried this and got `ValueError: all the input array dimensions except for the concatenation axis must match exactly` :( – Shir May 02 '19 at 09:20
  • I was able to make it work using `np.hstack` instead of `np.column_stack`. I think this is because my arrays are 1d, and I didn't read the original question carefully enough. Thanks anyway :) – Shir May 02 '19 at 09:30
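
(To illustrate that last comment: for genuinely 1-D arrays of variable length, np.hstack flattens directly. A tiny sketch with made-up data:)

import numpy as np

ragged_1d = [np.array([0.1, 0.2]), np.array([0.3])]
out = np.hstack(ragged_1d)  # -> array([0.1, 0.2, 0.3])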