I have a list of arrays.

[array([   2.,    4.,    6.,    8.,   10.,   12.,   14.,   16.,   18.,
         20.,   22.,   24.,   26.,   28.,   30.,   32.,   34.,   36.,
         38.,   40.,   42.,   44.,   46.,   48.,   50.,   52.,   54.,
         56.,   58.,   60.,   62.,   64.,   66.,   68.,   70.,   72.,
         74.,   76.,   78.,   80.,   82.,   84.,   86.,   88.,   90.,
         92.,   94.,   96.,   98.,  100.]), array([   4.,    8.,   12.,   16.,   20.,   24.,   28.,   32.,   36.,
         40.,   44.,   48.,   52.,   56.,   60.,   64.,   68.,   72.,
         76.,   80.,   84.,   88.,   92.,   96.,  100.]), array([  8.,  16.,  24.,  32.,  40.,  48.,  56.,  64.,  72.,  80.,  88.,
        96.])]

I tried np.vstack to stack the list array by array, but because the arrays are not of equal size (i.e., they have different numbers of columns), I received this error:

ValueError: all the input array dimensions except for the concatenation axis must match exactly

I do not want to concatenate them because I want to store the rows for future computations. How can I stack them row by row if the rows are ragged?

Edit: Is it possible to concatenate along a variable axis for this purpose?

  • In `numpy` 'stack' is the same as 'concatenate', with name variations for style. All refer to joining arrays into a larger array along some axis. – hpaulj Apr 01 '17 at 05:15
  • What's wrong with your list of arrays? There is an object dtype array, but for most purposes it's functionally the same as a list. `hdf5` has arrays with a variable axis, `numpy` does not. – hpaulj Apr 01 '17 at 05:18
  • My actual dataset is several thousand `1 x n` arrays, each consisting of floats instead of integers. If I can stack the arrays row by row, it will be easier to do a future calculation on each row. I was also hoping to be able to read them more easily when printed, but that's not as important. –  Apr 01 '17 at 05:27
  • I will look into `hdf5`, thanks. –  Apr 01 '17 at 05:30
  • A ragged `h5py` example: http://stackoverflow.com/questions/42658438/storing-multidimensional-variable-length-array-with-h5py/; notice that when loaded into `numpy` it becomes an object dtype array. You can't perform 'row' calculations in an object array as well as you can in a regular 2d array. And don't expect any speed improvements compared to list comprehensions. – hpaulj Apr 01 '17 at 05:37
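
As a rough sketch of what the comments above describe, the ragged rows can stay in a plain list, with per-row calculations done in a list comprehension, or be wrapped in an object dtype array if an array container is wanted. The data is a shortened stand-in for the list in the question, and the names `rows`, `row_means`, and `ragged` are illustrative only.

```python
import numpy as np

# Shortened stand-in for the ragged list in the question (illustrative data).
rows = [
    np.arange(2.0, 12.0, 2.0),   # 5 elements
    np.arange(4.0, 16.0, 4.0),   # 3 elements
    np.array([8.0, 16.0]),       # 2 elements
]

# Per-row calculation with a list comprehension; each row keeps its own length.
row_means = [r.mean() for r in rows]

# Object dtype array holding the same rows. It is preallocated and filled
# element by element so NumPy does not try to build a regular 2D array.
ragged = np.empty(len(rows), dtype=object)
for i, r in enumerate(rows):
    ragged[i] = r

print(row_means)
print(ragged.shape, ragged.dtype)
```

The object array is one-dimensional, with each element referencing a separate float array, so operations on it still loop over the rows in Python, much like the list comprehension does.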

1 Answer

You could use a pandas DataFrame:

import pandas as pd

# yourlist is the list of arrays from the question; each array becomes one row
data = pd.DataFrame([pd.Series(i) for i in yourlist])

The result is a DataFrame with one row per array, in which the shorter rows are padded with NaN.

The drawback is that you will have to deal with the missing values while doing your calculations.
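
A minimal sketch of handling those missing values, assuming a small illustrative `yourlist` rather than the full data from the question: pandas reductions skip NaN by default, and `dropna()` recovers a row at its original length.

```python
import numpy as np
import pandas as pd

# Small illustrative stand-in for the ragged list in the question.
yourlist = [
    np.array([2.0, 4.0, 6.0, 8.0]),
    np.array([4.0, 8.0]),
    np.array([8.0, 16.0, 24.0]),
]

# One row per array; shorter rows are padded with NaN on the right.
data = pd.DataFrame([pd.Series(i) for i in yourlist])

# Row-wise reductions ignore NaN by default (skipna=True).
row_sums = data.sum(axis=1)

# Number of real (non-NaN) values in each row.
row_counts = data.count(axis=1)

# Recover a single row without its padding.
second_row = data.iloc[1].dropna().to_numpy()

print(data)
print(row_sums.tolist(), row_counts.tolist(), second_row)
```

Element-wise arithmetic on the frame propagates NaN, so the padding stays in place until it is dropped or filled explicitly.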

andre