I have a list of lists with 1,200 rows and 500,000 columns. How do I convert it into a numpy array?
I've read the solutions on Bypass "Array is too big" python error, but they are not helping.
I tried to put it into a numpy array:
import random
import numpy as np
lol = [[random.uniform(0,1) for j in range(500000)] for i in range(1200)]
np.array(lol)
[Error]:
ValueError: array is too big.
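For scale, a float64 array of that shape works out to roughly 4.8 GB, and even float32 would be about 2.4 GB, which I'm guessing is more than a 32-bit process can address. This is just my back-of-the-envelope check:
rows, cols = 1200, 500000
print(rows * cols * 8 / 1e9)  # ~4.8 GB as float64
print(rows * cols * 4 / 1e9)  # ~2.4 GB even as float32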
Then I tried pandas:
import random
import pandas as pd
lol = [[random.uniform(0,1) for j in range(500000)] for i in range(1200)]
pd.lib.to_object_array(lol).astype(float)
[Error]:
ValueError: array is too big.
I've also tried HDF5 with h5py, as @askewchan suggested:
import h5py
filearray = h5py.File('project.data','w')
data = filearray.create_dataset('tocluster', (len(lol), len(lol[0])), dtype='f')
data[...] = data
[Error]:
data[...] = data
File "/usr/lib/python2.7/dist-packages/h5py/_hl/dataset.py", line 367, in __setitem__
val = numpy.asarray(val, order='C')
File "/usr/local/lib/python2.7/dist-packages/numpy/core/numeric.py", line 460, in asarray
return array(a, dtype, copy=False, order=order)
File "/usr/lib/python2.7/dist-packages/h5py/_hl/dataset.py", line 455, in __array__
arr = numpy.empty(self.shape, dtype=self.dtype if dtype is None else dtype)
ValueError: array is too big.
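Looking at it again, I suspect the assignment is supposed to go from the list into the dataset rather than from the dataset to itself, and writing one row at a time should at least avoid ever building a single 1,200 x 500,000 array in memory. This is only a sketch of what I have in mind (names match my code above, and I haven't been able to verify it at this scale):
import random
import h5py

lol = [[random.uniform(0, 1) for j in range(500000)] for i in range(1200)]

filearray = h5py.File('project.data', 'w')
dset = filearray.create_dataset('tocluster', (len(lol), len(lol[0])), dtype='f')

# write one 500,000-element row at a time instead of converting the
# whole list of lists into one giant array
for i, row in enumerate(lol):
    dset[i, :] = row

filearray.close()
Is something along these lines the right direction, or does the row-by-row write hit the same limit?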
This post, Python: how to store a numpy multidimensional array in PyTables?, shows that I can store a huge numpy array on disk. But I can't even get my list of lists into a numpy array =(
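I've also been wondering whether numpy.memmap would do the job, since (as I understand it) it gives a disk-backed array that can be filled a row at a time without holding the full block in RAM. Again just a sketch, with a filename I made up, and I don't know whether the mapping itself runs into the same size limit on a 32-bit build:
import random
import numpy as np

rows, cols = 1200, 500000
lol = [[random.uniform(0, 1) for j in range(cols)] for i in range(rows)]

# disk-backed array stored in 'lol.dat'; only the pages being written
# need to be resident in memory
arr = np.memmap('lol.dat', dtype='float32', mode='w+', shape=(rows, cols))

# copy row by row so at most one 500,000-element row is converted at once
for i, row in enumerate(lol):
    arr[i, :] = row

arr.flush()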