I'm trying to process data from a CSV file using the csv module in Python. There are about 50 columns and 401125 rows in it. I used the following code to read that data into a list:
import csv

csv_file_object = csv.reader(open(r'some_path\Train.csv', 'rb'))
header = csv_file_object.next()   # read off the header row
data = []
for row in csv_file_object:
    data.append(row)
I can get the length of this list using len(data), and it returns 401125. I can even access each individual record by list index. But when I try to get the size of the list by calling np.size(data) (I imported numpy as np), I get the following stack trace:
MemoryError                               Traceback (most recent call last)
<ipython-input-...> in <module>()
----> 1 np.size(data)

C:\Python27\lib\site-packages\numpy\core\fromnumeric.pyc in size(a, axis)
   2198             return a.size
   2199         except AttributeError:
-> 2200             return asarray(a).size
   2201     else:
   2202         try:

C:\Python27\lib\site-packages\numpy\core\numeric.pyc in asarray(a, dtype, order)
    233
    234     """
--> 235     return array(a, dtype, copy=False, order=order)
    236
    237 def asanyarray(a, dtype=None, order=None):

MemoryError:
I can't even split that list into multiple parts using list indices, or convert it into a numpy array; both give the same MemoryError. From the trace it looks like np.size falls back to asarray(a), which tries to build a full array out of the list, and that's apparently where memory runs out.
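To be concrete, both of these fail for me with the same MemoryError (the split point below is just an example, any split I tried behaved the same way):

import numpy as np

first_half = data[:200000]   # splitting by index: MemoryError
arr = np.array(data)         # converting to an array: MemoryError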
How can I deal with a big data sample like this? Is there some other way to process large data sets like this one?
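One idea I had was to avoid holding everything in memory at once and instead read and process the file in fixed-size chunks, something like the rough sketch below (process_in_chunks and the 50000 chunk size are just names/numbers I made up, and I don't know whether this is the right approach):

import csv

def process_in_chunks(path, chunk_size=50000):
    # Read the CSV lazily and hand back one chunk of rows at a time,
    # so only chunk_size rows are ever held in memory here.
    with open(path, 'rb') as f:   # 'rb' because this is Python 2
        reader = csv.reader(f)
        header = reader.next()    # skip the header row
        chunk = []
        for row in reader:
            chunk.append(row)
            if len(chunk) == chunk_size:
                yield chunk
                chunk = []
        if chunk:
            yield chunk           # whatever rows are left over

for chunk in process_in_chunks(r'some_path\Train.csv'):
    pass   # per-chunk processing would go here

But I'm not sure this works if the analysis needs all the rows at once.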
I'm using IPython Notebook on Windows 7 Professional.