I have a memory problem when converting a big list of 2D elements into a 3D numpy array. I'm using the Colab environment. I'm working on a deep learning project with medical images (.nii) and a CNN network. The images are float type (because of standardization). I load the images (one channel) into memory as a list, then divide each one into small 11x11 patches. As a result I have a list of 11,650,348 11x11 images.
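For reference, a rough back-of-the-envelope estimate of what the final array alone should occupy (a sketch, assuming the patches end up as float32 or float64):

import numpy as np

# Rough size estimate for the final (11650348, 11, 11) array
n_patches = 11650348
bytes_f32 = n_patches * 11 * 11 * np.dtype(np.float32).itemsize
bytes_f64 = n_patches * 11 * 11 * np.dtype(np.float64).itemsize
print(bytes_f32 / 1024**3)  # ~5.25 GiB for float32
print(bytes_f64 / 1024**3)  # ~10.5 GiB for float64

The float32 case is roughly consistent with the ~5 GB jump in Proc size shown below, on top of the memory the list itself already holds.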
Get sequences. Memory info:
Gen RAM Free: 12.8 GB | Proc size: 733.4 MB
GPU RAM Free: 15079MB | Used: 0MB | Util 0% | Total 15079MB
get seqences...
Time: 109.60107789899996
Gen RAM Free: 11.4 GB | Proc size: 2.8 GB
GPU RAM Free: 15079MB | Used: 0MB | Util 0% | Total 15079MB
[INFO] data matrix in list of 11507902 images
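For context, the memory snapshots were printed with a small helper; a sketch of how such numbers can be produced with psutil and GPUtil (the function name is just for illustration):

import os
import psutil
import humanize
import GPUtil

def print_memory_info():
    # General RAM: free system memory and this process's resident size
    process = psutil.Process(os.getpid())
    print("Gen RAM Free: " + humanize.naturalsize(psutil.virtual_memory().available),
          "| Proc size: " + humanize.naturalsize(process.memory_info().rss))
    # GPU memory as reported by GPUtil
    gpu = GPUtil.getGPUs()[0]
    print("GPU RAM Free: {0:.0f}MB | Used: {1:.0f}MB | Util {2:3.0f}% | Total {3:.0f}MB".format(
        gpu.memoryFree, gpu.memoryUsed, gpu.memoryUtil * 100, gpu.memoryTotal))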
Now I'm using the np.array method to convert the list into an array.
Memory info:
Gen RAM Free: 11.8 GB | Proc size: 2.1 GB
GPU RAM Free: 15079MB | Used: 0MB | Util 0% | Total 15079MB
Coverting....
Gen RAM Free: 6.7 GB | Proc size: 7.3 GB
GPU RAM Free: 15079MB | Used: 0MB | Util 0% | Total 15079MB
Shape of our training data: (11650348, 11, 11, 1)

The data is then split into train and test sets; see the code below.
As you can see, I've lost a lot of memory. Why does this happen?
I've tried using np.asarray and np.array with the copy parameter. It didn't work.
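Roughly what I tried (a simplified sketch; uber_dane is the full list of patches built in the code below):

import numpy as np

# Neither of these avoided the extra copy (on NumPy 1.x, copy=False still
# copies here, because the list holds many separate small arrays):
X = np.asarray(uber_dane)
X = np.array(uber_dane, copy=False)

# One idea I haven't fully verified: forcing single precision, which should
# halve the resulting array if the source data is float64:
X = np.array(uber_dane, dtype=np.float32)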
Code responsible for dividing the original image:
import numpy as np

def get_parts(image, segmented):
    # Extract an 11x11 patch around every nonzero voxel of the T2 volume,
    # together with the label of the central voxel.
    T2 = image[0]
    seg = segmented[0]
    labels = []
    val = []
    window_width = 5
    zlen, ylen, xlen = T2.shape
    nda = np.zeros((240, 240))
    for x in range(0, xlen):
        for y in range(0, ylen):
            for z in range(0, zlen):
                if T2[z, y, x] != 0:
                    xbegin = x - window_width
                    xend = x + window_width + 1
                    ybegin = y - window_width
                    yend = y + window_width + 1
                    val.append(T2[z, ybegin:yend, xbegin:xend])
                    labels.append(seg[z, y, x])
    # np_array_01 = np.asarray(val)
    # np_array_02 = np.asarray(labels)
    return val, labels
Getting the values:
for x in range(0, length):
    data, labels = get_parts(T2_images[x], segmented[x])
    uber_dane.extend(data)
    uber_label.extend(labels)
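One alternative I've been considering (not tested yet) to avoid holding both the Python list and the final numpy array in memory at once: count the patches first, preallocate the array, and fill it in place. A sketch reusing T2_images and segmented from above; unlike my current code it skips border voxels, so every patch is exactly 11x11:

import numpy as np

w = 5  # half window -> 11x11 patches

# First pass: count the nonzero voxels away from the borders
n_patches = 0
for image in T2_images:
    T2 = image[0]
    n_patches += int(np.count_nonzero(T2[:, w:-w, w:-w]))

# Preallocate once, in single precision
val = np.empty((n_patches, 11, 11), dtype=np.float32)
labels = np.empty(n_patches, dtype=np.float32)  # or the segmentation's own dtype

# Second pass: fill the preallocated arrays in place
i = 0
for image, seg_image in zip(T2_images, segmented):
    T2, seg = image[0], seg_image[0]
    zlen, ylen, xlen = T2.shape
    for x in range(w, xlen - w):
        for y in range(w, ylen - w):
            for z in range(zlen):
                if T2[z, y, x] != 0:
                    val[i] = T2[z, y - w:y + w + 1, x - w:x + w + 1]
                    labels[i] = seg[z, y, x]
                    i += 1

The idea is that the patches are written directly into their final buffer, so peak memory stays close to the size of the final array.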
I'm transforming it this way:
import sys
import numpy as np
from sklearn.model_selection import train_test_split
from keras.utils import to_categorical

X_train, X_test, y_train, y_test = train_test_split(uber_dane, uber_label, test_size=0.2, random_state=0)

# LABELS
y_train = np.array(y_train)
y_test = np.array(y_test)
y_train = np.expand_dims(y_train, axis=-1)   # add a trailing axis
y_test = np.expand_dims(y_test, axis=-1)
y_train = to_categorical(y_train)
y_test = to_categorical(y_test)

# DATA - HERE IS THE PROBLEM
X_train = np.array(X_train)
X_test = np.array(X_test)
print(sys.getsizeof(X_train))
print(sys.getsizeof(X_test))
X_train = np.expand_dims(X_train, axis=-1)   # add the channel axis -> (N, 11, 11, 1)
X_test = np.expand_dims(X_test, axis=-1)
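To see what the array itself actually occupies (rather than relying only on sys.getsizeof), I also check ndarray.nbytes and the dtype, roughly like this:

# Raw data size in GiB and the element type of the converted array
print(X_train.nbytes / 1024**3, X_train.dtype)
# If the dtype turns out to be float64, casting to float32 would halve the
# footprint (though astype itself makes a temporary copy while converting):
# X_train = X_train.astype(np.float32)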
What do you think? Maybe I'm doing something wrong. An array should take less memory than a list :/ I searched Stack Overflow and the internet, but it didn't help. I couldn't figure it out myself.
I hope you'll have some good ideas :D
UPDATE 08-06-2019
I ran my code in PyCharm and got a different error:
X_train = np.array(uber_dane)
ValueError: array is too big; `arr.size * arr.dtype.itemsize` is larger than the maximum possible size.
I've got: Python 3.6.3 (v3.6.3:2c5fed8, Oct 3 2017, 17:26:49) [MSC v.1900 32 bit (Intel)] on win32. So a 32-bit Python is trying to allocate more than 3 GB.
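To confirm that the 32-bit build is the limiting factor, this is a quick check one can run (small sketch):

import sys
import struct
import platform

print(platform.architecture())   # e.g. ('32bit', 'WindowsPE') on this interpreter
print(struct.calcsize('P') * 8)  # pointer size in bits: 32 here
print(sys.maxsize)               # 2**31 - 1 on a 32-bit build

A 32-bit process can only address about 2-4 GB in total, so an array that needs several GB cannot be allocated no matter how much RAM the machine has.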
This looks related: lmfit minimize fails with ValueError: array is too big
What do you think?