
I am trying to store the pixel data of 30,227 images (1024 x 1024 each) together by concatenating them in a list to form my training data, but I am getting an out-of-memory error while doing so in my Jupyter notebook. Below are the lines of code I have used.

    import os
    import pydicom
    from sklearn.preprocessing import MinMaxScaler

    train_data = []
    mm_scaler = MinMaxScaler()
    for file_id in data['patientId']:
        file_name = train_images_path + "\\" + file_id.strip() + ".dcm"
        if os.path.exists(file_name):
            # read the DICOM file, scale its pixel array and keep it in memory
            image_data = mm_scaler.fit_transform(pydicom.dcmread(file_name).pixel_array)
            train_data.append(image_data)

Is there any other way to store this data together that I can use later for training my model? Please help me with this.

Sarvagya Dubey

1 Answer


An out-of-memory error occurs when the data you are holding exceeds the memory limit of your system; you can have a look at this here.

For storing the data you can take help from this link and this.
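If the whole dataset does not fit in RAM, one option is to write the images to disk incrementally instead of building up a list first. Here is a minimal sketch of that idea using h5py; the file name train_data.h5, the gzip compression and the chunk size are my own assumptions, not from the question:

    # Stream images into an HDF5 file one at a time so they never all sit in RAM.
    import os
    import h5py
    import pydicom

    file_ids = [f for f in data['patientId']
                if os.path.exists(train_images_path + "\\" + f.strip() + ".dcm")]

    with h5py.File("train_data.h5", "w") as hf:
        dset = hf.create_dataset(
            "images",
            shape=(len(file_ids), 1024, 1024),
            dtype="float32",
            chunks=(1, 1024, 1024),   # one image per chunk -> cheap random access later
            compression="gzip",
        )
        for i, file_id in enumerate(file_ids):
            path = train_images_path + "\\" + file_id.strip() + ".dcm"
            # write each pixel array straight into the dataset instead of appending to a list
            dset[i] = pydicom.dcmread(path).pixel_array.astype("float32")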

I don't have any .dcm files to reproduce the error, but I would suggest performing the min-max scaling after importing the images into a single array; then you can do it as one matrix operation rather than an operation per image, which will also take less time.
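As a rough illustration of that idea, here is a minimal sketch of per-image min-max scaling done as one vectorised NumPy operation; it assumes the images have already been stacked into a single float32 array called images of shape (n_images, 1024, 1024):

    import numpy as np

    # per-image minimum and maximum, kept with broadcastable shapes
    mins = images.min(axis=(1, 2), keepdims=True)
    maxs = images.max(axis=(1, 2), keepdims=True)
    # scale every image to [0, 1] in one shot; the small epsilon guards against flat images
    images_scaled = (images - mins) / np.maximum(maxs - mins, 1e-8)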

deepak sen
  • Hi there, thanks for your answer. Actually the problem is storing the pixel arrays together in a list for 30,227 files: the list lives in runtime memory, grows in size and explodes. So the final option for me would be to use chunked data only – Sarvagya Dubey May 07 '20 at 13:18
  • Nope, it didn't work out as expected. I tried storing all of the data in .h5 format; it took 35 hours to store 30,227 images, and retrieving them wanted 60 GB of RAM. I resized the images to 320 x 320 and it still required 12 GB of RAM to load them. So a much better way, instead of chunking the data, was implementing a Data Generator (see the sketch after these comments); it took some time to research but it finally helped us a great deal. – Sarvagya Dubey Jun 10 '20 at 13:22
  • Cool, you can post the correct answer to your question so that it can be helpful for others. – deepak sen Jun 10 '20 at 13:43
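For reference, here is a minimal sketch of the data-generator approach mentioned in the comments, assuming a Keras/TensorFlow model and the same DICOM layout as in the question. The class and variable names (DicomSequence, labels, batch_size) are illustrative, not from the original post:

    import os
    import numpy as np
    import pydicom
    from tensorflow.keras.utils import Sequence

    class DicomSequence(Sequence):
        """Reads and scales DICOM images one batch at a time, so memory use
        stays at roughly batch_size images regardless of dataset size."""

        def __init__(self, file_ids, images_path, labels, batch_size=32):
            self.file_ids = file_ids
            self.images_path = images_path
            self.labels = labels
            self.batch_size = batch_size

        def __len__(self):
            return int(np.ceil(len(self.file_ids) / self.batch_size))

        def __getitem__(self, idx):
            batch_ids = self.file_ids[idx * self.batch_size:(idx + 1) * self.batch_size]
            batch_y = self.labels[idx * self.batch_size:(idx + 1) * self.batch_size]
            batch_x = []
            for file_id in batch_ids:
                path = os.path.join(self.images_path, file_id.strip() + ".dcm")
                img = pydicom.dcmread(path).pixel_array.astype("float32")
                # per-image min-max scaling done on the fly
                img = (img - img.min()) / max(img.max() - img.min(), 1e-8)
                batch_x.append(img)
            # add a channel axis so the batch has shape (batch, H, W, 1)
            return np.stack(batch_x)[..., np.newaxis], np.asarray(batch_y)

    # usage sketch: model.fit(DicomSequence(file_ids, train_images_path, labels), epochs=...)

Because each batch is loaded from disk only when requested, the full set of 30,227 images never has to be held in memory at once.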