
I'm using pickle to save my NLP classifier, built with the TextBlob library, to disk.

I chose pickle after a lot of searching on this topic. Right now I'm working locally, and I have no problem loading the pickle file (which is 1.5 GB) on my machine with an i7 and 16 GB of RAM. But the plan is for the program to eventually run on my server, which has only 512 MB of RAM.

Can pickle handle such a large file or will I face memory issues?

The server is a Linux machine (I'm not sure which distribution) with Python 3.5 installed.

I'm asking because I can't access the server at the moment, so I can't simply try it and see what happens. At the same time, I'm unsure whether I can keep this approach or need to find another solution.

Asked by Nico

2 Answers


Unfortunately this is difficult to accurately answer without testing it on your machine.

Here are some initial thoughts:

  1. There is no inherent size limit enforced by the pickle module, but you're pushing the boundaries of its intended use: it isn't designed for individual very large objects. However, since you're using Python 3.5, you can take advantage of PEP 3154 (pickle protocol 4, available since Python 3.4), which adds better support for large objects. You should pass pickle.HIGHEST_PROTOCOL when you dump your data.

  2. You will likely take a large performance hit, because you're trying to handle an object three times the size of your RAM (1.5 GB vs. 512 MB). Your system will probably start swapping, and possibly even thrashing. RAM is cheap these days; bumping it up to at least 2 GB should help significantly.

  3. To handle the swapping, make sure you have enough swap space available (a large swap partition if you're on Linux, or enough space for the swap file on your primary partition on Windows).

  4. As pal sch's comment shows, pickle is not very friendly to RAM consumption during the pickling process, so Python may try to get even more memory from the OS than the 1.5 GB you might expect for your object.
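A minimal sketch of point 1, assuming a plain dict stands in for the trained TextBlob classifier (the file name classifier.pickle is also just an example): dump with the highest protocol, then load it back.

```python
import pickle

# Hypothetical stand-in for the trained TextBlob classifier; any picklable
# object is dumped and loaded the same way.
model = {"vocabulary": ["good", "bad"], "weights": [0.7, -0.4]}

# HIGHEST_PROTOCOL is 4 on Python 3.4 to 3.7 (PEP 3154) and 5 on 3.8+.
with open("classifier.pickle", "wb") as f:
    pickle.dump(model, f, protocol=pickle.HIGHEST_PROTOCOL)

# pickle.load reads the protocol from the file itself, so no argument is needed.
with open("classifier.pickle", "rb") as f:
    restored = pickle.load(f)

print(restored == model)
```

One caveat: if the local machine runs Python 3.8 or newer, HIGHEST_PROTOCOL is 5, which Python 3.5 cannot read. In that case, passing protocol=4 explicitly keeps the file loadable on the server.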

Given these considerations, I don't expect it to work out very well for you. I'd strongly suggest upgrading the RAM on your target machine to make this work.
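Before deciding on an upgrade, one way to estimate how much memory the load actually needs is to measure it locally with tracemalloc (in the standard library since Python 3.4). This is a sketch, not something from the answer above: peak_load_memory and sample.pickle are hypothetical names, and the sample data only stands in for the real classifier. Also, tracemalloc only counts allocations made through Python's memory allocators, so the real process footprint (interpreter, C extensions) will be somewhat higher.

```python
import os
import pickle
import tracemalloc


def peak_load_memory(path):
    """Unpickle `path` and return (object, peak bytes traced during the load)."""
    tracemalloc.start()
    try:
        with open(path, "rb") as f:
            obj = pickle.load(f)
        # get_traced_memory() returns (current, peak), both in bytes.
        _, peak = tracemalloc.get_traced_memory()
    finally:
        tracemalloc.stop()
    return obj, peak


# Hypothetical sample data standing in for the real classifier.
data = {"doc%d" % i: list(range(100)) for i in range(10000)}
with open("sample.pickle", "wb") as f:
    pickle.dump(data, f, protocol=pickle.HIGHEST_PROTOCOL)

obj, peak = peak_load_memory("sample.pickle")
# %-formatting keeps this runnable on Python 3.5 (no f-strings).
print("file size on disk: %d bytes" % os.path.getsize("sample.pickle"))
print("peak traced during load: %d bytes" % peak)
```

Pointing peak_load_memory at the real 1.5 GB file on the 16 GB machine would give a rough lower bound for how much the 512 MB server has to hold at once.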

Answered by skrrgwasme

I don't see how you could load an object into RAM that is larger than the RAM itself. For example, bytes(num_bytes_greater_than_ram) will always raise a MemoryError.
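One way to test this locally instead of on the server (my assumption, not part of this answer) is to cap a child process's address space with resource.RLIMIT_AS on Linux and attempt the allocation there. LIMIT_BYTES, CHILD and try_allocate are hypothetical names. RLIMIT_AS limits virtual address space rather than physical RAM, and swap does not extend it, so this check is stricter than a real 512 MB machine with swap configured.

```python
import subprocess
import sys

# Hypothetical cap matching the 512 MB server from the question.
LIMIT_BYTES = 512 * 1024 * 1024

# Child program (Linux): cap its own address space, then try to allocate.
CHILD = """
import resource, sys
limit = int(sys.argv[1])
resource.setrlimit(resource.RLIMIT_AS, (limit, limit))
try:
    buf = bytes(int(sys.argv[2]))
    print("allocated")
except MemoryError:
    print("MemoryError")
"""


def try_allocate(nbytes, limit=LIMIT_BYTES):
    """Run CHILD in a fresh interpreter and return what it printed."""
    result = subprocess.run(
        [sys.executable, "-c", CHILD, str(limit), str(nbytes)],
        stdout=subprocess.PIPE, universal_newlines=True, check=True)
    return result.stdout.strip()


# A 1.5 GB allocation (3x the cap), like the pickle file in the question.
print(try_allocate(3 * LIMIT_BYTES))
```

Replacing the bytes(...) line in CHILD with a pickle.load of the real file would give a rough answer to the original question without access to the server.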

Answered by Jashandeep Sohi