
Currently, I am creating .npz compressed files to store large NumPy arrays. Every time I need an array I have to load it from file, and since this happens frequently, I was thinking of storing the NumPy arrays in the database instead. I am using the PostgreSQL database.

  • https://www.postgresql.org/docs/9.1/arrays.html - you may see if this has what you need. – Mercury Aug 29 '20 at 20:42
  • Storing to the Postgres array type would require considerable transformation in both directions. I would say storing as *.npy in a bytea field would be a better solution. I have never done this, so this is theory at this point (a minimal sketch of the idea follows below). FYI, *.npz files created with np.savez are not compressed; per the [np.savez docs](https://numpy.org/doc/stable/reference/generated/numpy.savez.html): "The .npz file format is a zipped archive of files named after the variables they contain. The archive is not compressed ..." – Adrian Klaver Aug 29 '20 at 21:24
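
A rough sketch of that .npy-to-bytea idea, assuming psycopg2 and a made-up `np_arrays` table (adapt the table and connection details to your schema):

```python
import io

import numpy as np
import psycopg2

# Hypothetical table: CREATE TABLE np_arrays (name text PRIMARY KEY, data bytea);
conn = psycopg2.connect("dbname=mydb user=myuser")

def save_array(name, arr):
    # Serialize the array in .npy format into an in-memory buffer
    buf = io.BytesIO()
    np.save(buf, arr, allow_pickle=False)
    with conn, conn.cursor() as cur:
        cur.execute(
            "INSERT INTO np_arrays (name, data) VALUES (%s, %s) "
            "ON CONFLICT (name) DO UPDATE SET data = EXCLUDED.data",
            (name, psycopg2.Binary(buf.getvalue())),
        )

def load_array(name):
    with conn, conn.cursor() as cur:
        cur.execute("SELECT data FROM np_arrays WHERE name = %s", (name,))
        raw = cur.fetchone()[0]
    # bytea comes back as a memoryview; wrap it in BytesIO for np.load
    return np.load(io.BytesIO(bytes(raw)), allow_pickle=False)
```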

1 Answer


You could do this with a bytea column (it can store arbitrary binary data). Use pickle.dumps to turn your NumPy array into a byte string and insert it into Postgres however you like, then select that data back and use pickle.loads to reconstruct the array. Here's an answer to a similar question: https://stackoverflow.com/a/57644761/7386185
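
A minimal sketch of the pickle approach with psycopg2 (the `np_arrays` table name and connection string are placeholders, not part of any existing schema):

```python
import pickle

import numpy as np
import psycopg2

# Assumed table: CREATE TABLE np_arrays (name text PRIMARY KEY, data bytea);
conn = psycopg2.connect("dbname=mydb user=myuser")
arr = np.random.rand(1000, 1000)

# Store: pickle the array to bytes and insert into the bytea column
with conn, conn.cursor() as cur:
    cur.execute(
        "INSERT INTO np_arrays (name, data) VALUES (%s, %s)",
        ("my_array", psycopg2.Binary(pickle.dumps(arr, protocol=pickle.HIGHEST_PROTOCOL))),
    )

# Load: select the bytes back and unpickle into a NumPy array
with conn, conn.cursor() as cur:
    cur.execute("SELECT data FROM np_arrays WHERE name = %s", ("my_array",))
    restored = pickle.loads(bytes(cur.fetchone()[0]))

assert np.array_equal(arr, restored)
```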

Depending on how large the array is, you might want to consider some kind of blob storage such as Amazon S3.

If you're accessing this data frequently and this is a production environment, you might want to consider keeping it in memory. If the array is too large to keep in memory for long, consider whether your application can stream the data in batches or through a buffer.
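
For example, if the array fits in RAM, something as simple as an lru_cache around the load function avoids re-reading it on every access (the file name and the "arr_0" key below are just placeholders for however you actually load the data):

```python
from functools import lru_cache

import numpy as np

@lru_cache(maxsize=8)
def get_array(name):
    """Load the named array once; later calls reuse the in-memory copy."""
    # Replace this body with the .npz / Postgres / S3 read you actually use.
    with np.load(f"{name}.npz") as npz:
        return npz["arr_0"]
```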

Arlo Clarke