I am looking for the best way to send large Numpy arrays (composed mainly of images) via Flask.

For now, I am doing something like this:

Server side:

np.save(matrix_path, my_array)
return send_file(matrix_path+'.npy') 

Client side:

with open('test_temp', 'wb') as f:
    f.write(r.content)
my_array = np.load('test_temp')

But the .npy file is very large so it takes too long.

I thought about using h5py, but as the images have different sizes (array.shape = (200,)), I cannot use it (creating a dataset for each image would take too long).

Does anyone have an idea of how to optimize this?

David Jones
A. Attia
  • can you send compressed png images instead? also you're spending time writing the file to disk when presumably you only need to send it to the user. Maybe just save to a buffer and send the buffer to keep it all in memory. – Aaron Mar 22 '19 at 13:54
  • I cannot send compressed PNGs because my arrays contain not only images but also other data, text for instance. Indeed, I am wasting time writing the file to disk, but this time is really small compared to the time spent sending the images (50x less). But it's definitely the next optimization – A. Attia Mar 22 '19 at 14:02
  • If the bottleneck is truly just the time taken to send the data you either need to find a way to compress the data first and send less or get a faster connection between the server and client. You should also have your wsgi server configured for `X-Sendfile`. – Aaron Mar 22 '19 at 14:10
  • Yes, the goal of the question is to find the best way to compress my data. What do you mean by wsgi server configured for `X-sendfile` ? – A. Attia Mar 22 '19 at 14:30
  • how are you running your flask server? [This](http://flask.pocoo.org/docs/1.0/deploying/#deployment) page links to lots of different options for setting up your wsgi server. If you are using the builtin development server: `$flask run` or `$python -m flask run` or `if __name__ == "__main__": app.run()` this feature is [not supported](https://stackoverflow.com/a/17435621/3220135) – Aaron Mar 22 '19 at 14:46

1 Answer

As the comments section is really just starting to become an answer in and of itself, I'll write it all out here.

EDIT: NumPy has a built-in way to compress multiple arrays into a single file, which packages them up neatly for sending. Combined with using a buffer rather than a file on disk, this is probably the quickest and easiest way to gain some speed. Here is a quick example of numpy.savez_compressed saving some data to a buffer, and this question shows sending a buffer using flask.send_file:

import numpy as np
import io

myarray_1 = np.arange(10) #dummy data
myarray_2 = np.eye(5)

buf = io.BytesIO() #create our buffer
#pass the buffer as you would an open file object
np.savez_compressed(buf, myarray_1, myarray_2, #etc...
         )

buf.seek(0) #This simulates closing the file and re-opening it.
            #  Otherwise the cursor will already be at the end of the
            #  file when flask tries to read the contents, and it will
            #  think the file is empty.

#flask.send_file(buf, mimetype='application/octet-stream')

#client receives buf
npzfile = np.load(buf)
print(npzfile['arr_0']) #default names are given unless you use keywords to name your arrays
print(npzfile['arr_1']) #  such as: np.savez(buf, x = myarray_1, y = myarray_2 ... (see the docs)
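
Because the images in the question have different sizes, one option is to store each image as its own named array in a single compressed archive rather than building an object array. This is only a minimal sketch, assuming the images are plain numeric arrays; the helper names `pack_images` and `unpack_images` and the `img_N` key scheme are made up for illustration:

```python
import io
import numpy as np

def pack_images(images):
    """Pack a list of differently shaped arrays into one compressed .npz in memory."""
    buf = io.BytesIO()
    # one keyword per image, so each array keeps its own shape and dtype
    np.savez_compressed(buf, **{f"img_{i}": img for i, img in enumerate(images)})
    buf.seek(0)  # rewind so the next reader starts at the beginning
    return buf

def unpack_images(buf):
    """Read the arrays back in their original order."""
    with np.load(buf) as npz:
        return [npz[f"img_{i}"] for i in range(len(npz.files))]

# dummy data: two "images" with different shapes and dtypes
images = [np.zeros((4, 6, 3), dtype=np.uint8), np.ones((2, 5), dtype=np.float32)]
restored = unpack_images(pack_images(images))
print([a.shape for a in restored])
```

Storing one array per key avoids pickling, so the client can call np.load without allow_pickle.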

There are 3 quick ways to gain some speed in sending files.

  1. Don't write to disk: this one is pretty simple, just use a buffer (io.BytesIO) to store the data before passing it to flask.send_file().
  2. Compress the data: once you have a buffer of binary data, there are many options for compression, and zlib is part of the standard Python library. If your arrays are images (or even if they aren't), PNG compression is lossless and can sometimes compress better than zlib on its own. SciPy has deprecated its built-in imread and imsave, so you should use imageio.imwrite now.
  3. Get a higher-performance server to actually do the file sending. The built-in development server that gets called when you call app.run() or invoke your app via flask directly ($flask run or $python -m flask run) does not support the X-Sendfile feature. This is one reason to run Flask behind something like Apache or Nginx. Unfortunately, this isn't implemented in the same way for each server, and it may require a file in the filesystem (though you could possibly use an in-memory file if the OS supports it). This will be a case of rtfm for whatever deployment you choose.
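
Points 1 and 2 can be combined in a single view. This is only a sketch under assumptions: the `/arrays` route, the `arrays` view name and the dummy data are made up, and Flask's test client stands in for a real HTTP client:

```python
import io
import numpy as np
from flask import Flask, send_file

app = Flask(__name__)

@app.route("/arrays")  # hypothetical route
def arrays():
    buf = io.BytesIO()
    # compress straight into memory, no temporary file on disk
    np.savez_compressed(buf, x=np.arange(10), y=np.eye(5))
    buf.seek(0)  # rewind before handing the buffer to flask
    # a mimetype is needed because a buffer has no filename to guess it from
    return send_file(buf, mimetype="application/octet-stream")

# client side: wrap the response body in a buffer instead of writing a file
response = app.test_client().get("/arrays")
received = np.load(io.BytesIO(response.data))
print(received["x"], received["y"].shape)
```

With the requests library on the client, r.content would take the place of response.data.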
Aaron
  • Thanks for the answer. Do you have examples for the first two quick ways? Indeed, for 1., the data is not sent via the buffer. For 2., I can't find any easy way to send a complex array via zlib (I have a numpy array with text, images, and other int values). Another solution is to reduce the precision of the images (using float16 instead of float32 or 64) – A. Attia Mar 25 '19 at 12:07
  • @A.Attia the data in a numpy array is available as bytes using `array.tobytes()`. This is the raw data underneath the data structure, so you'd need to find a way to pass the shape and datatype so it can be decoded on the other end using `np.frombuffer`. Bytes are the datatype you would write to a binary file (or buffer), as well as the native datatype of the `zlib` library. – Aaron Mar 25 '19 at 13:00
  • @A.Attia I have revised my recommendation: using the built-in numpy file saving and loading tools supports compression already (you learn something every day), and doesn't require you to deal with the underlying data buffers, and keeping track of bytes etc.. python's `io` library is how you deal with all things "buffer", and works very much like a normal file object, without the disk writes. please see the example – Aaron Mar 25 '19 at 14:26
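
For reference, the raw-bytes route from the earlier comment (tobytes, zlib, np.frombuffer) could look roughly like this. The `encode_array` and `decode_array` helpers and the JSON header layout are assumptions for illustration, not a standard format, and the approach only works for numeric dtypes, not object arrays:

```python
import json
import struct
import zlib
import numpy as np

def encode_array(arr):
    """Serialize one array as: 4-byte header length, JSON header, zlib-compressed data."""
    # the header carries what tobytes() loses: dtype and shape
    header = json.dumps({"dtype": arr.dtype.str, "shape": arr.shape}).encode("utf-8")
    payload = zlib.compress(arr.tobytes())
    return struct.pack("<I", len(header)) + header + payload

def decode_array(blob):
    """Rebuild the array from bytes produced by encode_array."""
    (header_len,) = struct.unpack("<I", blob[:4])
    header = json.loads(blob[4:4 + header_len].decode("utf-8"))
    raw = zlib.decompress(blob[4 + header_len:])
    # frombuffer returns a read-only view; call .copy() if it must be modified
    return np.frombuffer(raw, dtype=header["dtype"]).reshape(header["shape"])

original = np.arange(12, dtype=np.float32).reshape(3, 4)
restored = decode_array(encode_array(original))
print(np.array_equal(original, restored))
```

This is essentially what np.savez_compressed already does, which is why the revised recommendation is simpler in practice.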