
I am trying to convert image matrices to lists in order to send them in a request to an inference API.

However, most of my files are 100+ pages (I use pdf2img to get the images), so the .tolist() operation takes a lot of time. Implementation below:

page_images = [page.image.tolist() for page in pages]

This single line takes forever to run for 100+ page files.

Is there something that can be done to improve the speed?

Dawny33
  • I don't know the answer to your question as posted, but what are you doing that can't be done as an ndarray? – gph Dec 21 '20 at 13:43
  • @gph I am converting the data to a list in order to send it in an API request (as ndarrays aren't json serializable), and converting it back to ndarray inside the API – Dawny33 Dec 21 '20 at 13:46
  • Gotcha. Could you use an alternative protocol like ftp so you can serialize to something other than json? – gph Dec 21 '20 at 13:50
  • @gph Just realized that while replying to your comment :D. I'm trying it out right now :) – Dawny33 Dec 21 '20 at 14:01
  • If you implement the server side yourself, you could just POST binary data (picture encoded as JPEG or PNG) rather than JSON. – Christoph Rackwitz Dec 21 '20 at 21:33
  • You can alternatively try to send the images using base64 encoding inside of JSON, like so: https://stackoverflow.com/q/1443158/502144 – fdermishin Dec 22 '20 at 18:24
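
The base64 suggestion in the comments could look roughly like the sketch below. It is not a tested solution for the asker's API. It assumes each page image is a NumPy ndarray and that the server side can be changed to decode the payload. The helper names `encode_array` and `decode_array`, the payload keys, and the random stand-in images are all made up for illustration. The idea is to serialize the raw buffer with `tobytes()` instead of building a nested Python list, which avoids creating one Python object per pixel.

```python
import base64
import json

import numpy as np


def encode_array(arr: np.ndarray) -> dict:
    """Pack an ndarray into a JSON-serializable dict (hypothetical helper)."""
    arr = np.ascontiguousarray(arr)  # make sure the buffer is C-ordered
    return {
        "dtype": str(arr.dtype),
        "shape": list(arr.shape),
        "data": base64.b64encode(arr.tobytes()).decode("ascii"),
    }


def decode_array(payload: dict) -> np.ndarray:
    """Rebuild the ndarray on the receiving side (hypothetical helper)."""
    raw = base64.b64decode(payload["data"])
    # np.frombuffer returns a read-only view; call .copy() if it must be mutable.
    return np.frombuffer(raw, dtype=payload["dtype"]).reshape(payload["shape"])


# Stand-in for `page.image`: small random RGB images (assumed uint8).
pages = [
    np.random.randint(0, 256, size=(64, 48, 3), dtype=np.uint8) for _ in range(3)
]

# Client side: build the JSON request body.
body = json.dumps({"pages": [encode_array(p) for p in pages]})

# Server side: restore the arrays from the request body.
restored = [decode_array(p) for p in json.loads(body)["pages"]]
```

Base64 makes the payload about a third larger than the raw bytes, so if the server accepts binary request bodies, posting PNG or JPEG bytes directly (as the other comment suggests) may be smaller still.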

0 Answers