TensorFlow does have queues, which support streaming so you don't have to load the full dataset into memory. But yes, by default they only support reading from files on the same server. The real problem you have is that you want to load data into memory from some other server. I can think of the following ways to do this:
- Expose your images using a REST service. Write your own queueing mechanism in Python, read the data (using `urllib` or similar), and feed it to TensorFlow placeholders (see the first sketch after this list).
- Instead of using Python queues (as above), you can use TensorFlow queues as well (see this answer), although it's slightly more complicated. The advantage is that TensorFlow queues can use multiple cores, giving you better performance than normal Python multi-threaded queues (the second sketch below shows this variant).
- Use a network mount (e.g., NFS) to fool your OS into believing the data is on the same machine.
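For the first option, here's a minimal sketch. The endpoint URL, the fixed image size, and the raw-uint8 response format are all assumptions for illustration; in practice you'd decode whatever your REST service actually returns:

```python
import queue
import threading
import urllib.request

import numpy as np
import tensorflow as tf

IMAGE_URL = "http://server1:8000/next_image"  # hypothetical endpoint
IMAGE_SIZE = 224 * 224 * 3                    # assumed flat uint8 image length

data_queue = queue.Queue(maxsize=100)  # thread-safe buffer between fetchers and training

def fetcher():
    while True:
        raw = urllib.request.urlopen(IMAGE_URL).read()
        image = np.frombuffer(raw, dtype=np.uint8)[:IMAGE_SIZE]
        data_queue.put(image)

for _ in range(4):  # a few downloader threads keep the buffer full
    threading.Thread(target=fetcher, daemon=True).start()

image_ph = tf.placeholder(tf.uint8, shape=[IMAGE_SIZE])
image_float = tf.cast(image_ph, tf.float32)  # stand-in for the real model graph

with tf.Session() as sess:
    for step in range(1000):
        image = data_queue.get()  # blocks until a fetcher has delivered one
        sess.run(image_float, feed_dict={image_ph: image})
```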
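And the TensorFlow-queue variant of the same idea (same hypothetical endpoint and image format as above): Python threads still do the downloading, but the queue and the batching live inside the TF graph:

```python
import threading
import urllib.request

import numpy as np
import tensorflow as tf

IMAGE_URL = "http://server1:8000/next_image"  # hypothetical endpoint
IMAGE_SIZE = 224 * 224 * 3                    # assumed flat uint8 image length

# The queue lives inside the TF graph; dequeue_many batches on the TF side.
tf_queue = tf.FIFOQueue(capacity=100, dtypes=[tf.uint8], shapes=[[IMAGE_SIZE]])
enqueue_ph = tf.placeholder(tf.uint8, shape=[IMAGE_SIZE])
enqueue_op = tf_queue.enqueue([enqueue_ph])
batch = tf_queue.dequeue_many(32)  # a batch of 32 images, assembled by TF

def fetch_and_enqueue(sess):
    while True:
        raw = urllib.request.urlopen(IMAGE_URL).read()
        image = np.frombuffer(raw, dtype=np.uint8)[:IMAGE_SIZE]
        sess.run(enqueue_op, feed_dict={enqueue_ph: image})

with tf.Session() as sess:
    for _ in range(4):  # downloader threads feed the TF queue
        threading.Thread(target=fetch_and_enqueue, args=(sess,), daemon=True).start()
    images = sess.run(batch)  # blocks until 32 images are queued
```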
Also, remember that with this sort of distributed setup you will always incur network overhead (the time taken for images to be transferred from Server 1 to Server 2), which can slow your training considerably. To counteract this, you'd have to build a multi-threaded queueing mechanism with fetch-execute overlap (a minimal sketch follows), which is a lot of effort. An easier option, IMO, is to just copy the data onto your training machine.
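For illustration, here's a minimal double-buffering sketch of fetch-execute overlap; `fetch_batch` is a hypothetical stand-in for whatever network read you actually use (e.g., one of the REST approaches above):

```python
import threading
import time

def fetch_batch():
    # Hypothetical stand-in for the network read; the sleep simulates
    # the Server 1 -> Server 2 transfer latency.
    time.sleep(0.1)
    return [0] * 32  # dummy batch

class Prefetcher:
    """Double-buffers batches: fetch batch N+1 while the model trains on N."""

    def __init__(self):
        self._start()

    def _start(self):
        self._result = []
        self._thread = threading.Thread(
            target=lambda: self._result.append(fetch_batch()))
        self._thread.start()

    def get(self):
        self._thread.join()  # wait for the in-flight download
        batch = self._result[0]
        self._start()        # kick off the next fetch immediately
        return batch

prefetcher = Prefetcher()
for step in range(10):
    batch = prefetcher.get()  # arrived while the previous step was running
    # sess.run(train_op, feed_dict={...})  # training step overlaps next fetch
```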