
I would like to run a program on my laptop (the Gazebo simulator) and send a stream of image data to a GCE instance, where it will be run through an object-detection network, with the detection results sent back to my laptop in near real-time. Is such a setup possible?

My best idea right now is, for each image:

  1. Save the image as a JPEG on my personal machine
  2. Stream the JPEG to a Cloud Storage bucket
  3. Access the storage bucket from my GCE instance and transfer the file to the instance
  4. In my Python script, convert the JPEG image to a numpy array and run it through the object-detection network
  5. Save the detection results in a text file and transfer it to the Cloud Storage bucket
  6. Access the storage bucket from my laptop and download the detection results file
  7. Convert the detection results file to a numpy array for further processing

This seems like a lot of steps, and I am curious if there are ways to speed it up, such as reducing the number of save and load operations or transporting the image in a better format.

Digil
Jon S
1 Answer

If your question is "is it possible to set up such a system and do those actions in real time?" then I think the answer is yes. If your question is "how can I reduce the number of steps?" then I am not sure I can help; I will defer to one of the experts on here and can't wait to hear the answer!

I have implemented a system that I think is similar to what you describe, for researching Forex trading algorithms (e.g. I upload data to storage from my laptop, Compute Engine workers pull the data and work on it, they post results back to storage, and I download the compiled results to my laptop).

I used the Google Pub/Sub architecture (apologies if you have already read up on this). It allows near-real-time messaging between programs. For example, you can have code looping on your laptop that scans a folder for new images. When one appears, the code uploads the file to a bucket and, once it's there, sends a message to the instance(s) telling them that there are new files to process; alternatively, you can use the "change notification" feature of Google Storage buckets. The instances can do the work, send the results back to storage, and send a notification to the code running on your laptop that the work is done and the results are available for pick-up.

Note that I set this up for my project above and encountered problems, to the point that I gave up on Pub/Sub. The reason was that the Python client library for Pub/Sub only supports 'asynchronous' message pulls, which means that subscribers pull multiple messages from the queue and process them in parallel. There are some 'flow control' features built into the API to help manage this, but even with them implemented I couldn't get it to work the way I wanted. For my particular application I wanted to process everything in order, one file at a time, because it was important to me to be clear about what the instance is doing and the order it's doing it in. There are several threads on Stack Overflow and Google Groups that discuss workarounds for this using queues, classes, allocating specific tasks to specific instances, etc., which I tried, but even these presented problems for me. Some of these links are:

Run synchronous pull in PubSub using Python client API and pubsub problems pulling one message at a time, and there are plenty more if you would like them!

You may find that if processing an image is relatively quick, order isn't too important, and you don't mind an instance working on multiple things in parallel, then my problems don't really apply to your case.

FYI, I ended up just making a simple loop on my 'worker instances' that scans the 'task list' bucket every 30 seconds or so to look for new files to process, but obviously this isn't quite the real-time approach that you were originally looking for. Good luck!

Paul
  • Awesome. I had not read up on Google Pub/Sub, and it sounds like exactly what I am looking for! Note that order is important for my application, but subsequent pulls are not dependent on each other, so passing an identifier flag and reordering upon return should do the trick. – Jon S Apr 27 '18 at 16:59
  • Good luck Jon! There are some good tutorials and sample codes out there that should get you started. It's also possible to configure auto-scaling based on the number of unacknowledged messages in PubSub using stackdriver so that if you have a huge backlog of images to process it will create new instances for you to help out with the work. Happy coding! – Paul Apr 27 '18 at 17:28
  • @Paul I want to use Pub/Sub with auto-scaling of instances. Do you have any examples of doing this? If not, how can I connect my subscription with my instance template in managed instance groups? Is it something to do with metadata? I am not able to understand how to connect my task/subscription with the instance template so that it can spin up new instances. Do you know of any code samples? – khushbu Sep 25 '18 at 08:03
  • Hi @khushbu, I am sorry for the slow response. This is not something I ever did, so sadly I am unable to share examples with you; however, in my research of Pub/Sub I did see mention of scaling in the documentation. As I understand it, you have two choices. One is to make an instance group and set the autoscaling based on group utilisation in the traditional manner. If you want the number of instances scaled based on the backlog of unprocessed messages in the topic/subscription, then you may need to use Stackdriver metrics, and this is not something I have done. – Paul Sep 29 '18 at 03:40
  • @khushbu there are some links [here](https://cloudplatform.googleblog.com/2018/03/queue-based-scaling-made-easy-with-new-stackdriver-per-group-metrics.html) and [here](https://stackoverflow.com/questions/35475082/google-pubsub-counting-messages-in-topic), and some other pages about it with a Google search, but it looks like some effort is required, and I abandoned the idea of using Pub/Sub for my project before I got this far. I'm sorry I can't be more helpful - good luck! – Paul Sep 29 '18 at 03:42