I'm building a simple C++ server where I receive an image via a POST request, send it to a GPU for processing, and once I have the results from the GPU, send back a response.
To handle many simultaneous connections (and to learn something new), I'm using Boost.Asio, following the server4 example (link), which uses boost::asio::coroutine. The issue I'm running into is how to send data from the coroutine to the GPU without blocking the coroutine.
The GPU is most efficient when it can process a batch of requests together. But even when processing one request at a time, it must finish a complete request before starting on the next one: unlike a CPU, it cannot be context-switched between requests, and memory I/O is the bottleneck. This means I need to queue the requests coming from the coroutines and somehow signal each coroutine when the GPU has finished processing its request.
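To make the requirement concrete, here is a rough sketch of the kind of queue I have in mind. It uses only the standard library and leaves Asio out. `GpuRequest`, `GpuBatchQueue`, and `process_batch_on_gpu` are placeholder names I made up for illustration, and the "GPU" call is a stub. A single worker thread drains up to `max_batch` requests at a time and invokes a per-request callback when the batch is done.

```cpp
#include <cassert>
#include <condition_variable>
#include <cstddef>
#include <functional>
#include <mutex>
#include <queue>
#include <string>
#include <thread>
#include <utility>
#include <vector>

// Hypothetical request: the image bytes plus a callback to run with the result.
struct GpuRequest {
    std::string image;
    std::function<void(std::string)> on_done;
};

// Stub standing in for the real GPU call; it handles a whole batch at once.
std::vector<std::string> process_batch_on_gpu(const std::vector<std::string>& images) {
    std::vector<std::string> results;
    for (const auto& img : images) results.push_back("processed:" + img);
    return results;
}

// Hypothetical queue: one worker thread owns the GPU and processes batches.
class GpuBatchQueue {
public:
    explicit GpuBatchQueue(std::size_t max_batch) : max_batch_(max_batch) {
        assert(max_batch > 0);
        worker_ = std::thread([this] { run(); });
    }

    // Drains any queued requests, then stops the worker.
    ~GpuBatchQueue() {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            stopping_ = true;
        }
        cv_.notify_one();
        worker_.join();
    }

    // Called from the connection side; returns immediately without blocking.
    void submit(GpuRequest request) {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            pending_.push(std::move(request));
        }
        cv_.notify_one();
    }

private:
    void run() {
        for (;;) {
            std::vector<GpuRequest> batch;
            {
                std::unique_lock<std::mutex> lock(mutex_);
                cv_.wait(lock, [this] { return stopping_ || !pending_.empty(); });
                if (pending_.empty()) return;  // only reachable when stopping
                while (!pending_.empty() && batch.size() < max_batch_) {
                    batch.push_back(std::move(pending_.front()));
                    pending_.pop();
                }
            }
            // The lock is released while the (slow) GPU work runs.
            std::vector<std::string> images;
            for (auto& r : batch) images.push_back(std::move(r.image));
            std::vector<std::string> results = process_batch_on_gpu(images);
            for (std::size_t i = 0; i < batch.size(); ++i)
                batch[i].on_done(std::move(results[i]));
        }
    }

    std::size_t max_batch_;
    std::mutex mutex_;
    std::condition_variable cv_;
    std::queue<GpuRequest> pending_;
    bool stopping_ = false;
    std::thread worker_;
};
```

In this sketch `on_done` runs on the worker thread. My assumption is that in the real server the callback would hand the result back to the io_context (for example with `boost::asio::post`) so the coroutine resumes on an Asio thread, but that is exactly the part I don't know how to wire up.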
I've been looking through the Boost.Asio reference, but nothing is jumping out at me. In general, how are Boost.Asio coroutines used when a response cannot be generated immediately (e.g. it may take ~500 ms)? How is this typically done?