
I have a web service request whose response can contain a large number of data items. When it does, I want to split it into smaller requests and perform each request and the subsequent parsing of its response in parallel. Essentially, while the first response is being parsed, the subsequent requests should already be fetching their data.

It seems that there are a number of approaches to doing this, and I am wondering whether futures are appropriate in this case. I have heard some comments that futures should not be used for IO, and arguments going the other way.

Effectively, I am trying to do this:

void Service::GetData(const Defn &defn) {
    // Split up the request into chunks if the list is large
    size_t chunk_size = CONFIG.GetInt("Limits","BatchSize");
    if(chunk_size == 0) {
        auto response = GetResponse(defn);
        Parse(defn, *response);
    } else {
        std::vector<std::future<std::unique_ptr<Response>>> futures;
        for(int batch_num = 0; batch_num < (std::ceil(cardinality / chunk_size)); batch_num++) {
            futures.emplace_back(std::async(std::launch::async, &Service::GetResponse, defn, chunk_size, batch_num * chunk_size));
        }
        for(auto&& future : futures) {
            Parse(defn, *future.get());
        }
    }
}

std::unique_ptr<Response> Service::GetResponse(const Defn &defn, size_t top, size_t skip) {
    // Do request and return response
}

However, I am getting the error "error C2064: term does not evaluate to a function taking 3 arguments" and I am not sure why. Do futures disallow being stored in containers such as a vector?

If so, should I be approaching this differently, or is there a different way to capture a list of futures? I.e., do I have to use a packaged task?

Ideally, I suppose, this should be tied more closely to the number of cores rather than arbitrarily breaking the response into chunks and then trying to create a thread for each chunk.
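Roughly, the sizing I have in mind would look like this (a sketch only; cardinality is the total item count reported by the service, and hardware_concurrency() can return 0, hence the fallback):

#include <algorithm>
#include <thread>

// Sketch: derive batch parallelism from the core count instead of
// unconditionally creating one thread per chunk.
size_t num_batches  = (cardinality + chunk_size - 1) / chunk_size; // integer ceiling
size_t max_parallel = std::max<size_t>(1, std::thread::hardware_concurrency());
size_t in_flight    = std::min(num_batches, max_parallel);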

user3072517
  • Is `Service::GetResponse()` a static member function? – ikh Feb 11 '15 at 03:09
  • No. It is just a regular class function. – user3072517 Feb 11 '15 at 03:11
  • Then, why don't you pass the object in `std::async`? `GetResponse` has 4 arguments: **`this`**, `defn`, `top` and `skip`. – ikh Feb 11 '15 at 03:12
  • DOH! Missed that. Thanks! To some degree that was part of my question, but another part was about futures. Should futures be used here? Or is it more appropriate to create a thread pool for IO? I can't get a straight answer on when not to use futures for IO. – user3072517 Feb 11 '15 at 03:16
  • I've written an answer below – ikh Feb 11 '15 at 03:34

1 Answer

futures.emplace_back(
    std::async(std::launch::async,
        &Service::GetResponse, pService, defn, chunk_size, batch_num * chunk_size)
//                             ^^^^^^^^
    );

Since GetResponse isn't a static member function, you must pass the object as a parameter.
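Equivalently, if the std::async call is made from inside another Service member function, you can pass this directly. A minimal sketch (note that std::async decay-copies its arguments, so std::cref from <functional> is needed to pass defn by reference):

// Passing `this` directly from inside a Service member function.
// defn must outlive the future when passed via std::cref.
futures.emplace_back(std::async(std::launch::async,
    &Service::GetResponse, this,
    std::cref(defn), chunk_size, batch_num * chunk_size));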


I don't know what you are doing exactly, so I can't give you specific advice >o<

However, if you are interested in asynchronous tasks with futures, I would point you to Boost.Asio. It's an asynchronous I/O library (yes, ASynchronous I/O) which cooperates easily with std::future or boost::future. (See this question of mine.)
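For example, a minimal sketch of Boost.Asio handing back a std::future (this assumes Boost 1.54 or later, where boost::asio::use_future was introduced; the resolver's exact result type varies between versions):

#include <boost/asio.hpp>
#include <boost/asio/use_future.hpp>
#include <future>
#include <iostream>
#include <thread>

int main() {
    boost::asio::io_service io;
    boost::asio::ip::tcp::resolver resolver(io);
    boost::asio::ip::tcp::resolver::query query("example.com", "http");

    // Passing use_future instead of a callback makes async_resolve
    // return a std::future for its result.
    std::future<boost::asio::ip::tcp::resolver::iterator> result =
        resolver.async_resolve(query, boost::asio::use_future);

    std::thread runner([&io] { io.run(); });        // drive the I/O service
    std::cout << result.get()->endpoint() << "\n";  // blocks until resolved
    runner.join();
}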

In your code, I think Parse() can also go into the future:

futures.emplace_back(
    std::async(std::launch::async,
        [&] { GetResponse(...); Parse(...); }
    )
);

If Parse doesn't need to run in the same thread or run sequentially, I think this is better: you can run several Parse and several GetResponse calls in parallel.
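A slightly fuller sketch of that idea (hypothetical; it assumes Parse can safely run on several threads at once and that defn outlives the futures):

std::vector<std::future<void>> futures;
for (size_t batch_num = 0; batch_num < num_batches; ++batch_num) {
    // Capture batch_num by value; a blanket [&] capture would race
    // with the loop variable once the task runs on another thread.
    futures.emplace_back(std::async(std::launch::async,
        [this, &defn, chunk_size, batch_num] {
            auto response = GetResponse(defn, chunk_size, batch_num * chunk_size);
            Parse(defn, *response);  // parse in the same worker thread
        }));
}
for (auto &f : futures)
    f.get();  // wait for all batches; rethrows any worker exception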

ikh
  • I have marked this as the answer (instead of pService I actually just used this) because it addresses the fundamental compilation issue. I am not allowed to incorporate Boost, so I was trying to ensure the use of futures was valid and that there weren't concurrency issues to worry about. – user3072517 Feb 11 '15 at 03:42
  • @user3072517 By the way, can your web request I/O really be processed more efficiently in parallel? I guess single-threaded I/O and multi-threaded parsing would be enough >o< – ikh Feb 11 '15 at 03:49
  • Yes, the individual request is truly single-threaded but broken down by batch size, so that parsing can happen while the other batches are being requested. The batch threshold is high enough that the parse time nearly matches the request time, so I don't have to make it work in a truly multi-threaded fashion. It would probably be better that way, though. Thanks for the help! – user3072517 Feb 11 '15 at 04:00
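For reference, the overlap described in these comments could be sketched with a single future kept one batch ahead (a hypothetical sketch; num_batches is assumed to be computed as above, and std::cref comes from <functional>):

// Keep exactly one request in flight while the previous batch parses.
auto next = std::async(std::launch::async, &Service::GetResponse,
                       this, std::cref(defn), chunk_size, size_t{0});
for (size_t batch = 0; batch < num_batches; ++batch) {
    auto response = next.get();  // batch fetched in the background
    if (batch + 1 < num_batches)
        next = std::async(std::launch::async, &Service::GetResponse,
                          this, std::cref(defn), chunk_size, (batch + 1) * chunk_size);
    Parse(defn, *response);      // parse while the next batch downloads
}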