I just read this:

Get the status of a std::future

Since the functionality of Concurrency::completion_future appears to mimic std::future, I thought I could do something similar, but this relatively simple example fails:

#include <assert.h>
#include <chrono>
#include <iostream>
#include <amp.h>

int main()
{
    using namespace Concurrency;
    int big = 1000000; // this should take a while to send back to the host
    array_view<int> av(big);

    parallel_for_each(extent<1>(big), [=](index<1> idx) restrict(amp)
    {
        av[idx] = idx[0];
    });
    int i = 0;
    completion_future future = av.synchronize_async();

    // this should be false; how could it send back so much data instantly?
    bool const gpuFinished = future.wait_for(std::chrono::seconds(0)) == std::future_status::ready;

    assert(!gpuFinished); // FAIL! why?

    future.wait();

    system("pause");
}

Why would that assert fail?

quant

2 Answers

The behavior observed in the question is correct.

array_view<int> av(big) creates an array_view without a data source, whereas av.synchronize_async() synchronizes modifications back to the data source. For an array_view without a data source it is therefore a no-op by definition. By extension, it also does not force the preceding parallel_for_each to execute.

If the intention is to synchronize the data to CPU memory, that has to be requested explicitly with av.synchronize_to_async(accelerator(accelerator::cpu_accelerator).default_view). The returned completion_future then becomes ready only when the preceding parallel_for_each and the (optional) copy operation have finished.

Replacing the former synchronization call with the latter makes the assertion hold. Keep in mind that it may still fail (by design) on systems with CPU shared memory, or under rare timing conditions.

Disclaimer: I'm not an expert in AMP.

AFAIK, array_view doesn't represent anything by itself. It is just a view that you should tie to something. So your code, basically, doesn't make sense to me: you don't have any backing memory on the CPU to synchronize with.

Try the following code:

#include <assert.h>
#include <chrono>
#include <future>
#include <iostream>
#include <amp.h>
#include <numeric>
#include <vector>

int main()
{
    using namespace Concurrency;
    using namespace std;
    int big = 100000000; // this should take a while to send back to the host
    vector<int> vec(big);
    iota(begin(vec), end(vec), 0);
    array_view<int, 1> av(big, vec);

    parallel_for_each(Concurrency::extent<1>(big), [=](index<1> idx) restrict(amp)
    {
        av[idx] = av[idx] * av[idx];
    });
    int i = 0;
    completion_future future = av.synchronize_async();

    // zero-timeout poll: the copy back to vec should still be in flight here
    bool const gpuFinished = future.wait_for(std::chrono::seconds(0)) == std::future_status::ready;

    assert(!gpuFinished); // expected to hold in this version

    future.wait();
    std::cout << vec[5];
}

It's just a modification of your code, and it works as expected.

ixSci
  • That's very interesting; this might be a bug in the new feature, then. The new AMP supports array_views without explicitly declared data sources (see http://blogs.msdn.com/b/nativeconcurrency/archive/2013/06/28/what-s-new-for-c-amp-in-visual-studio-2013.aspx), but your version obviously works. I'll let MS know. – quant Nov 15 '13 at 10:15
  • I have posted the bug report. Please +1 if you can reproduce: https://connect.microsoft.com/VisualStudio/feedback/details/808655/completion-future-object-does-not-seem-to-sync-asynchronously-when-array-view-is-declared-without-data-source# – quant Nov 15 '13 at 10:20
  • I don't believe there is a bug here. You can access the array_view after the operation and get its values. They use something under the hood, but that doesn't mean it needs a long-running transfer. You didn't provide any source, so what are you trying to sync with? Maybe it gets values on demand or uses some other sophisticated algorithm. Also read [this](http://blogs.msdn.com/b/nativeconcurrency/archive/2013/07/11/shared-memory-support-in-c-amp-array-view.aspx) – ixSci Nov 15 '13 at 11:48
  • That article states with regards to array views created without a data source "*if such an array_view is first accessed on the CPU, it will have the same behavior as an array_view created over a CPU memory data source*". So I think this behaviour is buggy. The Sync operation should not block. – quant Nov 15 '13 at 11:52
  • Remember that the entire kernel should run asynchronously, so even if the copy operation is trivial the future status should still be not ready, as the kernel operation should take a long time. – quant Nov 15 '13 at 12:04
  • What I mean is that I don't see why they are obliged to sync anything. For example, when you construct a view from some CPU-bound memory, the two have to be kept in sync, and they (AMP) should do that because you can access the source. On the other hand, you have an array_view without a source, so you can't access the data except through the view itself. So they are free to use whatever optimization or storage they want, and synchronization in this case may be a complete no-op. It is good that you created a bug report, though. We will know what is really going on, hopefully. I'd ask on the MSDN forums too. – ixSci Nov 15 '13 at 14:32
  • Ah I see what you mean. My understanding from an earlier question (see http://stackoverflow.com/questions/19830470/will-array-view-synchronize-asynch-wait-for-parallel-for-each-completion) is that the sync operation not only synchronizes the array (a no-op in this case as you said) but also first *waits* for the underlying `parallel_for_each` kernel to complete (certainly not a no-op). My thinking is that AMP is failing to submit the `parallel_for_each` kernel asynchronously, but we'll see what MS say... – quant Nov 16 '13 at 00:12