I want to tag a number of objects using a CUDA kernel. The main purpose is to find the objects that were not tagged by any thread. I want to use competing writes to achieve this: each thread writes TRUE to an array in which every location corresponds to one object, so several threads may write to the same location at the same time. The array is initialized to FALSE. If an entry is still FALSE after the kernel finishes, I know that the object was not tagged by any thread.
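To make the idea concrete, here is a minimal sketch of what I have in mind. The names `tagObjects` and `countUntagged`, and the rule `(tid * 2) % numObjects` for choosing which object a thread tags, are made up for illustration; my real selection logic is different. The store is a plain, non-atomic write, and every competing thread writes the same value.

```cuda
#include <cassert>
#include <cstdio>
#include <vector>
#include <cuda_runtime.h>

// Hypothetical kernel: each thread tags one object with a plain store.
// Several threads may target the same slot, but all of them write `true`.
__global__ void tagObjects(bool *tagged, int numObjects, int numThreads)
{
    int tid = blockIdx.x * blockDim.x + threadIdx.x;
    if (tid < numThreads) {
        // Illustrative selection rule (an assumption, not my real logic).
        int obj = (tid * 2) % numObjects;
        tagged[obj] = true;  // non-atomic, same value from every writer
    }
}

// Host helper: runs the kernel and returns how many objects stayed untagged.
int countUntagged(int numObjects, int numThreads)
{
    bool *dTagged = nullptr;
    cudaError_t err = cudaMalloc(&dTagged, numObjects * sizeof(bool));
    assert(err == cudaSuccess);

    // All-zero bytes correspond to `false` for every entry.
    err = cudaMemset(dTagged, 0, numObjects * sizeof(bool));
    assert(err == cudaSuccess);

    int threadsPerBlock = 128;
    int blocks = (numThreads + threadsPerBlock - 1) / threadsPerBlock;
    tagObjects<<<blocks, threadsPerBlock>>>(dTagged, numObjects, numThreads);
    err = cudaDeviceSynchronize();
    assert(err == cudaSuccess);

    // Copy into a byte buffer; any non-zero byte means "tagged".
    std::vector<unsigned char> hTagged(numObjects);
    err = cudaMemcpy(hTagged.data(), dTagged, numObjects * sizeof(bool),
                     cudaMemcpyDeviceToHost);
    assert(err == cudaSuccess);
    cudaFree(dTagged);

    int untagged = 0;
    for (int i = 0; i < numObjects; ++i) {
        if (!hTagged[i]) {
            ++untagged;
        }
    }
    return untagged;
}

int main()
{
    // 16 threads tagging 8 objects with the illustrative rule above.
    printf("untagged objects: %d\n", countUntagged(8, 16));
    return 0;
}
```

With this rule only the even-numbered objects are ever written, so the odd-numbered ones are the entries I would expect to find still FALSE afterwards.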
Is this a good approach, or should I use something else, such as atomicAdd()? I do not need to know exactly how many threads wrote to each location.