the methodology will be very similar to what is described here.
- construct a map vector (i.e. sequence produced by iterator) which will map input to output. We will use a
thrust::counting_iterator
to provide an ordinal sequence 0, 1, 2,..., and then construct the map iterator using a thrust::transform_iterator
. The challenge is creating the correct arithmetic for the operation passed to the transform iterator. The code below is commented extensively to explain that.
- pass that map vector, via a
thrust::permutation_iterator
, to thrust::copy_n
. Note that sometimes good advice is to not copy data. If you are only using this "transformed view" of the data once, then just pass that permutation iterator to whatever thrust algorithm you are using once, rather than actually copying the data. If you need to use the "transformed view" of the data many times, it may be more efficient to copy it once.
Here is a worked example:
$ cat t5.cu
#include <thrust/sequence.h>
#include <thrust/iterator/permutation_iterator.h>
#include <thrust/iterator/transform_iterator.h>
#include <thrust/iterator/discard_iterator.h>
#include <thrust/device_vector.h>
#include <thrust/copy.h>
#include <iostream>
typedef int dtype; // data type
typedef int itype; // indexing type - change to size_t if L*m*s > int
const itype s = 3; // segment_width
const itype m = 2; // number of segments per sub-array
const itype L = 4; // number of sub-arrays per array
const itype w = 2; // width of group (# of sub-arrays) for segment reordering
// S1 S1 S2 S2 S1 S1 S2 S2
// 0 1 2 6 7 8 3 4 5 9 10 11 12 13 14 18 19 20 15 16 17 21 22 23
using namespace thrust::placeholders;
int main(){
// we require data that consists of L*m*s elements
// we also require L to be whole-number divisible by w
thrust::device_vector<dtype> d_data(L*m*s);
thrust::device_vector<dtype> d_result(L*m*s);
thrust::sequence(d_data.begin(), d_data.end());
// we will build up the necessary map iterator progressively
// we will start with an target_index sequence i = 0, 1, 2, 3, 4, 5, ... which defines the location in the target vector
// seg_idx (position within a segment) = i%s 0, 1, 2, 0, 1, 2, ...
// which_seg (which segment within sub-array) = (i/(w*s))%m 0, 0, 0, 0, 0, 0, 1, ...
// which_sub (which sub-array in group) = (i/s)%w 0, 0, 0, 1, 1, 1, 0, ...
// which_grp (which group in array) = i/(w*s*m) 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1 ...
// map index = which_grp*group_width + which_sub*subarray_width + which_seg*segment_width + seg_idx
// map index = (i/(w*s*m))*(w*s*m) + ((i/s)%w)*(s*m) + ((i/(w*s))%m)*s + i%s
thrust::copy_n(thrust::make_permutation_iterator(d_data.begin(), thrust::make_transform_iterator(thrust::counting_iterator<itype>(0), (_1/(w*s*m))*(w*s*m) + ((_1/s)%w)*(s*m) + ((_1/(w*s))%m)*s + _1%s)), L*m*s, d_result.begin());
thrust::copy(d_result.begin(), d_result.end(), std::ostream_iterator<dtype>(std::cout, ","));
std::cout << std::endl;
}
$ nvcc -o t5 t5.cu
$ ./t5
0,1,2,6,7,8,3,4,5,9,10,11,12,13,14,18,19,20,15,16,17,21,22,23,
$
Note that if the overall data length (L*m*s
) is greater than what can be held in a int
quantity, then the above code would have to be refactored to use size_t
instead of int
for the itype
typedef.