The data set is in the following format: Input sample matrix X and output class vector Y such that each row in X is a sample and each of its column corresponds to a feature. Each index in Y corresponds to the respective output class for the corresponding sample in X. X can contain real numbers while Y contains positive integers.
My aim is to order the data set in terms of its class. For example
X = Y =
1 8 3 2
4 2 6 1
7 8 9 2
2 3 4 3
1 4 6 1
should be ordered and interleaved as
X = Y =
4 2 6 1
1 8 3 2
2 3 4 3
1 4 6 1
7 8 9 2
The code I've attempted seems to take a long time to run as it is based on serial execution. It is the following.
X = csvread('X.csv');
Y = csvread('Y.csv');
n = size(unique(Y),1);
m = size(X,1);
for i = 1:n
Dataset(i).X = X(Y==i,:);
Dataset(i).Y = Y(Y==i);
end
[num, ~] = hist(Y,n);
maxfreq = max(num);
NewX = [];
NewY = [];
for j = 1:maxfreq
for i = 1:n
if(j <= size(Dataset(i).X,1))
NewX = [NewX; Dataset(i).X(j,:)];
NewY = [NewY; i];
end
end
end
X = NewX;
Y = NewY;
clear NewX;
clear NewY;
csvwrite('OrderedX.csv', X);
csvwrite('OrderedY.csv', Y);
Is is possible to parallelize the above code?