As you may want to do some processing after this, consider using:
B = varfun(@(x) {x}, Data, 'GroupingVariables', 'ID');
You can either use this to partition the values into groups as presented above, or directly apply some function like mean by changing @(x) {x} to @mean. This should be the clearest solution, yet it won't give you any speed gains.
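For instance, the group-wise mean variant looks like this; with 'GroupingVariables', varfun returns a table containing the grouping variable, a GroupCount column, and one aggregated column per remaining variable:
B = varfun(@mean, Data, 'GroupingVariables', 'ID');
% B now has columns ID, GroupCount, and mean_<name> for each data variable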
You might, however, get a speed gain if you don't use tables but plain arrays. There, instead of 'GroupingVariables', you would use accumarray.
If your Data.IDs are already positive integers, you don't need any preprocessing step (if they aren't, use [~,~,newID] = unique(Data.ID)) and can just use:
accumarray(Data.ID, Data.VAR, [], @(x) {x})
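If the IDs first need remapping, the full pattern might look like this (assuming a single data variable Data.VAR, as above):
[uniqueIDs, ~, newID] = unique(Data.ID);              % map arbitrary IDs to 1..K
grouped = accumarray(newID, Data.VAR, [], @(x) {x});
% grouped{k} holds all Data.VAR values whose ID equals uniqueIDs(k)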
If your table has only these two variables (ID and one data variable), this will be sufficient. If you are dealing with more than one data variable, you will have to use something like this:
accumarray(Data.ID, 1:size(Data,1), [], @(I) {Data(I,:)})
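As a small illustration (the table here is made up for demonstration):
Data = table([2;1;2;1], rand(4,1), rand(4,1), 'VariableNames', {'ID','VAR1','VAR2'});
grouped = accumarray(Data.ID, (1:height(Data)).', [], @(I) {Data(I,:)});
grouped{1}   % sub-table with all rows where ID == 1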
Both of these will likely shuffle the internal ordering of each cell entry, as accumarray does not guarantee the order in which it processes values. If you don't want this, use a stable version of accumarray.
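If you don't have a stable accumarray at hand, one workaround is to sort the accumulated row indices inside the function, which restores the original row order (a sketch, using the same index trick as above):
stable = accumarray(Data.ID, (1:height(Data)).', [], @(I) {Data(sort(I),:)});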
As the table data structure has some overhead, this can be made even faster by not accessing the values through the Data table, but through the underlying arrays themselves:
% Example data: 100,000 rows with 50 distinct IDs
VAR1 = rand(100000,1);
VAR2 = rand(100000,1);
ID = repmat(randperm(50).',2000,1);
% Collect the matching rows of both variables into one matrix per ID
VARsPartitioned = accumarray(ID, 1:numel(ID), [], @(I) {[VAR1(I,:), VAR2(I,:)]});
For a million rows and 5000 different IDs, I get these results:
arrayfun: ~30 seconds
varfun: ~30 seconds
accumarray using table: ~3 seconds
accumarray using arrays: ~0.3 seconds
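For reference, a minimal sketch of how the accumarray-on-arrays timing might be reproduced with timeit (sizes as stated above; actual numbers depend on hardware and MATLAB version):
VAR1 = rand(1e6,1);
ID = repmat(randperm(5000).', 200, 1);   % one million rows, 5000 IDs
timeit(@() accumarray(ID, VAR1, [], @(x) {x}))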
PS: You can also use something like @mean or @std directly with accumarray, without the need to group the values in a first step.
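For example (reusing ID and VAR1 from above):
groupMeans = accumarray(ID, VAR1, [], @mean);   % one mean per ID
groupStds  = accumarray(ID, VAR1, [], @std);    % one std per ID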