Why is indexing into a dataset array so slow? A peak into the dataset.subsref function shows that all the columns of the dataset are stored in a cell array. However, cell indexing is much, much faster than dataset indexing, which is just indexing into a cell array under the hood. My guess is that this has to do with some overhead with MATLAB OOP. Any ideas on how to speed this up?
%% Using R2011a, PCWIN64
feature accel off; % turn off JIT
dat = (1:1e6)';
dat2 = repmat({'abc'}, 1e6, 1);
celldat = {dat dat2};
ds = dataset(dat, dat2);
N = 1e2;
tic;
for j = 1:N
tmp = celldat{2};
end
toc;
tic;
for j = 1:N
tmp2 = ds.dat2; % 2.778sec spent on line 262 of dataset.subsref
end
toc;
feature accel on; % turn JIT back on
Elapsed time is 0.000165 seconds.
Elapsed time is 2.778995 seconds.
EDIT: I've updated the example to be more like the problem I'm seeing. A huge amount of time is spent on line 262 of dataset.subsref - "b = a.data{varIndex};". It's very strange to me since it is a simple cell dereference. I'm wondering if there is a OOP trick that will allow me to index into "a.data" without the strange overhead.
EDIT2: As per Andrew's suggestion, I've submitted this as a bug to MatWorks. Will update if I hear anything from them.
EDIT3: Matlab responded and said they are aware of the problem now and will fix it in a future release. They noted that the problem is specific to cell arrays, and to try to avoid them if possible.