I'm writing some large (~500 MB to 3 GB) pieces of binary data in MATLAB using the fwrite command.
I want the data to be written in a tabular format, so I'm using the skip parameter. E.g. I have 2 vectors of uint8 values, a = [1 2 3 4] and b = [5 6 7 8].
I want the binary file to look like this: 1 5 2 6 3 7 4 8
So in my code I do something similar to this (my data is more complex)
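fwrite(f,b,'1*uint8',1);        % skip 1 byte before each value: b lands at offsets 1,3,5,7
fseek(f,0,'bof');               % back to the start of the file
fwrite(f,a(1),'uint8');         % a(1) goes at offset 0
fwrite(f,a(2:end),'1*uint8',1); % remaining a values land at offsets 2,4,6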
But the writes are painfully slow (about 2 MB/s).
I ran the following block of code, and when I pass in a skip count of 1 the write is approximately 300x slower.
>> f = fopen('testfile.bin', 'w');
>> d = uint8(1:500e6);
>> tic; fwrite(f,d,'1*uint8',1); toc
Elapsed time is 58.759686 seconds.
>> tic; fwrite(f,d,'1*uint8',0); toc
Elapsed time is 0.200684 seconds.
>> 58.759686/0.200684
ans =
292.7971
I could understand a 2x or 4x slowdown, since you have to traverse twice as many bytes with the skip parameter set to 1, but 300x makes me think I'm doing something wrong.
Has anyone encountered this before? Is there a way to speed up this write?
Thanks!
UPDATE
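For the simple two-vector example above, one way to avoid skip entirely is to interleave the values in memory and issue a single fwrite, since fwrite writes a matrix in column-major order. This is only a minimal sketch: it assumes both vectors have the same class and length, and fname and bytes are names made up for illustration.

```matlab
a = uint8([1 2 3 4]);
b = uint8([5 6 7 8]);

fname = [tempname '.bin'];    % hypothetical scratch file
f = fopen(fname, 'w');
fwrite(f, [a; b], 'uint8');   % 2x4 matrix written column by column, no skip
fclose(f);

% Read the file back to check the byte order on disk
f = fopen(fname, 'r');
bytes = fread(f, Inf, '*uint8')';
fclose(f);
delete(fname);

disp(bytes)                   % bytes is [1 5 2 6 3 7 4 8]
```

Stacking the vectors as rows costs one extra copy of the data in memory, but the write itself becomes one contiguous block.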
I wrote the following function to format arbitrary data sets. Write speed is vastly improved (~300MB/s) for large data sets.
%
% data: A cell array of matrices. Matrices can be composed of any
% non-complex numeric data. Each entry in data is considered
% to be an independent column in the data file. Rows are indexed
% by the last dimension of the numeric matrix, hence the count of elements
% in the last dimension of every matrix must match.
%
% e.g.
% size(data{1}) == [1,5]
% size(data{2}) == [4,5]
% size(data{3}) == [3,2,5]
%
% The data variable has 3 columns and 5 rows. Column 1 is made of scalar values,
% column 2 is made of vectors of length 4, and column 3 is made of 3 x 2
% matrices.
%
% returns buffer: an N x M matrix of bytes where N is the number of bytes
% in each row of data, and M is the number of rows of data.
function [buffer] = makeTabularDataBuffer(data)
    dataTypes = {};
    dataTypesLengthBytes = [];
    rowElementCounts = []; % the number of elements in each "row"
    rowCount = [];
    % figure out properties of tabular data
    for idx = 1:length(data)
        cDat = data{idx};
        dimSize = size(cDat);
        % ensure each column has the same number of rows
        if isempty(rowCount)
            rowCount = dimSize(end);
        elseif dimSize(end) ~= rowCount
            error('makeTabularDataBuffer:rowCountMismatch', ...
                'data column %d does not have the required number of rows (%d)', idx, rowCount);
        end
        dataTypes{idx} = class(cDat);
        % size in bytes of one element of this class (avoids eval)
        dataTypesLengthBytes(idx) = numel(typecast(cast(1, dataTypes{idx}), 'uint8'));
        rowElementCounts(idx) = prod(dimSize(1:end-1));
    end
    rowLengthBytes = sum(rowElementCounts .* dataTypesLengthBytes);
    % rows of the data set map to columns of the buffer matrix because fwrite writes columnwise
    buffer = zeros(rowLengthBytes, rowCount, 'uint8');
    bufferRowStartIdxs = cumsum([1 dataTypesLengthBytes .* rowElementCounts]);
    % load data 1 column at a time into the buffer
    for idx = 1:length(data)
        cDat = data{idx};
        columnWidthBytes = dataTypesLengthBytes(idx) * rowElementCounts(idx);
        cRowIdxs = bufferRowStartIdxs(idx):(bufferRowStartIdxs(idx+1)-1);
        buffer(cRowIdxs,:) = reshape(typecast(cDat(:), 'uint8'), columnWidthBytes, []);
    end
end
I've done some very limited testing of the function but it appears to be working as expected. The returned buffer matrix can then be passed to fwrite without the skip argument and fwrite will write the buffer in column major order.
dat = {};
dat{1} = uint16([1 2 3 4]);
dat{2} = uint16([5 6 7 8]);
dat{3} = double([9 10 ; 11 12; 13 14; 15 16])';
buffer = makeTabularDataBuffer(dat)
buffer =
20×4 uint8 matrix
1 2 3 4
0 0 0 0
5 6 7 8
0 0 0 0
0 0 0 0
0 0 0 0
0 0 0 0
0 0 0 0
0 0 0 0
0 0 0 0
34 38 42 46
64 64 64 64
0 0 0 0
0 0 0 0
0 0 0 0
0 0 0 0
0 0 0 0
0 0 0 0
36 40 44 48
64 64 64 64
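To check the layout end to end, the buffer can be written with a plain fwrite and then read back and decoded column by column. The sketch below rebuilds the same 20 x 4 buffer inline so that it runs without the helper function. The names fname, raw, r1, r2 and r3 are made up for illustration, and the byte offsets (1:2, 3:4, 5:20) are specific to this three-column example.

```matlab
% Same data as the example above: two uint16 columns and one 2-element double column
c1 = uint16([1 2 3 4]);
c2 = uint16([5 6 7 8]);
c3 = [9 11 13 15; 10 12 14 16];   % 2x4 double, one buffer column per data row

% Each data row becomes one column of bytes (2 + 2 + 16 = 20 bytes per row)
buffer = [reshape(typecast(c1(:), 'uint8'),  2, []); ...
          reshape(typecast(c2(:), 'uint8'),  2, []); ...
          reshape(typecast(c3(:), 'uint8'), 16, [])];

fname = [tempname '.bin'];        % hypothetical scratch file
f = fopen(fname, 'w');
fwrite(f, buffer, 'uint8');       % no skip argument; written in column-major order
fclose(f);

% Read the records back as a 20 x (number of rows) byte matrix
f = fopen(fname, 'r');
raw = fread(f, [20 Inf], '*uint8');
fclose(f);
delete(fname);

% Decode each column from its byte range (typecast needs a vector input)
r1 = typecast(reshape(raw(1:2,  :), [], 1), 'uint16')';
r2 = typecast(reshape(raw(3:4,  :), [], 1), 'uint16')';
r3 = reshape(typecast(reshape(raw(5:20, :), [], 1), 'double'), 2, []);

assert(isequal(r1, c1) && isequal(r2, c2) && isequal(r3, c3));
```

The decoded values are in the machine's native byte order, which is the same order typecast used when the buffer was built.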