I'm going to assume that your input array is a cell array of characters like so:
inputArray = {'Apple', 'Banana', 'Cherry', 'Dragonfruit', 'Apple', 'Cherry'};
You can convert the above into a numeric array by using the unique function's third output. What's great about this is that unique assigns each distinct string an ID in sorted order, so for a cell array of strings the IDs follow the lexicographical ordering of the strings.
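As a quick illustration of what the third output holds for the example array, here is a minimal sketch. The name uniqueWords is mine, and I force the IDs into a row with (:).' as a precaution, since I'm assuming the orientation of that output may differ between versions.

```matlab
inputArray = {'Apple', 'Banana', 'Cherry', 'Dragonfruit', 'Apple', 'Cherry'};
[uniqueWords, ~, inputNum] = unique(inputArray);
inputNum = inputNum(:).'; %// Force a row vector regardless of how unique orients it
%// uniqueWords holds 'Apple', 'Banana', 'Cherry', 'Dragonfruit' (sorted)
%// inputNum is [1 2 3 4 1 3]: each word is replaced by its position in uniqueWords
disp(uniqueWords(inputNum)); %// Indexing back with the IDs recovers the original words
```

Each ID is a column number in the final matrix, which is why the sorted order matters: column 1 is always the alphabetically first word.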
Next, declare a matrix of zeros (like you did above) then use sub2ind to index into the matrix and set the values to 1.
Something like this. Bear in mind that I initialized the output slightly differently. It's a trick I learned to allocate a matrix of zeroes that is quite fast. See here: Faster way to initialize arrays via empty matrix multiplication? (Matlab)
inputArray = {'Apple', 'Banana', 'Cherry', 'Dragonfruit', 'Apple', 'Cherry'};
[~,~,inputNum] = unique(inputArray);
inputNum = inputNum.'; %// To make compatible in dimensions
binMatrix(numel(inputArray), max(inputNum)) = 0;
binMatrix(sub2ind(size(binMatrix), 1:numel(inputArray), inputNum)) = 1;
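To make the indexing step less magical, here is a small sketch of what sub2ind computes for the example. MATLAB stores matrices in column-major order, so the linear index of (row, col) is row + (col-1)*numRows. The names nRows, nCols, rowIdx, colIdx, idx and idxManual are mine, and the IDs are hard-coded from the example rather than taken from unique.

```matlab
nRows = 6; nCols = 4;                      %// Size of binMatrix for the example
rowIdx = 1:nRows;                          %// One row per word
colIdx = [1 2 3 4 1 3];                    %// IDs that unique gives for the example array
idx = sub2ind([nRows nCols], rowIdx, colIdx);
idxManual = (colIdx - 1)*nRows + rowIdx;   %// Column-major formula written out by hand
%// Both idx and idxManual are [1 8 15 22 5 18]
binMatrix = zeros(nRows, nCols);
binMatrix(idx) = 1;                        %// One 1 per row, in the column given by the ID
disp(binMatrix);
```

Writing the formula out by hand skips the input checking that sub2ind does, which is the only difference between the two.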
Another method would be to create a sparse matrix where we set the right row and column positions to 1, then convert it to a regular matrix with full.
Something like:
inputArray = {'Apple', 'Banana', 'Cherry', 'Dragonfruit', 'Apple', 'Cherry'};
[~,~,inputNum] = unique(inputArray);
inputNum = inputNum.'; %// To make compatible in dimensions
binMatrix = sparse(1:numel(inputArray), inputNum, 1, numel(inputArray), max(inputNum));
binMatrix = full(binMatrix);
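If you want to convince yourself that the two approaches agree, a quick check along these lines should do. This is only a sketch: binSub, binSparse, n and sameResult are names I made up, and I use zeros here instead of the allocation trick so the snippet doesn't depend on the workspace being clear.

```matlab
inputArray = {'Apple', 'Banana', 'Cherry', 'Dragonfruit', 'Apple', 'Cherry'};
[~,~,inputNum] = unique(inputArray);
inputNum = inputNum(:).'; %// Row vector of IDs
n = numel(inputArray);
%// Method 1 - sub2ind
binSub = zeros(n, max(inputNum));
binSub(sub2ind(size(binSub), 1:n, inputNum)) = 1;
%// Method 2 - sparse, then convert to a full matrix
binSparse = full(sparse(1:n, inputNum, 1, n, max(inputNum)));
sameResult = isequal(binSub, binSparse); %// true when both methods give the same matrix
disp(sameResult);
```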
Let's put this all together in a timing script. I've incorporated the two methods above, plus your old method, plus Divakar's (only the first method) and brodroll's (very ingenious, by the way) method. For Divakar's and brodroll's methods, I have also used unique with the third output, as your original inquiry had capital letters which confused us all. Using the third output makes it easy to adapt their previous methods to your new specifications.
BTW, your example and your code are mismatched. Your example has each column as an index, but in your code it's each row. For the timing tests, I'm going to transpose your result.
I'm running MATLAB R2013a on Mac OS X 10.10.3 with 16 GB of RAM and an Intel i7 2.3 GHz processor. So:
clear all;
close all;
%// Generate dictionary
chars = {'Apple', 'Banana', 'Cherry', 'Dragonfruit'};
rng(123);
%// Generate 50000 random words
v = randi(numel(chars), 50000, 1);
inputArray = chars(v);
[~,~,inputNum] = unique(inputArray);
inputNum = inputNum.'; %// To make compatible in dimensions
%// Timing #1 - sub2ind
tic;
binMatrix(numel(inputArray), max(inputNum)) = 0;
binMatrix(sub2ind(size(binMatrix), 1:numel(inputArray), inputNum)) = 1;
t = toc;
clear binMatrix;
%// Timing #2 - sparse
tic;
binMatrix = sparse(1:numel(inputArray), inputNum, 1, numel(inputArray), max(inputNum));
binMatrix = full(binMatrix);
t2 = toc;
clear binMatrix;
%// Timing #3 - ismember and for
tic;
binMatrix = zeros(numel(inputArray), numel(chars));
for i = 1:size(binMatrix,1)
    binMatrix(i,:) = ismember(chars, inputArray(i));
end
t3 = toc;
%// Timing #4 - bsxfun
clear binMatrix;
tic;
binMatrix = bsxfun(@eq,inputNum',unique(inputNum)); %// Changed to make dimensions match
t4 = toc;
clear binMatrix;
%// Timing #5 - raw sub2ind
tic;
binMatrix(numel(inputArray), max(inputNum)) = 0;
binMatrix( (inputNum-1)*size(binMatrix,1) + [1:numel(inputArray)] ) = 1;
t5 = toc;
fprintf('Timing using sub2ind: %f seconds\n', t);
fprintf('Timing using sparse: %f seconds\n', t2);
fprintf('Timing using ismember and loop: %f seconds\n', t3);
fprintf('Timing using bsxfun: %f seconds\n', t4);
fprintf('Timing using raw sub2ind: %f seconds\n', t5);
We get:
Timing using sub2ind: 0.004223 seconds
Timing using sparse: 0.004252 seconds
Timing using ismember and loop: 2.771389 seconds
Timing using bsxfun: 0.020739 seconds
Timing using raw sub2ind: 0.000773 seconds
In terms of rank, fastest first:
- Raw sub2ind
- sub2ind
- sparse
- bsxfun
- OP's method
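One caveat on the numbers: each method above was timed with a single tic/toc pair, and a single run can be noisy. A hedged sketch of how you could repeat the two fastest methods and compare medians is below. The repeat count numRuns and the names tSub, tRaw, A and B are my own choices, and I use a deterministic stand-in for the random IDs so the snippet doesn't depend on the random number generator.

```matlab
inputNum = mod(0:49999, 4) + 1;   %// Deterministic stand-in for the 50000 random IDs
n = numel(inputNum);
numRuns = 20;                     %// Hypothetical repeat count
tSub = zeros(1, numRuns);
tRaw = zeros(1, numRuns);
for k = 1:numRuns
    %// sub2ind
    tic;
    A = zeros(n, max(inputNum));
    A(sub2ind(size(A), 1:n, inputNum)) = 1;
    tSub(k) = toc;

    %// raw sub2ind
    tic;
    B = zeros(n, max(inputNum));
    B((inputNum - 1)*n + (1:n)) = 1;
    tRaw(k) = toc;
end
fprintf('Median sub2ind: %f seconds\n', median(tSub));
fprintf('Median raw sub2ind: %f seconds\n', median(tRaw));
```

I haven't listed any output for this because the figures will depend on your machine and MATLAB version.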