
I am doing string processing in MATLAB, and I usually use cell arrays to store the individual words of the text.

Example:

a = {'this', 'is', 'an', 'array', 'of', 'strings'}

To search for the word 'of' in this array, I loop through the array and check each element against my word. This does not scale: with a large dataset the array a grows large, and looping over every element becomes slow. Is there a smarter way, perhaps a better native data structure in MATLAB, that would make this search faster?

Andrey Rubshtein
Mark

1 Answer


A map container (containers.Map) is one option. I don't know what specific sort of string processing you intend to do, but here's an example of how you can store each string as a key associated with a vector of the index positions of that word in the cell array:

a = {'this', 'is', 'an', 'array', 'of', 'strings', 'this', 'is'};

strMap = containers.Map();  %# Create container
for index = 1:numel(a)      %# Loop over words to add
    word = a{index};
    if strMap.isKey(word)
        strMap(word) = [strMap(word) index];  %# Add to an existing key
    else
        strMap(word) = index;  %# Make a new key
    end
end
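If the loop itself becomes a bottleneck for a long word list, the same map can probably be built without an explicit loop. The following is only a sketch, assuming the words are stored in a cell array of character vectors as above; the variable names words, idx, and positions are illustrative. It groups the positions of each distinct word with unique and accumarray, then passes the keys and values to the containers.Map constructor in one call:

```matlab
a = {'this', 'is', 'an', 'array', 'of', 'strings', 'this', 'is'};

% words: the distinct words; idx(k): which distinct word a{k} is
[words, ~, idx] = unique(a);

% Collect the positions of each distinct word into one cell per word.
% accumarray does not promise the order of the grouped values, so sort them.
positions = accumarray(idx(:), (1:numel(a)).', [], @(v) {sort(v).'});

% Build the map from the key set and value set in a single call
strMap = containers.Map(words(:), positions);
```

The resulting strMap holds the same keys and index vectors as the loop version, so the lookups that follow apply to it unchanged.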

You could then get the index positions of a word:

>> indices = strMap('this')

indices =

     1     7    %# Cells 1 and 7 contain 'this'

Or check if a word exists in the cell array (i.e. if it is a key):

>> strMap.isKey('and')

ans =

     0    %# 'and' is not present in the cell array
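Note that indexing the map with a key that is not present raises an error, so the two operations above are usually combined when a word may be missing. A minimal sketch of such a guarded lookup follows; queries and found are hypothetical names used only for illustration:

```matlab
% Same word list and map as in the example above
a = {'this', 'is', 'an', 'array', 'of', 'strings', 'this', 'is'};
strMap = containers.Map();
for index = 1:numel(a)
    word = a{index};
    if isKey(strMap, word)
        strMap(word) = [strMap(word) index];  % Append to an existing key
    else
        strMap(word) = index;                 % Make a new key
    end
end

% Guarded lookup: test for the key first, fall back to an empty result
queries = {'of', 'and'};       % hypothetical query words
found = cell(size(queries));
for k = 1:numel(queries)
    if isKey(strMap, queries{k})
        found{k} = strMap(queries{k});  % Index positions of the word
    else
        found{k} = [];                  % Word does not occur in a
    end
end
```

Here found{1} holds the position of 'of', and found{2} is empty because 'and' is not a key.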
gnovice