String matching in MATLAB

Question

I have two strings: a short one (S, of size 1x10) and a very long one (L, of size 1x1000). I want to find the locations in L that match S.


In this specific matching, I am only interested in matching certain characters of S (the ones marked in black in the original figure). Is there a function or method in MATLAB that can match only specific characters (for example, those at positions 1, 5, and 9 of S)?

It seems that you're looking for either [`strfind`](http://www.mathworks.com/help/matlab/ref/strfind.html) or [`regexp`](http://www.mathworks.com/help/matlab/ref/regexp.html). There are also a lot of related questions, for example: [this](http://stackoverflow.com/questions/8061344/how-to-search-for-a-string-in-cell-array-in-matlab) and [this](http://stackoverflow.com/questions/9428746/matlab-search-cell-array-for-string-subset)... have you checked them out? — Eitan T, Apr 02 '13 at 17:10

Eitan T · Accepted Answer · 2013-04-03T10:44:30.227


If I understand your question correctly, you want to find substrings in L that contain the same letters (characters) as S in certain positions (let's say given by array idx). Regular expressions are ideal here, so I suggest using regexp.

In regular expressions, a dot (.) matches any character, and curly braces ({}) optionally specify the number of desired occurrences. For example, to match a string of length 6, where the second character is 'a' and the fifth is 'b', our regular expression could be any of the following syntaxes:

.a..b.
.a.{2}b.
.{1}a.{2}b.{1}

Any of these is correct. So let's construct a regular expression pattern first:

in = num2cell(diff([0; idx(:); numel(S) + 1]) - 1);  % Intervals between matched positions
ch = num2cell(S(idx(:)));                            % Matched characters
C = [in(:)'; ch(:)', {''}];                          % Interleave counts and characters
pat = sprintf('.{%d}%c', C{:});                      % Pattern for regexp

Now all that is left is to feed regexp with L and the desired pattern:

loc = regexp(L, pat)

and voila!
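One caveat, offered as a hedged sketch rather than as part of the original answer: the characters of S are pasted into the pattern verbatim, so a regex metacharacter at one of the idx positions (for example `.` or `+`) would be read as syntax instead of a literal. Assuming that can happen in your data, each character can be passed through `regexptranslate('escape', ...)` first. The values of S, L, and idx below are made up for illustration.

```matlab
% Illustrative inputs (assumed): the matched characters are '.' and '+'
S   = 'a.c+e';
L   = 'a.c+eabcdex.y+z';
idx = [2 4];

gaps = diff([0; idx(:); numel(S) + 1]) - 1;        % wildcard counts around matched positions
pat  = '';
for k = 1:numel(idx)
    lit = regexptranslate('escape', S(idx(k)));    % e.g. '.' becomes '\.'
    pat = [pat, sprintf('.{%d}', gaps(k)), lit];   %#ok<AGROW>
end
pat = [pat, sprintf('.{%d}', gaps(end))];          % trailing wildcards

loc = regexp(L, pat);
```

With these inputs the windows starting at positions 1 and 11 of L are the ones that have a literal `.` in second place and a literal `+` in fourth place.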

Example

Let's assume that:

S = 'wbzder'
L = 'gabcdexybhdef'
idx = [2 4 5]

First we build a pattern:

in = num2cell(diff([0; idx(:); numel(S) + 1]) - 1);
ch = num2cell(S(idx(:)));
C = [in(:)'; ch(:)', {''}];
pat = sprintf('.{%d}%c', C{:});

The pattern we get is:

pat =
    .{1}b.{1}d.{0}e.{1}

Obviously we can add code that beautifies this pattern into .b.de., but this is really an unnecessary optimization (regexp can handle the former just as well).
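Should the shorter form be wanted anyway, one possible sketch (not from the original answer) uses `regexprep` to drop the `.{0}` gaps and collapse `.{1}` into a single dot. It assumes the pattern was generated by the code above, so the only brace expressions in it are the wildcard counts.

```matlab
pat = '.{1}b.{1}d.{0}e.{1}';                  % pattern from the example above

short = regexprep(pat,   '\.\{0\}', '');      % a zero-length gap contributes nothing
short = regexprep(short, '\.\{1\}', '.');     % a single wildcard needs no count
```

Counts of two or more, such as `.{2}`, are left untouched by both substitutions.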

After we do:

loc = regexp(L, pat)

we get the following result:

loc =
     2     8

Seems correct.
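A point this example does not exercise: regexp reports non-overlapping matches, so a candidate that begins inside an earlier match is skipped. If overlapping matches matter, one hedged alternative is to test every window of length numel(S) on its own. The inputs below are made up for illustration; the pattern construction is the same as above.

```matlab
S   = 'aXa';          % assumed example: positions 1 and 3 must match
L   = 'ababa';
idx = [1 3];

% Same pattern construction as in the answer
in  = num2cell(diff([0; idx(:); numel(S) + 1]) - 1);
ch  = num2cell(S(idx(:)));
C   = [in(:)'; ch(:)', {''}];
pat = sprintf('.{%d}%c', C{:});

n   = numel(S);
loc = [];
for k = 1:numel(L) - n + 1
    % Anchor the pattern so it has to cover exactly this window
    if ~isempty(regexp(L(k:k+n-1), ['^' pat '$'], 'once'))
        loc(end+1) = k; %#ok<AGROW>
    end
end
```

Here the windows starting at 1 and 3 both qualify, while a plain `regexp(L, pat)` call would report only the first, because the second begins inside it. The loop costs one regexp call per window, which is a trade-off to weigh for a long L.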

edited Apr 03 '13 at 10:44

answered Apr 02 '13 at 17:31

Eitan T


Thanks, but when I use a numeric matrix it says: Error using cellstr (line 34), Input must be a string. What should I do in this case? – Nicole Apr 02 '13 at 17:43
How can I use this code to find the set of substrings in L that match at positions [1 5 9]? In other words, I need the substrings of length 10 whose characters at positions [1 5 9] are the same as in S. – Nicole Apr 02 '13 at 18:42
Oh, so you want to match substrings in `L` that are similar to S in the [1 5 9] positions? – Eitan T Apr 02 '13 at 20:20
exactly. It is like pattern matching, but here I just want to use those specific locations – Nicole Apr 02 '13 at 20:55
@Nicole I've amended my answer. – Eitan T Apr 03 '13 at 08:48
It does not work for my example: (please see below) S = [1 2 NaN]; S = num2str(S); L = [2 2 1 2 2 1 1 2 2 2]; L = num2str(L); idx = [1 2]; in = num2cell(diff([0; idx(:); numel(s) + 1]) - 1); ch = num2cell(s(idx(:))); C = [in(:)'; ch(:)', {''}]; pat = sprintf('.{%d}%c', C{:}); loc = regexp(l, pat) – Nicole Apr 03 '13 at 17:21
the answer of the above codes is 7, while the correct answer is 3 and 7. – Nicole Apr 03 '13 at 17:25
@Nicole you have a lot of problems with your input. `num2str` inserts spaces between numbers, which also count as a separate character, and you have `NaN`s, which are translated into a `'NaN'` string (3 characters). So in this example, 7 is quite correct. – Eitan T Apr 04 '13 at 11:25
@Nicole if you want to fix your input to get `[3 7]`, remove the NaNs from `S` (or replace them with another value, _e.g._ `0`), and use `sprintf` instead of `num2str` to convert `S` and `L` to strings in the following way: `S = sprintf('%d', S)` and `L = sprintf('%d', L)` – Eitan T Apr 04 '13 at 11:28
it works now, but not for example for -999 (instead of 0). However, it solves my problem – Nicole Apr 04 '13 at 16:57
@Nicole Of course it won't because 999 are 3 characters. Are you interested in matching number arrays or strings?? – Eitan T Apr 04 '13 at 17:23
Actually I am going to use string matching for pattern matching. So, I am more interested in array matching. Do you have any idea? – Nicole Apr 04 '13 at 17:58
let us [continue this discussion in chat](http://chat.stackoverflow.com/rooms/27576/discussion-between-nicole-and-eitan-t) – Nicole Apr 04 '13 at 18:04
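
For numeric data where a value can span several characters (such as -999), converting to a string no longer keeps one character per element, which is the difficulty raised in the last few comments. A minimal sketch, not taken from the thread and assuming S and L are numeric row vectors, compares the elements at idx directly. The sample values follow the example in the comments, with NaN replaced by 0.

```matlab
S   = [1 2 0];
L   = [2 2 1 2 2 1 1 2 2 2];
idx = [1 2];

n      = numel(S);
starts = (1:numel(L) - n + 1)';               % every possible window start
ix     = bsxfun(@plus, starts, idx(:)' - 1);  % ix(k,j): index into L of position idx(j) in window k
win    = reshape(L(ix), size(ix));            % keep the matrix shape even when idx is scalar
hit    = all(bsxfun(@eq, win, S(idx(:)')), 2);% window k matches if all selected elements agree
loc    = find(hit)';
```

Because every window is checked independently, overlapping matches are found as well, and the values may be of any magnitude or sign.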
