I want to extract the numbers enclosed in 't' tags. I have used the following commands in MATLAB:

str ='<t abc>1.3</t><t efg>32.3</t>';
[tokens] = regexpi(str,  '<t.*>(\d*\.\d*)</t>', 'tokens');
celldisp(tokens)

The output shows only the last match, 32.3. Why is regexpi returning only the last match?

ubaabd
  • the expression: `(\d*\.\d*)` works. This to me is confusing as your expression for the first tag is greedy and returns only the first match, where mine is lazy and returns both matches... – Paolo Jun 03 '18 at 20:44

1 Answer

To capture what you need, use lazy quantifiers, as in this pattern: `<t.*?>(.*?)<\/t>`

str ='<t abc>1.3</t><t efg>32.3</t>';
[tokens] = regexpi(str,  '<t.*?>(.*?)<\/t>', 'tokens')
celldisp(tokens)

The result is what you are looking for:

tokens{1}{1} =

1.3

tokens{2}{1} =

32.3
Zander
  • Can you please explain why your solution works? What is wrong with my understanding and code? Thanks. – ubaabd Jun 03 '18 at 10:28
  • @zander The expression is incorrect as you are not matching digits only. @ubaabd asked for numbers in the tag. For input `str ='12.3a32ab.3'` it returns `{'12.3a'}` and `{'32ab.3'}` – Paolo Jun 03 '18 at 11:04
  • @pkpkpk run the second `regexpi` on it to get what you need. – Zander Jun 03 '18 at 11:29
  • @ubaabd There is nothing wrong with your pattern. For some reason in MATLAB, this combination often gives you the last match rather than all matches. – Zander Jun 03 '18 at 11:31
  • @Zander I was referring to the expression you posted – Paolo Jun 03 '18 at 11:37
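
A likely explanation for the behaviour discussed above, offered as a sketch rather than a definitive account: the `.*` in `<t.*>` is greedy, so it extends as far as it can while still letting the rest of the pattern match. On this input it runs from the first `<t` to the `>` just before `32.3`, so the whole string becomes a single match and only one token is captured. The comparison below uses the string from the question. The variable names and the `[^>]*` variant are illustrative additions, not part of the original posts.

```matlab
str = '<t abc>1.3</t><t efg>32.3</t>';

% Greedy: '.*' swallows everything up to the last '>' that still allows
% '(\d*\.\d*)</t>' to match, so there is ONE match covering the whole string.
greedyMatch  = regexpi(str, '<t.*>(\d*\.\d*)</t>', 'match');
greedyTokens = regexpi(str, '<t.*>(\d*\.\d*)</t>', 'tokens');

% Lazy: '.*?' stops at the first '>' that works, giving one match per tag
% while still restricting the captured token to digits and a dot.
lazyTokens = regexpi(str, '<t.*?>(\d*\.\d*)</t>', 'tokens');

% Negated class: '[^>]*' can never cross a '>', so it cannot span two tags.
classTokens = regexpi(str, '<t[^>]*>(\d*\.\d*)</t>', 'tokens');

% Flatten the nested cell array and convert the tokens to doubles.
values = cellfun(@(c) str2double(c{1}), lazyTokens);
```

With the lazy or negated-class pattern, each tag produces its own match, and keeping `\d*\.\d*` inside the capture group addresses the point raised in the comments about matching digits only.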