All:
As the subject states, I'm running into an issue where a non-greedy Perl regex in grep (grep -P) appears to match on an empty string.
[Note: For the purposes of this example, assume that the 'title' can be a complex string: alphanumeric, with special characters, and made up of multiple space-separated words.]
# echo "<span class=\"title\"></span><span class=\"price\">0.25</span><span class=\"title\">Banana</span><span class=\"price\">0.10</span><span class=\"title\">Grape</span><span class=\"price\">0.05</span>" | /opt/bin/grep -ioP "<span class=\"title\">(.+?)</span><span class=\"price\">(.+?)</span>" | sed "s/<span class=\"title\">//g; s/<span class=\"price\">/|/g; s/<\/span>//g;"
|0.25Banana|0.10
Grape|0.05
As you can see, the first 'title' is empty, but the non-greedy group (.+?) still matches.
Shouldn't the first 'title' match be ignored? What am I missing?
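For reference, here is a minimal sketch of the same grep call without the sed cleanup, so that each output line is one complete match as grep -o reports it. It assumes a grep built with PCRE support for -P; the variable names html and raw are only for illustration.

```shell
# Sample data from the example above (the first title is empty).
html='<span class="title"></span><span class="price">0.25</span><span class="title">Banana</span><span class="price">0.10</span><span class="title">Grape</span><span class="price">0.05</span>'

# Same pattern as above, but with no sed step, so the raw matches stay visible.
raw=$(printf '%s\n' "$html" \
  | grep -ioP '<span class="title">(.+?)</span><span class="price">(.+?)</span>')

# Print one match per line.
printf '%s\n' "$raw"
```

On this data I get two lines rather than three, and the first line holds both the empty title with 0.25 and Banana with 0.10, which is consistent with the |0.25Banana|0.10 line in the output above.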
Thank you for your assistance.
UPDATE:
Negating the less-than sign, ([^<]+?), is a good solution for the original, basic example. However, I'm finding that it runs into problems when more data is introduced.
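For reference, this is roughly how the negated-class variant looks against the original sample data. It is a sketch only, assuming a grep built with PCRE support for -P; the variable names html and out are just for illustration.

```shell
# Sample data from the original example (the first title is empty).
html='<span class="title"></span><span class="price">0.25</span><span class="title">Banana</span><span class="price">0.10</span><span class="title">Grape</span><span class="price">0.05</span>'

# [^<]+? needs at least one character that is not '<', so a group can
# never run across a tag boundary and the empty title is skipped.
out=$(printf '%s\n' "$html" \
  | grep -ioP '<span class="title">([^<]+?)</span><span class="price">([^<]+?)</span>' \
  | sed 's/<span class="title">//g; s/<span class="price">/|/g; s/<\/span>//g;')

# Print the cleaned-up title|price pairs, one per line.
printf '%s\n' "$out"
```

With the basic data this prints Banana|0.10 and Grape|0.05 on separate lines, and the empty title is dropped.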
I've attempted to expand the match to include the additional trailing tags, but the regex still appears to fail with that change as well.
# echo "<span class=\"title\"></span></div></div><span class=\"price\">0.25</span><span class=\"title\">Banana</span></div></a><span class=\"price\">0.10</span><span class=\"title\">Grape</span></div></a><span class=\"price\">0.05</span>" | grep -ioP "<span class=\"title\">(.+?)</span></div></a><span class=\"price\">(.+?)</span>" | sed "s/<span class=\"title\">//g; s/<span class=\"price\">/|/g; s/<\/span>//g; s/<\/div>//g; s/<\/a>//g;"
|0.25Banana|0.10
Grape|0.05
Shouldn't the regex match on the </span></div></a> tags, but not on the </span></div></div> tags?
Thanks, again, for your time and assistance.