Regex Full match

Question

I'm trying to understand regular expressions:

I need to match only text_01 and text_02 and filter out the tags.

<span>text_01<b>text_02</b>

I've tried to do it like:

(?<=<span>)(([^>]+)<b>)(.+?)(?=</b>)

But it captures 3 groups, and the Full Match includes a tag.

text_01<b>text_02

Could you advise me on how to build a regex whose Full Match contains only the text and no tags?

Luciano van der Veekens · Answer 1 · 2017-07-08T18:32:45.790

By using a non-capturing group you can keep the middle <b> tag from becoming a capture group, but you will never get a full match that excludes the tag. That is not possible: a regular expression cannot skip part of the input inside a match, because a match must be contiguous.

(?<=<span>)(.+?)(?:<b>)(.+?)(?=<\/b>)

Full match text_01<b>text_02
Group 1. text_01
Group 2. text_02
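
To make the group-based approach concrete, here is a minimal Python sketch that applies the pattern above to the sample markup from the question. The variable names (html, pattern, pieces, joined) are chosen only for illustration. It reads the two text fragments from the capture groups, since the full match itself still contains the <b> tag.

```python
import re

# Sample markup from the question.
html = "<span>text_01<b>text_02</b>"

# Pattern from this answer: the <b> tag sits in a non-capturing group,
# so only the two text fragments become numbered groups.
pattern = re.compile(r"(?<=<span>)(.+?)(?:<b>)(.+?)(?=<\/b>)")

m = pattern.search(html)
full_match = m.group(0)   # still contains the <b> tag
pieces = m.groups()       # the two captured text fragments
joined = "".join(pieces)  # tag-free text assembled from the groups

print(full_match)
print(pieces)
print(joined)
```

Joining the groups afterwards is one way to get tag-free text when a single contiguous match cannot provide it.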

edited Jul 08 '17 at 18:32

answered Jul 08 '17 at 18:25


trincot · Accepted Answer · 2017-07-08T18:31:52.583

Parsing HTML with regular expressions can get very complicated. It is generally not recommended; it is better to use a parser for this (a library for whatever language you are using).
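
As an illustration of the parser route, here is a minimal sketch that assumes Python and its standard-library html.parser module; the question does not name a language, so that choice is an assumption. TextCollector and extract_text are hypothetical helper names.

```python
from html.parser import HTMLParser


class TextCollector(HTMLParser):
    """Hypothetical helper: gathers text nodes and ignores all tags."""

    def __init__(self):
        super().__init__()
        self.chunks = []

    def handle_data(self, data):
        # Called by HTMLParser for each run of text between tags.
        self.chunks.append(data)


def extract_text(markup):
    parser = TextCollector()
    parser.feed(markup)
    parser.close()  # flush anything still buffered
    return parser.chunks


# Sample markup from the question (the <span> is left unclosed there too).
chunks = extract_text("<span>text_01<b>text_02</b>")
print(chunks)
```

For the sample markup this should collect text_01 and text_02 as separate chunks, without any assumption about which characters the text contains.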

But if you are sure that the text content contains neither < nor >, and that the < and > of the tags are never nested, you could use this one:

[^<>]*(?=<[^<>]*>)

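This matches only text that is immediately followed by a complete tag, that is, a < with its closing >. Because of the * quantifier it can also produce empty matches (for example, right before the opening <span>); use + instead of * if those are unwanted.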

If it is enough to check that the text is followed by <, the pattern can be simplified to:

[^<>]*(?=<)
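
The two patterns above can be compared with a short Python sketch. The helper non_empty_matches and the extra input "abc<" are made up for this illustration; the helper drops the empty strings that the * quantifier allows.

```python
import re

# Sample markup from the question.
html = "<span>text_01<b>text_02</b>"

# The two patterns from this answer.
followed_by_tag = re.compile(r"[^<>]*(?=<[^<>]*>)")  # needs a complete <...>
followed_by_lt = re.compile(r"[^<>]*(?=<)")          # needs only a <


def non_empty_matches(pattern, text):
    # [^<>]* may match the empty string, so filter those results out.
    return [m for m in pattern.findall(text) if m]


texts_strict = non_empty_matches(followed_by_tag, html)
texts_simple = non_empty_matches(followed_by_lt, html)

# Hypothetical input with an unterminated tag, to show where they differ.
strict_on_broken = non_empty_matches(followed_by_tag, "abc<")
simple_on_broken = non_empty_matches(followed_by_lt, "abc<")

print(texts_strict, texts_simple)
print(strict_on_broken, simple_on_broken)
```

On the sample markup both patterns yield the same two text fragments. They differ only when a < is never closed by a >, where the stricter pattern finds nothing.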

edited Jul 08 '17 at 18:31

answered Jul 08 '17 at 18:26


Thank you. This is exactly what I need. You are awesome. – recont Jul 08 '17 at 20:56
