How to make a regex part "backwards lazy"?

Question

Apologies for the terrible title, I don't know how to express this in short form.

I have a string containing HTML (text, some spans and some <br>s). What I'm trying to achieve is to find the first span with a class ending in "-focused". For added fun, the spans have line breaks in the title attribute. However, they do have a fixed structure and I can rearrange them if needed. This is what I have so far:

 <span[\s\S]*?class=".*-focused"[\s\S]*?>[\s\S]*?<\/span>

But I get a match from the start of the first span to the end of the matching span. Here's a regex101 link to illustrate (it contains example text):

https://regex101.com/r/W7YDU5/2

I tried playing with positive/negative lookaheads and capturing/non-capturing groups, but I'm more confused than anything at this point.

Obligatory https://stackoverflow.com/questions/1732348/regex-match-open-tags-except-xhtml-self-contained-tags — epascarello, Jul 27 '17 at 13:38
Always the same. HTML is not a context-free language, that's true, so trying to parse an entire HTML file using regex may not work. However, to say you must not use regexes as soon as any HTML is involved is simply nuts in my opinion. There are cases where regexes are a suitable tool even when it comes to searches within HTML — Psi, Jul 27 '17 at 13:47
Possible duplicate of [regular expression greedy on left side only (.net)](https://stackoverflow.com/questions/12186389/regular-expression-greedy-on-left-side-only-net) — Sebastian Proske, Jul 27 '17 at 13:53
@Amy and epascarello, that's why I specified I don't have arbitrary tags in the text (maybe I should have made that clearer, sure). If this really hurts you, replace " with "thing2 — iCart, Jul 27 '17 at 13:56
@iCart it has nothing at all to do with arbitrary tags. HTML is not a regular language, and using regex on HTML is going to lead you to a bad time. You are of course free to ignore this advice, but we aren't the ones who will be hurt by it. — , Jul 27 '17 at 14:06
@Amy As proven, this _is_ solvable although we're dealing with HTML here. Maybe you should distinguish between parsing an entire not-context-free grammar (regular language) and parsing specific snippets of text that _may_ derive from such a grammar but are very well parseable. — Psi, Aug 02 '17 at 21:59
@psi You are free to use the wrong tool for the job. Good luck to you. — , Aug 03 '17 at 00:02
It is the wrong tool when it comes to parsing entire documents or fragments with nested elements _only_. A tool can't be wrong if it does the job performantly and reliably ;) — Psi, Aug 03 '17 at 05:39

score 1 · Accepted Answer · answered Jul 27 '17 at 13:51

Avoid the first [\s\S]*? here: it can match > and so run past the end of the first span's opening tag into later tags. To get what you need, the match has to stay within a single opening tag, which happens implicitly when you match everything except >:

<span[^>]*?class=".*-focused"[^>]*?>[\s\S]*?<\/span>
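
To see the difference, here is a minimal sketch in JavaScript. The sample HTML is made up for illustration (it is not the text from the regex101 link), and the `tightened` pattern is an extra assumption of mine rather than part of the answer: it swaps `.*` for `[^"]*` so the class value cannot reach past its closing quote when several spans sit on one line.

```javascript
// Hypothetical sample: two spans, each with a line break inside its title attribute.
const html = [
  'Some text <span class="note-plain" title="first',
  'line">one</span><br>',
  'more <span class="note-focused" title="second',
  'line">two</span> tail',
].join('\n');

// Pattern from the question: [\s\S]*? may run past ">" into later tags.
const original = /<span[\s\S]*?class=".*-focused"[\s\S]*?>[\s\S]*?<\/span>/;

// Pattern from this answer: [^>]*? cannot leave the opening tag.
const fixed = /<span[^>]*?class=".*-focused"[^>]*?>[\s\S]*?<\/span>/;

// Assumed tightening (not part of the answer): [^"]* keeps the class value
// inside its own quotes.
const tightened = /<span[^>]*?class="[^"]*-focused"[^>]*?>[\s\S]*?<\/span>/;

const originalMatch = html.match(original)[0];
const fixedMatch = html.match(fixed)[0];
const tightenedMatch = html.match(tightened)[0];

console.log(originalMatch.startsWith('<span class="note-plain"')); // true: starts at the wrong span
console.log(fixedMatch.startsWith('<span class="note-focused"'));  // true: starts at the focused span

// Hypothetical edge case: both spans on one line, where ".*" can still cross tags.
const sameLine = '<span class="a-plain">x</span><span class="b-focused">y</span>';
console.log(sameLine.match(fixed)[0] === sameLine);  // true: both spans are swallowed
console.log(sameLine.match(tightened)[0]);           // <span class="b-focused">y</span>
```

If the spans are always on separate lines, as in the question's fixed structure, the pattern from the answer should be enough; the tightened variant only matters for the same-line case.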

Psi

score -1 · Answer 2 · answered Jul 27 '17 at 13:45

document.querySelector('span[class$="-focused"]')

This should find the first span whose class attribute ends with -focused.
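
As a rough sketch of how this could be used, assuming a browser-like environment: the "ends with" attribute selector is written `[class$="-focused"]`, and it tests the whole class attribute value, so it only matches when the `-focused` class is the last one listed. The helper name `firstFocusedSpan` and the sample markup are hypothetical.

```javascript
// Returns the first <span> whose class attribute value ends with "-focused".
// `root` is anything that offers querySelector (a Document or an Element).
function firstFocusedSpan(root) {
  return root.querySelector('span[class$="-focused"]');
}

// Browser-only usage; skipped where DOMParser is unavailable (e.g. plain Node.js).
if (typeof DOMParser !== 'undefined') {
  const doc = new DOMParser().parseFromString(
    '<span class="a-plain">x</span><span class="b-focused">y</span>',
    'text/html'
  );
  console.log(firstFocusedSpan(doc).textContent);
}
```

Outside a browser this needs some DOM implementation to supply `querySelector`, which is not covered here.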

sheplu

What if the OP wants to parse the HTML source and is not operating within the browser at all? – Psi Jul 27 '17 at 13:49
