
The following regex matches the first instance of 'STACK' found in any string that begins with HTML comment tags

(?:\<\!\-\-).*?(\bSTACK\b).*?(?=\>)

So, for example, at the moment only the first 'STACK' in the string below is matched, and I want both to be matched.

<!-- test test STACK test test STACK -->

First question: what do I need to change so this will match all instances (with and without the word boundary)?

Further Difficulty

The title of this question refers to replacing all instances in specific sections of an HTML document / ASP.Net page

An example of a document that I wish to use Regex.Replace (c#) with might look like:

<asp:Content>
    <div>some text STACK some text STACK123</div>
</asp:Content>
<asp:Content>
    <!-- SOLR {"description":"some text<br/> <b>bold stuff</b>", "summary":"some text <b>STACK123/<b> some text STACK"} SOLR ---> 
</asp:Content>

I already have another regex which works for the ASP.net / HTML elements (in c#, not in php/javascript etc):

(?<!<[^>]*?)\bSTACK

This matches all instances of 'STACK' in the text content of HTML elements / ASP.Net tags, i.e. anywhere that is not between the angle brackets of a tag.

I was unable to find a way to extend this / combine this with the first regex above to work inside comment tags too, so I'm currently planning to use two separate Regex.Replaces to achieve what I want, unless...

Second question: does one of you geniuses know a way of achieving what I want in a single line of regex...?

Further Examples - Example & Expected Results

<asp:Content>
<div>some text STACK some text STACK123</div>
</asp:Content>
<asp:Content>
<!-- SOLR {"description":"some text", "summary":"some text STACK123 some text STACK"} SOLR ---> 
</asp:Content>

Should match:

<asp:Content>
<div>some textSTACKsome textSTACK123</div>
</asp:Content>
<asp:Content>
<!-- SOLR {"description":"some text", "summary":"some textSTACK123 some textSTACK"} SOLR --->
</asp:Content>

abatishchev
jag
  • Your `(?:\<\!\-\-).*?(\bSTACK\b).*?(?=\>)` regex will match two or more consecutive comment tags if the word `STACK` appears in the second/third/etc. one. Do not use `.*` or `.*?` when you work with mark up text. Do not rely on regex if you have to parse HTML. – Wiktor Stribiżew Dec 13 '16 at 19:25
  • The real geniuses use html parser. – Alexander Petrov Dec 13 '16 at 19:25
  • @WiktorStribiżew I have made a small edit to the question - the string I'm looking for might appear multiple times in the same comment tag. AlexanderPetrov and Wiktor - I'm aware of alternative approaches - I'd like to know if this is possible with regex at all. – jag Dec 13 '16 at 19:28
  • With regex, it is possible, but with the usual assumptions/disclaimers for such cases. It means it will work in some cases, maybe in 99.9% of cases, but there is no guarantee it will work in 100% of cases. – Wiktor Stribiżew Dec 13 '16 at 19:31
  • I am not sure I understand the results you are looking for. Can you please post some examples together with the expected results? – JuanR Dec 13 '16 at 19:36
  • @WiktorStribiżew - in this particular circumstance, 99.9% of cases is acceptable. – jag Dec 13 '16 at 19:36
  • @juan - yes I will update the question – jag Dec 13 '16 at 19:36
  • See [`(?s)(?<=<!--\s(?:(?!<!--\s|\s-->).)*)STACK`](http://regexstorm.net/tester?p=%28%3f%3c%3d%3c!--%5cs%28%3f%3a%28%3f!%3c!--%5cs%7c%5cs--%3e%29.%29*%29STACK&i=%3c!--+test+test+STACK+test+test+STACK+--%3e%0d%0a%3casp%3aContent%3e%0d%0a++++%3cdiv%3esome+text+STACK+some+text+STACK123%3c%2fdiv%3e%0d%0a%3c%2fasp%3aContent%3e%0d%0a%3casp%3aContent%3e%0d%0a++++%3c!--+SOLR+%7b%22description%22%3a%22some+text%3cbr%2f%3e+%3cb%3ebold+stuff%3c%2fb%3e%22%2c+%22summary%22%3a%22some+text+%3cb%3eSTACK123%2f%3cb%3e+some+text+STACK%22%7d+SOLR+---%3e+%0d%0a%3c%2fasp%3aContent%3e&o=s). – Wiktor Stribiżew Dec 13 '16 at 19:45
  • @juan - I've added an example. – jag Dec 13 '16 at 19:48
  • Thank you @jag. This may seem dumb but... how about a simple string.Replace()? It looks to me like you are replacing all instances anyways... – JuanR Dec 13 '16 at 19:49
  • How about do not try to handle HTML with RegEx? – abatishchev Dec 13 '16 at 19:52
  • [`(?<=(?:<!--\s(?:(?!<!--\s|\s-->).)*|>[^<]*))STACK`](http://regexstorm.net/tester?p=%28%3f%3c%3d%28%3f%3a%3c!--%5cs%28%3f%3a%28%3f!%3c!--%5cs%7c%5cs--%3e%29.%29*%7c%3e%5b%5e%3c%5d*%29%29STACK&i=%3c!--+test+test+STACK+test+test+STACK+--%3e%0d%0a%3casp%3aContent%3e%0d%0a++++%3cdiv+text%3d%22STACK%22%3esome+text+STACK+some+text+STACK123%3c%2fdiv%3e%0d%0a%3c%2fasp%3aContent%3e%0d%0a%3casp%3aContent%3e%0d%0a++++%3c!--+SOLR+%7b+%3cb%3ebold+stuff%3c%2fb%3e%22%2c+%22summary%22%3a%22some+text+%3cb%3eSTACK123%2f%3cb%3e+some+text+STACK%22%7d+SOLR+---%3e+%0d%0a%3c%2fasp%3aContent%3e&o=s) – Wiktor Stribiżew Dec 13 '16 at 20:48
  • Sounds like a classic "two problems" instance. Even if you don't want to go as far as an HTML parser, you could just `.IndexOf()` for "<!--" and "-->", then replace all `STACK` in between (with a regex if you want), loop. This is likely to be faster as well. The problem with genius regexes, even when you get them to work, is that you won't be able to understand and modify them later, so this is only really worth it if a regex is your only option (that is, you simply can't write code, you must plug in a regex that will be used by other code somewhere). – Jeroen Mostert Dec 14 '16 at 09:06
  • Thanks for all responses - the regex @WiktorStribiżew posted was the response I was looking for (as it achieved what I asked for), but as it's not an answer as such I can't mark it as the answer. Wiktor - it says you've marked this as an exact duplicate? I'm not sure why (see the accepted answer on the duplicate you posted - it's... not exactly helpfully formatted) – jag Dec 14 '16 at 21:25
  • See the answer below. It is "correct". It means that is the correct way to handle HTML. If I post the answer, it is likely to get downvoted because of that very post I used to close the question. – Wiktor Stribiżew Dec 14 '16 at 21:29
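
The two-step idea suggested in the comments (isolate each comment, then replace every `STACK` inside it) could be sketched in C# roughly as follows. This is only a minimal sketch under stated assumptions, not the single-line regex asked for: the class name `CommentScopedReplace` and the helper `ReplaceInComments` are hypothetical, a regex is used to find each comment instead of `.IndexOf()`, and it assumes comments are not nested and that `-->` does not occur inside a comment body. Text inside ordinary elements is left alone here and would still need a second pass.

```csharp
using System;
using System.Text.RegularExpressions;

public static class CommentScopedReplace
{
    // Matches one HTML comment, lazily, so each comment is handled on its own.
    // Singleline lets '.' cross line breaks for multi-line comments.
    private static readonly Regex Comment =
        new Regex(@"<!--.*?-->", RegexOptions.Singleline);

    // Hypothetical helper: replaces every occurrence of `word` that lies
    // inside an HTML comment and leaves everything else untouched.
    public static string ReplaceInComments(string input, string word, string replacement)
    {
        // Escape the search term so it is treated as literal text.
        string escaped = Regex.Escape(word);

        // Outer pass finds each comment; inner pass replaces inside it.
        // The inner evaluator returns `replacement` literally, so '$' in it
        // is not interpreted as a substitution.
        return Comment.Replace(
            input,
            outer => Regex.Replace(outer.Value, escaped, inner => replacement));
    }

    public static void Main()
    {
        string input = "<div>STACK</div> <!-- test STACK test STACK123 -->";
        Console.WriteLine(ReplaceInComments(input, "STACK", "X"));
        // prints: <div>STACK</div> <!-- test X test X123 -->
    }
}
```

To honour the word boundary from the first question, the inner pattern could be changed to `@"\b" + escaped + @"\b"`, which would then skip `STACK123`.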
