I want to replace every <span ...> tag (including <span id="... and <span class="...) in an HTML document with <span>, except when the tag starts with <span id="textmarker (for example, I don't want to keep this span: <span attr="blah" id="textmarker">).

I've tried the regexes proposed here and here, and finally came up with the regex below. It never returns a <span id="textmarker, but it sometimes misses the other spans:

<span(?!.*? id="textmarker).*?">

You can see my (simplified) html here : https://regex101.com/r/yT9jG2/2

Strangely, if I run the regex in Notepad++ it returns 3 matches (the three spans in the second paragraph), but regex101 only returns 1 match. Notepad++ and regex101 both miss the span in the first paragraph.

This regex also doesn't return every span it should (cf. the spans highlighted in gray here):
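
One way such a miss can be reproduced is sketched below in Python. The two sample strings are made up for illustration and are not the HTML from the regex101 link. The sketch assumes the default behaviour where . does not match a newline: the .*? inside the lookahead is then free to run past the end of the current tag and find id="textmarker in a later tag on the same line.

```python
import re

# First regex from the question, unchanged.
pattern = re.compile(r'<span(?!.*? id="textmarker).*?">')

# Hypothetical samples: the same two spans on one line, then on two lines.
same_line = '<p><span class="a">x</span> <span id="textmarker1">y</span></p>'
separate_lines = '<p><span class="a">x</span>\n<span id="textmarker1">y</span></p>'

# On one line the lookahead reaches the later textmarker span,
# so the first span is rejected as well.
print(pattern.findall(same_line))       # []

# With a newline in between, '.' stops at the line end and the first span matches.
print(pattern.findall(separate_lines))  # ['<span class="a">']
```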

<span(?![^>]*? id="textmarker)[^>]*?>
MagTun

1 Answer

Updated: To exclude id="textmarker while including id="anythingelse and all other spans:

(<span(?! *id="textmarker)[^>]*>)

On your posted example at https://regex101.com/r/yT9jG2/2, choose version 2 at the top and set the fields as follows:

  • field 1: (<span(?! *id="textmarker)[^>]*>)
  • field 2 (the smaller field that lets you set modifiers): g

With your example and version 2 selected, this matches 9 spans and lists them on the right, including empty spans as well as spans whose id is not textmarker, such as <span id="YellowType">
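
The same replacement can be sketched outside regex101 or Notepad++. This is a minimal Python illustration, not part of the original setup: the sample HTML and the helper name strip_span_attributes are made up for the example, and the outer parentheses are dropped because no capture group is needed here.

```python
import re

# Pattern from the answer, without the optional outer capture group.
SPAN_OPEN = re.compile(r'<span(?! *id="textmarker)[^>]*>')


def strip_span_attributes(html: str) -> str:
    """Replace every matched opening span tag with a bare <span>."""
    return SPAN_OPEN.sub("<span>", html)


# Hypothetical sample, not the HTML from the regex101 link.
sample = (
    '<p><span class="hl">a</span> '
    '<span id="YellowType">b</span> '
    '<span id="textmarker7">c</span> '
    '<span>d</span></p>'
)

print(strip_span_attributes(sample))
# <p><span>a</span> <span>b</span> <span id="textmarker7">c</span> <span>d</span></p>
```

Because the lookahead only inspects what directly follows <span, a tag such as <span attr="blah" id="textmarker"> is still matched and rewritten by this pattern.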

Explanation

Field 1:

  • optional: ( and ). The extra outer parentheses were added for educational purposes only: they make regex101 list each match as a group in the right pane, in addition to the default inline highlighting of matches. When using Notepad++ you can of course omit these outer ( ) parentheses.
  • <span: matches <span
  • (?! starts a negative lookahead assertion for the following,
  •  * (a space followed by *) meaning a space zero or more times, in case you have extra spaces
  • followed by id="textmarker
  • ) to end the negative lookahead assertion
  • so if the text right after <span matches what is inside the lookahead, the match attempt at that position fails and that span is skipped
  • [^ starts an exclusion set, meaning "none of the following"; here the following is just >
  • ] to stop defining the exclusion
  • * to match the preceding 0 or more times. The preceding being [^>]
  • > to match the end of the opening span tag
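
The breakdown above can also be written as a commented pattern. The sketch below uses Python's re.VERBOSE mode, which is an assumption of this example (the answer itself targets regex101 and Notepad++). In verbose mode a literal space must be written as [ ], and the list of test tags is made up.

```python
import re

SPAN_OPEN = re.compile(
    r'''
    <span                 # literal start of an opening span tag
    (?!                   # begin negative lookahead
        [ ]*              # zero or more spaces
        id="textmarker    # the id prefix that must not follow
    )                     # end negative lookahead
    [^>]*                 # any run of characters other than the closing bracket
    >                     # closing bracket of the opening tag
    ''',
    re.VERBOSE,
)

# Hypothetical tags to try the pattern on.
tags = [
    '<span>',
    '<span class="x">',
    '<span id="other">',
    '<span id="textmarker1">',
    '<span  id="textmarker">',
]

matched = [tag for tag in tags if SPAN_OPEN.fullmatch(tag)]
print(matched)  # ['<span>', '<span class="x">', '<span id="other">']
```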

Field 2

  • g is the global modifier (it does not mean "greedy"): it tells regex101 to keep searching after the first match
  • so the result does not stop at the first match, but will have all matches
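
The difference between stopping at the first match and collecting all matches can be sketched in Python, where re.search plays the role of a run without the g modifier and re.findall the role of a run with it. This mapping and the sample string are assumptions made for illustration only.

```python
import re

pattern = re.compile(r'<span(?! *id="textmarker)[^>]*>')

# Hypothetical sample with three opening span tags.
html = (
    '<span class="a">x</span>'
    '<span id="textmarker1">y</span>'
    '<span id="b">z</span>'
)

# Without "global": only the first match is reported.
first = pattern.search(html)
print(first.group(0))         # <span class="a">

# With "global": every match in the input is reported.
all_matches = pattern.findall(html)
print(all_matches)            # ['<span class="a">', '<span id="b">']
```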
clarity123
  • Good try but unfortunately the regex should also match all the other `span id` than ` – MagTun Jan 15 '16 at 18:52
  • This detects `i` so it should work to exclude any span id, yes please update the question with more info if not working – clarity123 Jan 15 '16 at 19:05
  • I'd rather use `\s` instead of space and the "educational purposes" confused before reading your explanation :) extensive answer. – bobble bubble Jan 15 '16 at 20:24