-3

View the live demo on regex101

My regular expression pattern is:

[Nn]issan(?=[^<>]*<)(?!(?:(?!</?(?:a|span)[ >/])(?:.|\n))*</(?:a|span)>)

I want to stop it from matching the word nissan when it appears inside a URL, as shown in the screenshot.

I use re.sub(pattern, new_word, paragraph, flags=re.U|re.M) to replace the word Nissan with new_word.

Jaykumar Patel
  • 3
    [There was something about this...](http://stackoverflow.com/a/1732454/1763356) – Veedrac Sep 29 '14 at 06:37
  • Did you want to match `nissan` in this anchor tag ``? – Avinash Raj Sep 29 '14 at 06:40
  • No, I don't want to match the href URL. I want to match the word Nissan only when it is not between tags, as in **`<a>Nissan</a> or <span>Nissan</span>`**. – Jaykumar Patel Sep 29 '14 at 06:42
  • 2
    @Veedrac is correct. Even worse, you're trying to use regular expressions on HTML in Python, when Python has the absolutely amazing Beautiful Soup library. It would be trivial to make soup of this HTML, then remove all anchor and span tags, then return any matches in what remains. – Stack of Pancakes Sep 29 '14 at 06:46
  • 1
    Don't use regex to parse HTML. Don't. DON'T. (sigh) – Jim Garrison Sep 29 '14 at 07:12
  • But how do I solve my problem? If I don't use regex to parse HTML, what is the alternative? I have a paragraph. Before the paragraph is inserted into the database, words stored in another table (word, replace_word, url) are replaced in the paragraph text if found, and then it is stored. But when the paragraph is modified later, someone repeats the same process, so I need to ignore words that are already inside tags. – Jaykumar Patel Sep 29 '14 at 07:17
  • The other way is to use an XML or HTML parser to parse HTML. – Veedrac Sep 29 '14 at 08:42
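
The parser-based route suggested in the comments could look roughly like the sketch below. It uses the standard library's `html.parser` instead of Beautiful Soup, purely to stay dependency-free. The names `WordReplacer` and `replace_outside_links` and the replacement word "Datsun" are hypothetical, chosen for illustration. It assumes the only requirement is to leave text inside `<a>` and `<span>` elements untouched, and it does not try to preserve doctype or processing-instruction nodes.

```python
import re
from html.parser import HTMLParser


class WordReplacer(HTMLParser):
    """Rebuild the HTML, replacing a word only in text outside <a>/<span>."""

    SKIP_TAGS = {"a", "span"}

    def __init__(self, pattern, new_word):
        # convert_charrefs=False keeps entities such as &amp; as written.
        super().__init__(convert_charrefs=False)
        self.pattern = re.compile(pattern)
        self.new_word = new_word
        self.depth = 0   # number of currently open <a>/<span> elements
        self.out = []    # pieces of the rebuilt document

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP_TAGS:
            self.depth += 1
        # Re-emit the tag exactly as it appeared, attributes included.
        self.out.append(self.get_starttag_text())

    def handle_startendtag(self, tag, attrs):
        # Self-closing tags such as <br/> never contain text.
        self.out.append(self.get_starttag_text())

    def handle_endtag(self, tag):
        if tag in self.SKIP_TAGS and self.depth > 0:
            self.depth -= 1
        self.out.append("</%s>" % tag)

    def handle_data(self, data):
        # Only text nodes outside <a>/<span> are rewritten.
        if self.depth == 0:
            data = self.pattern.sub(self.new_word, data)
        self.out.append(data)

    def handle_entityref(self, name):
        self.out.append("&%s;" % name)

    def handle_charref(self, name):
        self.out.append("&#%s;" % name)

    def handle_comment(self, data):
        self.out.append("<!--%s-->" % data)


def replace_outside_links(html, pattern, new_word):
    parser = WordReplacer(pattern, new_word)
    parser.feed(html)
    parser.close()
    return "".join(parser.out)


html = ('<p>Nissan cars <a href="http://nissan.example/">Nissan</a> '
        'and <span>Nissan</span> nissan</p>')
print(replace_outside_links(html, r"[Nn]issan", "Datsun"))
# <p>Datsun cars <a href="http://nissan.example/">Nissan</a> and <span>Nissan</span> Datsun</p>
```

Because the parser hands attribute values and text nodes to separate callbacks, the word inside the href is never a candidate for replacement, and the depth counter also covers a `<span>` nested inside an `<a>`.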

2 Answers

1

You can try this pattern:

[Nn]issan(?=[^<>]*<)(?!(?:(?!</?(?:a|span)[ >/])(?:.|\n))*</(?:a|span)>)

It has one flaw that I'm aware of, which is that nested <a> or <span> tags can trip it up, causing it to match stuff like this:

<a>nissan<span></span></a>

See demo.
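
For a quick local check of both the intended behaviour and the nested-tag flaw, something like the following could be run. The sample strings and the replacement word "Datsun" are made up for illustration.

```python
import re

PATTERN = r"[Nn]issan(?=[^<>]*<)(?!(?:(?!</?(?:a|span)[ >/])(?:.|\n))*</(?:a|span)>)"

# One plain occurrence, one inside an href, one inside <a>, one plain again.
sample = '<p>Nissan <a href="nissan.html">Nissan</a> nissan</p>'
replaced = re.sub(PATTERN, "Datsun", sample)
print(replaced)
# <p>Datsun <a href="nissan.html">Nissan</a> Datsun</p>

# The nested case: the <span> right after the word hides the enclosing <a>,
# so the word is matched even though it sits inside a link.
nested = "<a>nissan<span></span></a>"
flawed_matches = re.findall(PATTERN, nested)
print(flawed_matches)
# ['nissan']
```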

Explanation:

[Nn]issan
(?= # make sure it's not inside an <a> or <span> tag, like <a href="nissan">
    # to do that, we'll assert that the next "<" occurs before ">".
    [^<>]*
    <
)
(?! # next, make sure it's not enclosed in an <a> or <span> tag like <a>nissan</a>
    # to do that, we'll match anything up to the next "a" or "span" tag, either opening or closing, and then assert the tag is opening.
    (?: # while...
        (?! #...there is no opening or closing "a" or "span" tag
            <
            /?
            (?:
                a|span
            )
            [ >/]
        )
        (?: # consume the next character.
            .|\n
        )
    )*
    # then assert the tag is not closing.
    </
    (?:
        a|span
    )
    >
)
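
The annotated breakdown above is close to what Python accepts under `re.VERBOSE`, so the pattern can be kept in commented form in code. The sketch below is one way to write it; the name `NISSAN_VERBOSE` and the sample strings are hypothetical. In verbose mode whitespace is ignored except inside a character class, which is why the space in `[ >/]` still counts.

```python
import re

ONE_LINE = r"[Nn]issan(?=[^<>]*<)(?!(?:(?!</?(?:a|span)[ >/])(?:.|\n))*</(?:a|span)>)"

NISSAN_VERBOSE = re.compile(r"""
    [Nn]issan
    (?= [^<>]* < )                      # next angle bracket is "<": not inside a tag
    (?!                                 # reject if, scanning forward...
        (?:
            (?! </? (?:a|span) [ >/] )  # ...while no a/span tag starts here
            (?: . | \n )                # ...consuming one character at a time
        )*
        </ (?:a|span) >                 # ...the first a/span tag found is a closing one
    )
""", re.VERBOSE)

samples = [
    '<p>Nissan <a href="nissan.html">Nissan</a> nissan</p>',
    "<span>Nissan</span> Nissan<br>",
    "<a>nissan<span></span></a>",
]
for text in samples:
    # Both spellings of the pattern should give the same result.
    print(NISSAN_VERBOSE.sub("X", text) == re.sub(ONE_LINE, "X", text))
```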
Aran-Fey
-1
Nissan(?!((?!<\/a>).)*<\/a>|((?!<\/span>).)*<\/span>)

Try this. See demo.

http://regex101.com/r/dN8sA5/2
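
To see how this pattern behaves outside the demo, it can be run over a small sample. The sample string and the replacement "X" are made up for illustration. Note that the lookahead scans forward to any later `</a>` or `</span>` on the same line, so a word that sits before a link, not inside it, is skipped as well.

```python
import re

PATTERN_2 = r"Nissan(?!((?!<\/a>).)*<\/a>|((?!<\/span>).)*<\/span>)"

# Three occurrences: before the link, inside the link, after the link.
sample = 'Nissan <a href="#">Nissan</a> Nissan'
result = re.sub(PATTERN_2, "X", sample)
print(result)
# Nissan <a href="#">Nissan</a> X
```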

vks