
I am having trouble writing a Python regex that does not match a certain pattern when it occurs inside href tags.

My aim is to replace all occurrences of DSS[a-z]{2}[0-9]{2} with an href link as shown below, but without replacing the same pattern where it occurs inside href tags.

Present Regex:

replaced = re.sub("[^http://*/s](DSS[a-z]{2}[0-9]{2})", "<a href=\"http://test.com=\\1\">\\1</a>", input)

I need to add this new regex to the existing one I have, using an OR operator.

EDIT:

I am trying to use a regex just for a simple operation. I want to replace occurrences of the pattern anywhere in the HTML, except where they occur within <a></a>.

c_prog_90
  • possible duplicate of [Python Find & Replace Beautiful Soup](http://stackoverflow.com/questions/6674310/python-find-replace-beautiful-soup) –  Jul 13 '11 at 15:37
  • What exactly are you trying to accomplish with `[^http://*/s]`? This isn't making any sense. – Tim Pietzcker Jul 13 '11 at 15:41
  • I am trying not to match the pattern when it is inside a http:// link – c_prog_90 Jul 13 '11 at 15:54
  • @thinkcool: Regexes cannot reliably do this, even if you think it's a simple operation. People won't tell you how to do it with regex, because regex is not the right tool for the job. It gets asked again and again, which is why e-satis linked a standard answer. If you're handling HTML, use an HTML parser. – Thomas K Jul 13 '11 at 16:53

1 Answer


The answer to any question having regexp and HTML in the same sentence is here.

In Python, the best HTML parser is indeed Beautiful Soup.
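
To make the parser route concrete, here is a minimal sketch of the idea. It deliberately swaps in the standard library's `html.parser` instead of Beautiful Soup so that it runs without extra dependencies; the approach with Beautiful Soup would be similar in spirit. The class name `AnchorAwareLinker` and the function `link_codes_outside_anchors` are made up for illustration, and the URL format is copied from the question.

```python
import re
from html.parser import HTMLParser

CODE = re.compile(r"DSS[a-z]{2}[0-9]{2}")
# URL format taken from the question; \g<0> is the whole match.
LINK = r'<a href="http://test.com=\g<0>">\g<0></a>'


class AnchorAwareLinker(HTMLParser):
    """Re-emits the document, linking codes only in text outside <a> elements."""

    def __init__(self):
        # convert_charrefs=False keeps entities such as &amp; as written.
        super().__init__(convert_charrefs=False)
        self.out = []
        self.anchor_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self.anchor_depth += 1
        # Emit the start tag exactly as it appeared, attributes untouched.
        self.out.append(self.get_starttag_text())

    def handle_startendtag(self, tag, attrs):
        self.out.append(self.get_starttag_text())

    def handle_endtag(self, tag):
        if tag == "a" and self.anchor_depth:
            self.anchor_depth -= 1
        self.out.append(f"</{tag}>")

    def handle_data(self, data):
        # Only text nodes outside any <a> element are rewritten.
        if self.anchor_depth == 0:
            data = CODE.sub(LINK, data)
        self.out.append(data)

    def handle_entityref(self, name):
        self.out.append(f"&{name};")

    def handle_charref(self, name):
        self.out.append(f"&#{name};")

    def handle_comment(self, data):
        self.out.append(f"<!--{data}-->")

    def handle_decl(self, decl):
        self.out.append(f"<!{decl}>")

    def handle_pi(self, data):
        self.out.append(f"<?{data}>")


def link_codes_outside_anchors(html):
    parser = AnchorAwareLinker()
    parser.feed(html)
    parser.close()
    return "".join(parser.out)
```

Because attribute values never reach `handle_data`, the pattern inside an existing `href` is left alone, and the depth counter protects the link text. Text inside `<script>` or `<style>` is still treated as data in this sketch, and CDATA sections are not re-emitted, so it would need more care on real pages.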

If you want to persist with regexp, you can try a negative lookbehind to avoid anything preceded by a ". At your own risk.
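
For illustration, the lookbehind idea might look roughly like this. It is a sketch, not a robust solution: the function name `link_unquoted_codes` is hypothetical, and the replacement string is the one from the question.

```python
import re

# (?<!") is a negative lookbehind: the match is rejected when the
# character immediately before the code is a double quote.
PATTERN = re.compile(r'(?<!")(DSS[a-z]{2}[0-9]{2})')
REPLACEMENT = r'<a href="http://test.com=\1">\1</a>'


def link_unquoted_codes(text):
    return PATTERN.sub(REPLACEMENT, text)
```

This leaves `href="DSSab12"` alone, but it only inspects a single preceding character. A code that sits deeper inside a URL, as in `href="http://test.com=DSSab12"`, or inside the text of an existing link is preceded by `=` or `>` and would still be rewritten, which is exactly the risk mentioned above.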

Bite code