I'm trying to replace chunks of text in HTML files with 'xxx' using re.sub in Python 2.7. I can only get it to work on basic strings that have no spaces or newlines. The code below finds nothing to replace; it just prints the whole file unchanged. I've tried DOTALL and other things, but nothing works. I've used re.search successfully, but re.sub won't work.

CODE:

print re.sub(r'table\sstyle\=(.+)script', r'xxx', text, re.S)

TEXT BEING SEARCHED (text):

<table style="background-color: #ecddb0">
<tbody>
<TR>
<TD>
<style type="text/css">
body {
background-color: #ffffff;
margin: 0px;
padding: 0px 0 0 0px;
</style>
<script type="text/javascript
ThiefMaster
  • obligatory link: http://stackoverflow.com/questions/1732348/regex-match-open-tags-except-xhtml-self-contained-tags/1732454#1732454 - if you want to sanitize stuff that's the way to go. – ThiefMaster Nov 06 '12 at 07:18
  • What @ThiefMaster said! Also, `(.+?)` maybe. – Nadh Nov 06 '12 at 07:27

1 Answer

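The fourth positional argument of re.sub is count, not flags, so re.S (integer value 16) is treated as a replacement limit and DOTALL is never turned on. Pass the flag by keyword instead: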

re.sub(r'table\sstyle\=(.+)script', r'xxx', text, flags=re.S)
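
To see the difference, here is a small sketch that runs both calls side by side. The `text` value is a shortened stand-in for the HTML in the question (an assumption, not the full file), and the variable names are only illustrative. The last call also tries the non-greedy `(.+?)` suggested in the comments.

```python
import re

# Shortened stand-in for the HTML in the question (assumed sample).
text = '<table style="background-color: #ecddb0">\n<tbody>\n<script type="text/javascript'

pattern = r'table\sstyle\=(.+)script'

# Positional: re.S lands in the count slot, so DOTALL is off and "."
# cannot cross the newlines between "table" and "script". Nothing matches.
unchanged = re.sub(pattern, r'xxx', text, re.S)

# Keyword: DOTALL is on, so "." matches newlines. The greedy (.+) runs
# to the last "script", which is the tail of "javascript".
replaced = re.sub(pattern, r'xxx', text, flags=re.S)

# Non-greedy variant: stops at the first "script" instead.
lazy = re.sub(r'table\sstyle\=(.+?)script', r'xxx', text, flags=re.S)

print(unchanged == text)  # True
print(replaced)           # <xxx
print(lazy)               # <xxx type="text/javascript
```

Note that on recent Python 3 releases the positional form may also emit a DeprecationWarning, which is one more reason to pass flags by keyword.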
Janne Karila