Regex to remove everything from div

Question

html = 
<div>
<p style="color: #555555; margin-top:32px;">
    Sent
   <span>
    by
    <a style="text-decoration:none; color: #875A7B;" href="http://www.example.com">
    <span>YourCompany</span>
   </a>

</span>
    using
  <a target="_blank" href="https://www.odoo.com?utm_source=db&amp;utm_medium=email" 
        style="text-decoration:none; color: #875A7B;">Odoo</a>.
      </p>
</div>

I have this regular expression:

html = re.sub(
    'using' + "(.*)[\r\n]*(.*)>" + 'Odoo' + r"</a>", "", html,
)

and I get this result:

html =
<div> 
<p style="color: #555555; margin-top:32px;">
Sent
<span>
by
<a style="text-decoration:none; color: #875A7B;" href="http://www.example.com">
    <span>YourCompany</span>
</a>

</span>
.
</p>
  </div>

How can I update my regex to remove the whole <p> tag and everything inside it? Basically, I need an empty <div>, but only if the <p> tag contains the words "Sent" and "by".

Consider using a parser (e.g. `BeautifulSoup`) and `xpath` queries instead. — Jan, May 06 '21 at 12:44
[One doesn't simply handle HTML with regex](https://stackoverflow.com/a/1732454/770830). — bereal, May 06 '21 at 12:57

Hajny · Answer 1 · 2021-05-06T12:55:30.253

1

UPDATE
With the same pattern, you could first check if <p> contains "Sent" and "by".

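import re

pattern = re.compile(r"<p(.*\n)*.*</p>")
match = pattern.search(html)  # None when there is no <p>...</p> block

# Strip the paragraph only if it exists and mentions both words
if match and "Sent" in match.group(0) and "by" in match.group(0):
    html = pattern.sub("", html)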
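If the HTML may contain more than one <p> block, the same check can be applied per paragraph. The following is only a sketch under a few assumptions: the helper name `drop_sent_by_paragraphs` and the sample string are made up for illustration, the pattern is a non-greedy variant rather than the one above, and the substring test would also match "Sent" or "by" inside attribute values.

```python
import re

# Non-greedy match of one <p ...>...</p> block; DOTALL lets "." cross newlines.
P_BLOCK = re.compile(r"<p\b[^>]*>.*?</p>", re.DOTALL)


def drop_sent_by_paragraphs(html):
    """Remove each <p> block whose text contains both "Sent" and "by"."""

    def replace(match):
        block = match.group(0)
        if "Sent" in block and "by" in block:
            return ""
        return block  # leave every other paragraph untouched

    return P_BLOCK.sub(replace, html)


# Hypothetical sample, shortened from the question's HTML
sample = """<div>
<p style="color: #555555; margin-top:32px;">
    Sent
    <span>by</span>
    using <a href="https://www.odoo.com">Odoo</a>.
</p>
<p>Keep me</p>
</div>"""

print(drop_sent_by_paragraphs(sample))
```

Passing a function to `sub` lets each paragraph be judged on its own, so an unrelated <p> later in the document is left alone. For anything beyond simple, predictable markup, a real HTML parser is still the safer choice.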

Old answer

This should work: html = re.sub(r"<p(.*\n)*.*</p>", "", html)

edited May 06 '21 at 12:55

answered May 06 '21 at 12:24


I updated my question, there is one more clause – Chaban33 May 06 '21 at 12:32
Actually, your code gives this error: TypeError: argument of type 're.Match' is not iterable – Chaban33 May 07 '21 at 06:38
@Chaban33 I could not reproduce this error. Also, I am not iterating over any `Match` object, so I don't think the error comes from my code. You can update your question with some more information about the error and where it occurred, so that I can look over it. – Hajny May 07 '21 at 09:08
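One plausible cause of that TypeError, offered as an assumption since the commenter's actual code is not shown: the `in` test was applied to the object returned by `re.search` rather than to the string from `.group(0)`. A small sketch with a made-up input string shows the difference:

```python
import re

html = "<p>\nSent by\n</p>"  # hypothetical minimal input
pattern = re.compile(r"<p(.*\n)*.*</p>")

match = re.search(pattern, html)  # an re.Match object (or None), not a string

try:
    "Sent" in match  # membership test on the Match object itself
except TypeError as exc:
    error_message = str(exc)
    print(error_message)

text = match.group(0)  # the matched substring
found = "Sent" in text and "by" in text
print(found)
```

Calling `.group(0)` before the membership test, as in the answer, avoids the error.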
