Remove string in between two other strings

Question

I have a string from which I need to remove all the characters between two other strings.

At the moment I have the following code, but I'm not sure why it doesn't work:

import re

def removeYoutube(itemDescription):
    itemDescription = re.sub('<iframe>.*</iframe>', '', itemDescription, flags=re.DOTALL)
    return itemDescription

It doesn't remove the string between <iframe> and </iframe>, including the tags themselves.

Example Input (String):

"<div style="text-align: center;"><iframe allowfullscreen="frameborder=0" height="350" src="https://www.youtube.com/embed/EKaUJExxmEA" width="650"></iframe></div>"

Expected Output: <div style="text-align: center;"></div>

As the expected output shows, everything from the opening <iframe ...> tag through the closing </iframe> should be removed.
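The pattern in the question can't match the sample input: the literal <iframe> only matches an opening tag with no attributes, while the real tag is <iframe allowfullscreen=... width="650">. Below is a minimal regex sketch, assuming the markup is as simple as the sample (no ">" inside attribute values, no nested or commented-out iframes). It allows attributes with [^>]* and uses a non-greedy .*? so each match stops at the first closing tag. The function name remove_iframes_regex is hypothetical.

```python
import re

# Opening <iframe ...> tag with any attributes, then the shortest run of
# characters (newlines included, via DOTALL) up to the first </iframe>.
IFRAME_RE = re.compile(r"<iframe\b[^>]*>.*?</iframe>", flags=re.DOTALL | re.IGNORECASE)


def remove_iframes_regex(html):
    """Remove every <iframe ...>...</iframe> span from html (regex approach)."""
    return IFRAME_RE.sub("", html)


sample = ('<div style="text-align: center;"><iframe allowfullscreen="frameborder=0" '
          'height="350" src="https://www.youtube.com/embed/EKaUJExxmEA" width="650">'
          '</iframe></div>')

# The original pattern finds no literal "<iframe>" here, so nothing is removed.
print(re.sub("<iframe>.*</iframe>", "", sample, flags=re.DOTALL) == sample)  # True

print(remove_iframes_regex(sample))  # <div style="text-align: center;"></div>
```

This only papers over the simple case; the parser-based answer below is more robust.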

In general you get better answers if you provide sample input and the expected output, as it reduces ambiguity. — ScootCork, Feb 14 '21 at 14:10

score 1 · Accepted Answer · answered Feb 14 '21 at 14:52

Use BeautifulSoup, not regex; regex is a poor choice for parsing HTML. Here's why.
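As a small illustration of that fragility (the input here is made up for the example): a regex sees characters, not elements, so a detail like greediness changes what gets deleted. With two iframes on the page, the greedy .* from the question would run to the last </iframe> and take the content between them with it.

```python
import re

# Hypothetical input: two iframes with ordinary content between them.
html = "<iframe></iframe><p>keep me</p><iframe></iframe>"

# Greedy .* runs to the LAST </iframe>, deleting the <p> in between.
greedy = re.sub(r"<iframe>.*</iframe>", "", html, flags=re.DOTALL)

# Non-greedy .*? stops at the FIRST </iframe> after each opening tag.
lazy = re.sub(r"<iframe>.*?</iframe>", "", html, flags=re.DOTALL)

print(repr(greedy))  # ''
print(repr(lazy))    # '<p>keep me</p>'
```

An HTML parser avoids this class of mistake because it removes whole elements from a tree.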

Here's how:

from bs4 import BeautifulSoup

sample = """
<div style="text-align: center;"><iframe allowfullscreen="frameborder=0" height="350" src="https://www.youtube.com/embed/EKaUJExxmEA" width="650"></iframe></div>
"""

s = BeautifulSoup(sample, "html.parser")

# Detach every <iframe> element (and everything inside it) from the tree
for tag in s.find_all("iframe"):
    tag.extract()
print(s)

Output:

<div style="text-align: center;"></div>
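The same idea can be wrapped in a drop-in replacement for the question's removeYoutube. This is a sketch assuming bs4 is installed; remove_youtube is a hypothetical name, and treating an iframe as a YouTube embed when its src contains "youtube.com/embed" is an assumption made here, so other iframes are left alone. decompose() removes the tag like extract() but also destroys it, which is fine when you don't need the removed tag afterwards.

```python
from bs4 import BeautifulSoup


def remove_youtube(item_description):
    """Return item_description with YouTube <iframe> embeds removed.

    Assumption: an iframe counts as a YouTube embed when its src
    contains "youtube.com/embed"; other iframes are kept.
    """
    soup = BeautifulSoup(item_description, "html.parser")
    # src=True restricts the search to <iframe> tags that have a src attribute
    for iframe in soup.find_all("iframe", src=True):
        if "youtube.com/embed" in iframe["src"]:
            iframe.decompose()  # remove the tag and everything inside it
    return str(soup)


sample = ('<div style="text-align: center;"><iframe allowfullscreen="frameborder=0" '
          'height="350" src="https://www.youtube.com/embed/EKaUJExxmEA" width="650">'
          '</iframe></div>')
print(remove_youtube(sample))  # <div style="text-align: center;"></div>
```

Drop the src check if every iframe should go, as in the answer above.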

baduker

Thanks for the answer. I don't know why I didn't think of that, and thanks for linking the page explaining why. I'll be using this more than regex in the future. Much appreciated :) – Morgan Feb 14 '21 at 15:01