Hi, I am trying to write code that removes the angle brackets from any tag that has the word 'div' between them. Here is a sample of the input:
<div class="ipc-page-content-container ipc-page-content-container--center" role="presentation"><a class="ipc-button ipc-button--double-padding ipc-button--default-height ipc-button--core-baseAlt ipc-button--theme-baseAlt ipc-button imdb-footer__open-in-app-button" href="/whitelist-offsite?url=https%3A%2F%2Ftqp-4.tlnk.io%2Fserve%3Faction%3Dclick%26campaign_id_android%3D427112%26campaign_id_ios%3D427111%26destination_id_android%3D464200%26destination_id_ios%3D464199%26my_campaign%3Dmdot%2520sitewide%2520footer%2520%26my_site%3Dm.imdb.com%26publisher_id%3D350552%26site_id_android%3D133429%26site_id_ios%3D133428&page-action=ft-gettheapp&ref=ft_apps" tabindex="0"><div class="ipc-button__text">Get the IMDb App</div></a></div></div><div class="ipc-page-content-container ipc-page-content-container--center _2AR8CsLqQAMCT1_Q7eidSY" role="presentation">
For example, the tag <div class="ipc-page-content-container ipc-page-content-container--center" role="presentation">
should become div class="ipc-page-content-container ipc-page-content-container--center" role="presentation"
after running the code.
I tried to use a regular expression to find div in the text, but I can't find a way to delete only the angle brackets.
import re
with open("movie.text.txt", 'rt', encoding='UTF8') as myfile:
    text = myfile.read()
regex = "<div .+>"
text = re.sub(regex, "div .+", text)
This code seems to wipe out every line of the text and replace it with the literal string div .+ instead of keeping what was inside the tag.
Does anyone know how to make this code work properly?
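
One possible direction, offered as a minimal sketch rather than a definitive fix: the replacement string in re.sub is inserted literally, so "div .+" is written out as-is, and the greedy .+ runs to the last > on the line. A capture group with a backreference (\1) keeps the matched tag contents, and [^>]* stops at the first closing bracket. The function name strip_div_brackets and the constant DIV_TAG are hypothetical names for illustration. The sketch assumes only opening <div ...> tags should lose their brackets and that no attribute value contains a > character.

```python
import re

# Capture everything between the brackets of an opening <div ...> tag.
# \b keeps tags such as <divider> from matching; [^>]* stops at the first '>'.
DIV_TAG = re.compile(r"<(div\b[^>]*)>")


def strip_div_brackets(text: str) -> str:
    """Remove the angle brackets around opening div tags, keeping their contents."""
    # \1 refers back to the captured text between the brackets.
    return DIV_TAG.sub(r"\1", text)


sample = '<div class="ipc-button__text">Get the IMDb App</div>'
print(strip_div_brackets(sample))
```

To apply this to the file, the text read from movie.text.txt can be passed to strip_div_brackets. Closing </div> tags are left untouched here; if they should also lose their brackets, the pattern could start with <(/?div\b instead. For anything beyond simple cases, an HTML parser is usually more robust than a regular expression.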