
I have a simple regex that does not do what I want. I tested it at https://regex101.com and it works there, but it does not work in Python. Why? Thanks in advance.

string =

covid sucks and I want to go outside <!--/* Font Definitions */@font-face{font-family:Wingdings;panose-1:5 0 0 0 0 0 0 0 0 0;}@font- 
face{font-family:""Cambria Math"";panose-1:2 4 5 3 5 4 6 3 2 4;}@font-face{font-family:Calibri;panose- 
1:2 15 5 2 2 2 4 3 2 4;}@font-face{font-family:""Bradley Hand ITC"";panose-1:3 7 4 2 5 3 2 3 2 3;}/* 
Style Definitions */p.MsoNormal, li.MsoNormal, div.MsoNormal{margin:0in;margin-bottom:.0001pt;font- 
size:11.0pt;font-family:""Calibri"",sans-serif;}p.MsoListParagraph, li.MsoListParagraph, 
div.MsoListParagraph{m{margin-bottom:0in;}--> pop goes the peanut.

desired output = 'covid sucks and I want to go outside pop goes the peanut.'

I want everything between the < > to go away including the < >. Also, string is part of a much larger string. Sometimes the <...> is buried in the middle of a larger string. I need to be able to find it wherever it may be in the larger string and delete it.

My attempts:

string.replace("<.*(?=>)", " ")

and

string.replace("<.*>", " ")
Barmar
wolf7687
  • You must state the *rule* for determining a match. Is it, for example, the string that follows `"--> "` and precedes a `"."` at the end of the string? If so, try the regular expression `(?<=--> )[a-z ]+(?=\.$)` (with the multiline flag not set). [Demo](https://regex101.com/r/K5XaAB/1/). – Cary Swoveland Apr 15 '20 at 18:35
  • Thanks Cary. I updated my post. – wolf7687 Apr 15 '20 at 18:39
  • In that case, `<.*> *` with the *single-line* flag set, causing `.` to match newlines as well as any other character. [Demo](https://regex101.com/r/K5XaAB/2/). ` *` at the end causes any trailing spaces to be consumed as well, so you are not left with `" pop goes the peanut."` – Cary Swoveland Apr 15 '20 at 18:44
  • Maybe this helps as another example: https://regex101.com/r/hvFgib/1 – MDR Apr 15 '20 at 18:53
  • You have to use `re.sub()` to replace a regexp. `string.replace()` just replaces fixed strings. – Barmar Apr 15 '20 at 18:54
  • There's no point in providing a general link to `regex101.com`. If you want to give a link to your regex and test string at regex101.com you need to click the three horizontal bars to the left of "regular expressions 101", select the regex engine (Python) and then save to a custom URL which is your link. – Cary Swoveland Apr 15 '20 at 18:55
  • `re.sub("<.*>", " ", string)` – Barmar Apr 15 '20 at 18:56
  • Thanks Barmar. That worked. – wolf7687 Apr 15 '20 at 19:01
  • "I want everything between the < > to go away" is more accurately expressed as "I wish to convert everything between < and > to an empty string". You say you want to be left with `"...to go outside pop goes the..."`, but deleting everything between < and > leaves two spaces between "outside" and "pop". What Barmar suggests leaves you with three spaces between "outside" and "pop". Is that what you want? That may seem like a detail, but you need to be precise about your objective. – Cary Swoveland Apr 15 '20 at 19:06
  • @CarySwoveland is right: it's important to be precise and specific. Also, the `string` variable you provided is not valid Python. Can you share a correct code snippet that defines the test string? – AMC Apr 15 '20 at 19:19
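
To pull the comments together, here is a minimal sketch of the `re.sub()` approach. The `text` value is a shortened stand-in for the string in the question (the CSS is abbreviated and the variable name is my own), and the non-greedy `.*?` plus the `re.DOTALL` flag are assumptions about what is wanted: remove each `<...>` block separately, even when it spans several lines.

```python
import re

# Shortened stand-in for the test string from the question (CSS abbreviated).
text = (
    "covid sucks and I want to go outside "
    "<!--/* Font Definitions */@font-face{font-family:Wingdings;}\n"
    "/* Style Definitions */p.MsoNormal{margin:0in;}--> "
    "pop goes the peanut."
)

# str.replace() searches for the literal characters "<.*>", which never
# occur in the text, so the string comes back unchanged.
unchanged = text.replace("<.*>", " ")

# re.sub() treats its first argument as a pattern.
# re.DOTALL lets "." match newlines; "*?" is non-greedy, so each <...> block
# is removed on its own; the trailing " *" consumes spaces after ">".
cleaned = re.sub(r"<.*?> *", "", text, flags=re.DOTALL)

print(cleaned)
```

The non-greedy form stops at the first `>` it meets, so it would cut a block short if the markup itself contained a `>` (for example a CSS child selector). If there is only ever one block per string, the greedy `<.*>` from the comments avoids that, at the cost of swallowing everything between the first `<` and the last `>`.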

0 Answers