Python re and regex not matching my patterns

Question

I have an XML that I need to edit (1k+ lines) and the changes involve multiple lines in some cases. Then, my solution until this moment has been working the xml as a file opened as a string.

The line I'm looking to change is:

<p class=\"DailyReadingImage\" field=\"heading\"><img src=\"../Final/Images/$1.$2 Spanish.png\" style=\"real-height: 0.25in; real-width: 0.25in; width: 50%\" /></p>

To only be <p />.

According to regex101 my pattern for looking and replacing should be:

<p( class=\"[^\"]+\")??( style=\"[^\"]+\")??.+<img[^>]+><.p>

This is the current version of the function:

import re

def repLine(k, f, r): # Key, File (a string at this point), Replacement
    fi = re.sub(re.compile(k), r, fi)
    return fi

But when I try print(repLine(k, f, r)) it doesn't return the string with those changes.

Trying to print matches in order to investigate only returns [('',''), ('',''), ('','')] (since there's three of these lines I need to replace). I've tried importing re and regex as re with no positive results. Any suggestions?

NOTE: Regex were in PCRE2, I changed them to Python format.

I've tried fi = fi.replace(k, r)
I've also tried creating a bucket with re.findall() and loop through it to replace.
Tried going line by line but some lines require multiline search ('\n').

Please [edit](https://stackoverflow.com/posts/75687932/edit) your question to include a [minimum reproducible example](https://stackoverflow.com/help/minimal-reproducible-example) with test data demonstrating your problem. — Woodford, Mar 09 '23 at 17:20
That said, don't use regular expressions to parse HTML. Use an HTML parser. — chepner, Mar 09 '23 at 17:22
What is it you are looking to do conceptually? can you provide examples of the text you start with and what you hope to have as an end result? — JonSG, Mar 09 '23 at 17:24
Please clarify your specific problem or provide additional details to highlight exactly what you need. As it's currently written, it's hard to tell exactly what you're asking. — Community, Mar 09 '23 at 18:34
Does this answer your question? [RegEx match open tags except XHTML self-contained tags](https://stackoverflow.com/questions/1732348/regex-match-open-tags-except-xhtml-self-contained-tags) — ti7, Mar 09 '23 at 19:29
To be completely honest, no. I get the idea that using a parser is better. But what happens when I want to replace elements and their content? — PonchoIMa, Mar 09 '23 at 19:42
Hi. Your regex is actually this `]+><.p>` https://regex101.com/r/hAXjPj/1 If there are certain **p** attributes and certain **img** attributes that will qualify the span of ` to ` you should mention that so I can write you a regex that replaces the block with `
` — sln, Mar 09 '23 at 19:55

Python re and regex not matching my patterns

0 Answers0