0

I have an XML that I need to edit (1k+ lines) and the changes involve multiple lines in some cases. Then, my solution until this moment has been working the xml as a file opened as a string.

The line I'm looking to change is:

<p class=\"DailyReadingImage\" field=\"heading\"><img src=\"../Final/Images/$1.$2 Spanish.png\" style=\"real-height: 0.25in; real-width: 0.25in; width: 50%\" /></p>

To only be <p />.

According to regex101 my pattern for looking and replacing should be:

<p( class=\"[^\"]+\")??( style=\"[^\"]+\")??.+<img[^>]+><.p>

This is the current version of the function:

import re

def repLine(k, f, r): # Key, File (a string at this point), Replacement
    fi = re.sub(re.compile(k), r, fi)
    return fi

But when I try print(repLine(k, f, r)) it doesn't return the string with those changes.

Trying to print matches in order to investigate only returns [('',''), ('',''), ('','')] (since there's three of these lines I need to replace). I've tried importing re and regex as re with no positive results. Any suggestions?

NOTE: Regex were in PCRE2, I changed them to Python format.

  1. I've tried fi = fi.replace(k, r)
  2. I've also tried creating a bucket with re.findall() and loop through it to replace.
  3. Tried going line by line but some lines require multiline search ('\n').
PonchoIMa
  • 1
  • 1
  • 1
    Please [edit](https://stackoverflow.com/posts/75687932/edit) your question to include a [minimum reproducible example](https://stackoverflow.com/help/minimal-reproducible-example) with test data demonstrating your problem. – Woodford Mar 09 '23 at 17:20
  • 8
    That said, don't use regular expressions to parse HTML. Use an HTML parser. – chepner Mar 09 '23 at 17:22
  • 1
    What is it you are looking to do conceptually? can you provide examples of the text you start with and what you hope to have as an end result? – JonSG Mar 09 '23 at 17:24
  • Please clarify your specific problem or provide additional details to highlight exactly what you need. As it's currently written, it's hard to tell exactly what you're asking. – Community Mar 09 '23 at 18:34
  • 1
    Does this answer your question? [RegEx match open tags except XHTML self-contained tags](https://stackoverflow.com/questions/1732348/regex-match-open-tags-except-xhtml-self-contained-tags) – ti7 Mar 09 '23 at 19:29
  • To be completely honest, no. I get the idea that using a parser is better. But what happens when I want to replace elements and their content? – PonchoIMa Mar 09 '23 at 19:42
  • Hi. Your regex is actually this `]+><.p>` https://regex101.com/r/hAXjPj/1 If there are certain **p** attributes and certain **img** attributes that will qualify the span of ` to ` you should mention that so I can write you a regex that replaces the block with `

    `
    – sln Mar 09 '23 at 19:55

0 Answers0