
The input consists of many files with small differences, e.g. an h3 can be an h2, or there can be two
at the end, so I want to use an alternation such as (ver1|ver2|ver3) but replace only part of the match.

regex (which doesn't work)

import re

filedata = re.sub(r"""
  \uF0B7<br/>\n                          # U+F0B7 followed by <br/> marks a bullet point
  (?P<txt>.*?)                           # the item text
  (?P<end>(?:<br/>)\n|\n<h)              # the possible endings
""", r'\n<li>\g<txt></li>\g<end>', filedata, flags=re.S|re.VERBOSE)

input:

(...)
\uF0B7<br/>
Something1<br/>
\uF0B7<br/>
Something2<br/>
\uF0B7<br/>
Something3
<h3>Next Topic
(...)

Unfortunately, the non-capturing group (?:<br/>) doesn't help: <br/> still ends up in \g<end>.

result:

  (...)
  <li>Something1</li><br/>
  <li>Something2</li><br/>
  <li>Something3</li>
  <h3>Next Topic
  (...)

expected result:

  (...)
  <li>Something1</li>
  <li>Something2</li>
  <li>Something3</li>
  <h3>Next Topic
  (...)

(I know that <li> requires a <ul> or <ol> wrapper, but that is handled by another regex.)
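A minimal sketch of one way to get the expected result, assuming the sample input above is representative. The non-capturing group sits inside (?P<end>...), so its text is still part of that named group. Here the ending is instead split into a part that is consumed and not re-inserted (<br/>) and a part that is only checked with a lookahead (the newline, or the newline plus <h). The names ITEM, sample and result are made up for illustration.

```python
import re

# Sketch only: <br/> is consumed and dropped, while the newline and the
# following heading are checked with lookaheads and stay in the text.
ITEM = re.compile(r"""
    \uF0B7<br/>\n        # bullet marker line (U+F0B7, as in the question)
    (?P<txt>.*?)         # item text, matched lazily
    (?:
        <br/>(?=\n)      # ending 1: consume <br/> so it is dropped
      | (?=\n<h)         # ending 2: a heading follows, consume nothing
    )
""", flags=re.S | re.VERBOSE)

# Hypothetical sample mirroring the input shown in the question.
sample = (
    "(...)\n"
    "\uf0b7<br/>\nSomething1<br/>\n"
    "\uf0b7<br/>\nSomething2<br/>\n"
    "\uf0b7<br/>\nSomething3\n"
    "<h3>Next Topic\n"
    "(...)\n"
)

# Only the text group is re-inserted; no ending group is needed any more.
result = ITEM.sub(r"<li>\g<txt></li>", sample)
print(result)
```

The replacement no longer starts with \n, because the newline before each bullet is not part of the match and therefore stays in place. Other ending variants (for example h2 instead of h3) are already covered by the \n<h lookahead; anything else would need another alternative in the last group.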

Tomasz Brzezina
  • Wait a minute. I've been told that you [shouldn't use regex to parse html](http://stackoverflow.com/a/1732454/5827958). – zondo Mar 26 '16 at 00:23
  • I have to, because the source isn't HTML: I'm creating HTML from a formatted string,
    and what is in the source comes from a previous step.
    – Tomasz Brzezina Mar 26 '16 at 00:28
