I have a document like this:
TEXT
TEXT
<ul>
<li>1</li>
<ul>
<li>2</li>
<li>3</li>
</ul>
<li>4</li>
</ul>
ANOTHER TEXT
What can I use to transform it into:
TEXT
TEXT
* 1
** 2
** 3
* 4
ANOTHER TEXT
I only need to convert the ul/li parts; TEXT (which contains no ul/li) should be left unchanged.
I wrote a parser:
import re

def uls(html):
    # Wrap each <li> body in a pseudo-tag <!!...> so the tag scan below also sees the item text.
    html = re.sub(r'<li>(.*?)</li>', r'<li><!!\1></li>', html, flags=re.DOTALL)
    ret_text = []
    ul_level = 0
    text = ''
    pattern = re.compile(r'(<.*?>)')
    for tag in pattern.findall(html):
        if tag == '<ul>':
            ul_level += 1
        elif tag == '</ul>':
            ul_level -= 1
            if ul_level == 0:
                # A top-level list just closed: store its converted text.
                ret_text.append(text)
                text = ''
        elif re.search(r'<!!(.*?)>', tag, re.DOTALL):
            # flags must be passed by keyword: the 4th positional argument of re.sub is count.
            text += '*' * ul_level + re.sub(r'<!!(.*?)>', r' \1\n', tag, flags=re.DOTALL)
    return ret_text
It produces the correct array, but how can I replace
- ...
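
A minimal sketch of one way the conversion could be done in place, not a definitive implementation: split the document on the list tags with re.split and rebuild it, converting only what sits inside a <ul>. The name uls_inplace is hypothetical, and the sketch assumes well-formed lists, lowercase tags without attributes, and <li> items with no nested markup.

```python
import re

# Hypothetical helper; assumes well-formed <ul>/<li> markup without attributes.
# The capturing group makes re.split keep the tags in its result; the optional
# newline after each tag is consumed so the list lines do not leave blank lines.
TOKEN = re.compile(r'(</?ul>|<li>.*?</li>)\n?', re.DOTALL)

def uls_inplace(text):
    out = []
    level = 0  # current <ul> nesting depth
    for part in TOKEN.split(text):
        if part == '<ul>':
            level += 1
        elif part == '</ul>':
            level -= 1
        elif part.startswith('<li>') and level > 0:
            # Strip the '<li>' / '</li>' wrappers and prefix one '*' per level.
            out.append('*' * level + ' ' + part[4:-5].strip() + '\n')
        elif level == 0:
            # Anything outside a list is copied through unchanged.
            out.append(part)
    return ''.join(out)

doc = (
    "TEXT\nTEXT\n<ul>\n<li>1</li>\n<ul>\n<li>2</li>\n<li>3</li>\n</ul>\n"
    "<li>4</li>\n</ul>\nANOTHER TEXT\n"
)
print(uls_inplace(doc))
```

Because the text outside the lists is never matched by the pattern, it passes through untouched, which avoids having to splice the converted lists back into the original string afterwards. For real-world HTML (attributes, uppercase tags, markup inside items), a proper parser such as the standard library's html.parser would likely be more robust than a regex.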