How can I parse [u][i][b]sometext[/i][/u] into <u><i>[b]sometext</i></u>?

I have some sort of markup that I need to convert into HTML tags. Regex works good until tags can be nested. Is there a library for this in Python/Django?

Paul R
  • So I guess your markup is not a standard one (like Markdown)? Do you read the markup from a file, or do you get it from a POST request? If it is a string, both regex and built-in string operations are viable. – Maarten Jun 15 '17 at 20:47
  • It is a string. – Paul R Jun 15 '17 at 20:49
  • In that case you will be looking at the built-in functionality. If that does not satisfy your needs, maybe you could post some more code (include the Python version you are using as well). For reference on string operations, you can consult the Python documentation. – Maarten Jun 15 '17 at 20:56
  • Nope. `re`'s good enough. – cs95 Jun 15 '17 at 21:17
  • "Regex works good until tags can be nested". You put your finger on it: Nesting is exactly what regular expressions **cannot** handle. There are work-arounds, and solid solutions for particular simple cases, but the real solution is to switch to technology that can handle nesting. [This famous answer](https://stackoverflow.com/a/1732454/699305) demonstrates that this is a well-known problem. And people, um, sometimes have strong feelings about it. – alexis Jun 15 '17 at 21:25

1 Answer

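Here's an approach that takes advantage of the callback mechanism available in `re.sub`: the replacement argument can be a function, which is called with each match object and returns the replacement string. The idea is to recurse: the callback converts the outer pair of tags and runs the same substitution on the text between them.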

Tested on Python 2.7 and Python 3.4.

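import re

# Matches [tag]...[/tag]; \1 requires the closing tag to repeat the opening name.
# The flag is compiled into the pattern because the fourth positional argument
# of re.sub is count, not flags.
TAG = re.compile(r"\[(.*?)\](.*?)\[/\1\]", re.DOTALL)

s = ... # your text here

def replace(m):
    # Convert the text between the tags first, so nested pairs are handled recursively.
    inner = TAG.sub(replace, m.group(2))
    return '<' + m.group(1) + '>' + inner + '</' + m.group(1) + '>'

s = TAG.sub(replace, s)
print(s)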
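
One detail worth checking in any variant of this code: the fourth positional parameter of `re.sub` is `count`, not `flags`, so a flag such as `re.DOTALL` only takes effect when it is passed as `flags=...` or compiled into the pattern. The sketch below shows the difference on a made-up two-line input; the names `PATTERN`, `wrap` and `text` are for illustration only.

```python
import re
import warnings

PATTERN = r"\[(.*?)\](.*?)\[/\1\]"
text = "[b]line one\nline two[/b]"  # hypothetical input: the tag body spans two lines

def wrap(m):
    # Non-recursive on purpose: this only demonstrates how flags are passed.
    return "<%s>%s</%s>" % (m.group(1), m.group(2), m.group(1))

with warnings.catch_warnings():
    # Newer Python versions may warn about passing count/flags positionally.
    warnings.simplefilter("ignore")
    # re.DOTALL lands in the `count` slot here, so "." still stops at "\n".
    positional = re.sub(PATTERN, wrap, text, re.DOTALL)

# Passed by keyword, the flag is applied and "." matches the newline too.
keyword = re.sub(PATTERN, wrap, text, flags=re.DOTALL)

print(positional == text)  # the positional call leaves the input unchanged
print(keyword)
```

For single-line input the two calls behave the same (up to 16 replacements, since `re.DOTALL` has the integer value 16), which is why the difference is easy to miss.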

Case (1)

[u][i][b]sometext[/i][/u]

Output

<u><i>[b]sometext</i></u>

Case (2)

[u][i][b]sometext[/b][/i][/u]

Output

<u><i><b>sometext</b></i></u>

These are the only two cases I've tried it on, but it should work for most usual cases.
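
One case the lazy pattern does not cover is a tag nested inside itself: `[b]a[b]b[/b]c[/b]` pairs the first `[b]` with the first `[/b]`. As a possible alternative, here is a minimal sketch of a stack-based converter that does not rely on regex backreferences for pairing. It is not part of the approach above; `TOKEN` and `convert` are hypothetical names, tag names are assumed to consist of word characters only, and no HTML escaping or tag whitelisting is attempted.

```python
import re

# Splits text into tag tokens such as "[b]" or "[/b]" and the plain text between them.
TOKEN = re.compile(r"(\[/?\w+\])")

def convert(text):
    """Convert properly paired [tag]...[/tag] markup to <tag>...</tag>.

    Tags without a partner are left in the output as literal text.
    """
    # Each frame is [tag_name, pieces]; the root frame has no tag name.
    stack = [[None, []]]

    def close_as_literal():
        # An opening tag that never got its closing tag stays as plain text.
        name, pieces = stack.pop()
        stack[-1][1].append("[" + name + "]" + "".join(pieces))

    # Splitting on a capturing group puts text at even and tags at odd positions.
    for i, token in enumerate(TOKEN.split(text)):
        if i % 2 == 0:
            stack[-1][1].append(token)
            continue
        name = token.strip("[/]")
        if not token.startswith("[/"):
            stack.append([name, []])      # opening tag: start a new frame
        elif any(frame[0] == name for frame in stack):
            # Unclosed tags opened after the matching opener become literal text.
            while stack[-1][0] != name:
                close_as_literal()
            _, pieces = stack.pop()
            stack[-1][1].append("<%s>%s</%s>" % (name, "".join(pieces), name))
        else:
            stack[-1][1].append(token)    # closing tag with no opener
    while len(stack) > 1:
        close_as_literal()
    return "".join(stack[0][1])

print(convert("[u][i][b]sometext[/i][/u]"))       # <u><i>[b]sometext</i></u>
print(convert("[u][i][b]sometext[/b][/i][/u]"))   # <u><i><b>sometext</b></i></u>
print(convert("[b]a[b]b[/b]c[/b]"))               # <b>a<b>b</b>c</b>
```

The stack keeps track of which tags are currently open, so each closing tag is paired with the nearest open tag of the same name, and anything left unpaired falls through unchanged.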

cs95