1

I have a <textarea> where the user enters text. The text can contain special characters that I need to parse and replace with HTML tags for display.
For example:
Bolded text will be entered as: *some text* and parsed to: <strong>some text</strong>.
URL will be entered as: #some text | to/url# and parsed to: <a href="to/url">some text</a>

What's the best way to parse this text input?
Regex? (I don't have any experience with regex) Some Python library?
Or should I write my own parser, "reading" the input and applying logic where needed?

user1102018

3 Answers

4

The emphasis element of the language you describe looks like Markdown.

You should consider just using Markdown, as is. There is a Python module that parses it too.

ArjunShankar
1

The best way depends on exactly what your input "language" is. If it has the same sort of nested structures as HTML, you don't want to do it with regular expressions. (Obligatory link: RegEx match open tags except XHTML self-contained tags)
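For a flat syntax with no nesting, like the two rules in the question, a single regex pass can be enough. Here is a minimal sketch using only the standard library. The names `TOKEN` and `render` are made up for illustration, and the exact patterns are an assumption about what the input should allow:

```python
import html
import re

# Hypothetical patterns for the two rules in the question:
#   *some text*           -> <strong>some text</strong>
#   #some text | to/url#  -> <a href="to/url">some text</a>
TOKEN = re.compile(
    r"\*(?P<bold>[^*\n]+)\*"
    r"|#(?P<label>[^#|\n]+)\|(?P<url>[^#|\n]+)#"
)

def render(text):
    """Convert the custom markup to HTML, escaping everything else."""
    out = []
    pos = 0
    for m in TOKEN.finditer(text):
        # Escape the plain text between the previous match and this one.
        out.append(html.escape(text[pos:m.start()]))
        if m.group("bold") is not None:
            out.append("<strong>%s</strong>" % html.escape(m.group("bold")))
        else:
            label = html.escape(m.group("label").strip())
            # quote=True keeps a stray quote from ending the href attribute.
            # The URL scheme is NOT validated here.
            url = html.escape(m.group("url").strip(), quote=True)
            out.append('<a href="%s">%s</a>' % (url, label))
        pos = m.end()
    # Escape whatever follows the last match.
    out.append(html.escape(text[pos:]))
    return "".join(out)

print(render("*some text*"))
print(render("#some text | to/url#"))
```

This sketch does not handle nesting (bold inside a link label stays literal), which is exactly where regular expressions stop being a good fit.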

Are you inventing your own little markup language?

  • If you are: why? Why not use one of the already existing ones, such as Markdown or reST, for which parsers already exist?
  • If you aren't: why are you writing your own parser? Isn't there one already?
Gareth McCaughan
  • I need a simple text box with a few extras, such as bold text, italics and links. And I need it to be simple for the user (this is why I use asterisks instead of HTML tags). Of course, I will be happy to use an existing library instead of writing one myself. I just don't know of any... – user1102018 May 01 '12 at 12:07
1

You can have a look at some existing libraries for parsing wiki text:

This one seems to work with the same format you've defined.

Headings: ! Heading1 text, !! Heading2 text, !!! Heading3 text

Bold: *Bolded Text*

Italic: _Italicized Text_

Underline: +Underlined Text+

Or this one that has a really simple API and allows for checking if the given text is actually a wiki text.

UPDATED - Added Python wiki parsers:

I had a look at a list of wiki parsers from here.

mediawiki-parser seems to be a good Python parser that generates HTML from wiki markup:

https://github.com/peter17/mediawiki-parser

txominpelu