Parsing Stackoverflow-like text box in Python

Question

I have a <textarea> where the user enters text. The text can contain special characters that I need to parse and replace with HTML tags for display.
For example:
Bolded text will be entered as: *some text* and parsed to: <strong>some text</strong>.
A URL will be entered as: #some text | to/url# and parsed to: <a href="to/url">some text</a>.

What's the best way to parse this text input?
Regex? (I don't have any experience with regex) Some Python library?
Or should I write my own parser, "reading" the input and applying logic where needed?

Have a look at Markdown for Python before you try to write anything yourself. http://freewisdom.org/projects/python-markdown/ — alan, May 01 '12 at 12:00

score 4 · Accepted Answer · answered May 01 '12 at 12:02


The emphasis element of the language you describe looks like Markdown.

You should consider just using Markdown as is. There is also a Python module, Python-Markdown, that parses it.
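A minimal sketch of that approach, assuming the third-party Python-Markdown package is installed (`pip install markdown`); its `markdown.markdown()` function converts a Markdown string into an HTML fragment. The helper name `to_html` is only for illustration. Note that Markdown's syntax differs slightly from the one in the question: `*text*` produces `<em>`, `**text**` produces `<strong>`, and links are written `[text](url)`.

```python
# Assumes the third-party Python-Markdown package: pip install markdown
import markdown


def to_html(source: str) -> str:
    """Convert user-entered Markdown into an HTML fragment (hypothetical helper)."""
    return markdown.markdown(source)


print(to_html("Some **bold text** and a [link](to/url)."))
# <p>Some <strong>bold text</strong> and a <a href="to/url">link</a>.</p>
```

Python-Markdown passes raw HTML in the input through unchanged, so text coming from a public textarea should also go through an HTML sanitizer before it is displayed.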


ArjunShankar


score 1 · Answer 2 · edited May 23 '17 at 10:34


The best way depends on exactly what your input "language" is. If it has the same sort of nested structures as HTML, you don't want to do it with regular expressions. (Obligatory link: RegEx match open tags except XHTML self-contained tags)
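When the markup is flat, as with the two rules in the question, regular expressions are usually enough. The sketch below uses only the standard library (`re`, `html`); the function name `render` and the exact patterns are assumptions, not an established format. It escapes all user text outside the markup so raw HTML typed into the textarea is shown literally, and it deliberately does not handle nesting (for example, `*bold*` inside a link label stays literal), which is exactly where a real parser starts to pay off.

```python
import html
import re

# Hypothetical patterns for the question's two rules (flat markup only):
#   #some text | to/url#  ->  <a href="to/url">some text</a>
#   *some text*           ->  <strong>some text</strong>
TOKEN_RE = re.compile(
    r"#(?P<label>[^#|\n]+?)\s*\|\s*(?P<url>[^#\n]+?)\s*#"  # link
    r"|\*(?P<bold>[^*\n]+)\*"                              # bold
)


def render(text: str) -> str:
    """Convert the markup to HTML, escaping all user-supplied text."""
    out = []
    pos = 0
    for m in TOKEN_RE.finditer(text):
        # Plain text between matches is escaped so raw HTML is shown literally.
        out.append(html.escape(text[pos:m.start()]))
        if m.group("bold") is not None:
            out.append(f"<strong>{html.escape(m.group('bold'))}</strong>")
        else:
            # The URL is only escaped, not validated; a real site should
            # also reject schemes such as "javascript:".
            url = html.escape(m.group("url"))
            label = html.escape(m.group("label"))
            out.append(f'<a href="{url}">{label}</a>')
        pos = m.end()
    out.append(html.escape(text[pos:]))
    return "".join(out)


print(render("This is *bold* and #a link | to/url#."))
# This is <strong>bold</strong> and <a href="to/url">a link</a>.
```

Escaping each piece separately (rather than escaping the whole string first) matters here: `html.escape` turns `'` into `&#x27;`, which contains a `#` and would otherwise confuse the link pattern.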

Are you inventing your own little markup language?

If you are: why? Why not use one of the already existing ones, such as Markdown or reST, for which parsers already exist?
If you aren't: why are you writing your own parser? Isn't there one already?


answered May 01 '12 at 12:01

Gareth McCaughan


I need a simple text box with a few extras, such as bold text, italics and links, and it has to be simple for the user (which is why I use asterisks instead of HTML tags). Of course I'd be happy to use an existing library instead of writing one myself; I just don't know of any... – user1102018 May 01 '12 at 12:07

score 1 · Answer 3 · edited May 01 '12 at 12:54


You can have a look at some existing libraries for parsing wiki text:

http://remysharp.com/2008/04/01/wiki-to-html-using-javascript/

This one seems to work with the same format you've defined.

Headings: ! Heading1 text, !! Heading2 text, !!! Heading3 text

Bold: Bolded Text

Italic: Italicized Text

Underline: +Underlined Text+

http://randomactsofcoding.blogspot.co.uk/2009/08/parsewikijs-javascript-wiki-parsing.html

Or this one, which has a really simple API and can check whether the given text actually is wiki text.

UPDATE: added Python wiki parsers.

I had a look at a list of existing wiki parsers.

mediawiki-parser seems to be a good Python parser that generates HTML from wiki markup:

https://github.com/peter17/mediawiki-parser


answered May 01 '12 at 12:03

txominpelu


Thanks, this looks like what I had in mind, but I was looking for a server-side parser. I might still get some ideas from your links. – user1102018 May 01 '12 at 12:13
I've added a good Python parser from GitHub that may work for you. – txominpelu May 01 '12 at 12:54
