
I'm trying to remove all wiki markup from some text. I want to process the text, but the markup gets in the way. How can I strip all of it out?

I'm working in C#, in case regex isn't a viable approach.

Example text: http://regexr.com/39fnb

Edit: I've come up with the following regex: ([[)|(]])|(Category:.)|({{.}})|(=+.+=)|([.*?])

It handles some of the markup, but not all of it. For example, it doesn't match the lines that start with | and contain code. I tried adding an alternative to match those lines, but it didn't work.
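For reference, the kind of cleanup described above can be sketched in C# as a sequence of separate `Regex.Replace` passes rather than one large alternation. This is only a minimal sketch under assumptions: `WikiMarkupStripper` and `Strip` are hypothetical names, the rules cover only templates, category links, internal links, headings, and lines starting with `|`, and real wiki markup has many more constructs that a dedicated parser would handle more reliably.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper for illustration; not part of any library.
public static class WikiMarkupStripper
{
    public static string Strip(string text)
    {
        // Assumption: normalize line endings so the line-based rules only deal with "\n".
        text = text.Replace("\r\n", "\n");

        // Remove {{...}} templates innermost-first, repeating until nothing
        // changes so that nested templates are removed as well.
        string previous;
        do
        {
            previous = text;
            text = Regex.Replace(text, @"\{\{[^{}]*\}\}", "");
        } while (text != previous);

        // Remove category links entirely (must run before the generic link rule).
        text = Regex.Replace(text, @"\[\[Category:[^\]]*\]\]", "");

        // [[target|label]] becomes label, [[target]] becomes target.
        text = Regex.Replace(text, @"\[\[(?:[^\]|]*\|)?([^\]]*)\]\]", "$1");

        // == Heading == becomes Heading. Multiline makes ^ and $ match per line.
        text = Regex.Replace(text, @"^=+[ \t]*(.*?)[ \t]*=+[ \t]*$", "$1",
                             RegexOptions.Multiline);

        // Drop whole lines that start with | (optionally preceded by spaces).
        text = Regex.Replace(text, @"^[ ]*\|.*\n?", "", RegexOptions.Multiline);

        return text;
    }
}
```

Calling `WikiMarkupStripper.Strip(input)` on a page's text returns the text with those constructs removed. The order of the passes matters: category links have to be removed before the generic link rule, which would otherwise keep their target text.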

user1599078
  • Please make sure to scope down your question so it is not so inviting to close as duplicate of [RegEx match open tags except XHTML self-contained tags](http://stackoverflow.com/questions/1732348/regex-match-open-tags-except-xhtml-self-contained-tags) ... Consider using search engine to help to narrow down the request - like http://www.bing.com/search?q=c%23+parser+Wiki+markup – Alexei Levenkov Sep 10 '14 at 22:26
  • Try to accomplish it yourself first, and when you come across a more specific (**concrete**) problem, post it here, showing what you've tried and what you're struggling with. – Gutblender Sep 10 '14 at 22:35
  • A few lines of sample input and expected output could be added. – vks Sep 11 '14 at 01:15
  • What did you try for parsing lines beginning with `|`? – Gutblender Sep 11 '14 at 02:15
  • @Gutblender I think it was something like `(^[ ]?|)`. I'm not sure why it didn't work. I added a | between that and the rest. It works in the online checker, but it doesn't work in the program. – user1599078 Sep 11 '14 at 08:49
  • In regex, to match characters not in a set, you use `[^chars]`, not `^[chars]`. – Gutblender Sep 11 '14 at 13:33
  • @Gutblender It removed my space, but there was a space between the []. It worked in the web emulator, and also in a .NET regex emulator. What would you suggest to fix it? – user1599078 Sep 11 '14 at 17:27
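
To illustrate the anchor discussion in the comments: outside brackets, `^` anchors the match to the start of the input (or of each line), while `[ ]?` is an optional space and `\|` is a literal pipe. One possible explanation for a pattern that works in an online tester but not in a program (an assumption, since the program's code isn't shown) is that the tester has a multiline flag enabled, whereas .NET's `Regex` only treats `^` as a per-line anchor when `RegexOptions.Multiline` is passed. The sketch below uses hypothetical names (`AnchorDemo`, `CountPipeLines`) to show the difference.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical demo class for illustration only.
public static class AnchorDemo
{
    // Counts positions where a line begins with an optional space followed by a pipe.
    public static int CountPipeLines(string text, RegexOptions options)
    {
        // ^ is an anchor here, not a negation; \| escapes the pipe so it is
        // not read as the alternation operator.
        return Regex.Matches(text, @"^[ ]?\|", options).Count;
    }
}
```

With the input `"intro\n| a = 1\n | b = 2"`, passing `RegexOptions.None` finds no matches because `^` only matches at the very start of the string, while passing `RegexOptions.Multiline` finds both pipe lines.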
