3

I am saving both the markdown text and html versions of text in my database for my Q&A site.

When users browse the questions, they currently see snippets of raw markdown. They should see only plain text, just as the question list on stackoverflow shows neither markdown nor html.

Is there a way to convert either markdown or html text to plain text?

p.campbell
Luke101
  • It appears from my research that you ONLY need to store the Markdown in the database and then sanitize it on output. This will also reduce your storage requirements. – Chase Florell Jul 07 '10 at 03:47
  • Just out of curiosity - if you didn't plan to show the HTML to the end user, why are you generating it from markdown, and why are you storing it? – Franci Penov Jul 07 '10 at 03:48
  • @Franci, see my comments in my answer. I "think" he is referring to having no HTML in the preview (notice how he mentions "browse for this question"). – Chase Florell Jul 07 '10 at 03:51

2 Answers

4

Questions on StackOverflow are displayed as HTML, not plain text. The Markdown is converted to HTML using MarkdownSharp, and the resulting HTML is sanitized using Jeff Atwood's HTML sanitizer.

I asked this question a few weeks back and the solution I ended up with was to store the raw markdown in the database and then transform it when it's shown to the visitor.

Here's how I'm sanitizing my Markdown:

        ' Users can submit malicious input (scripts and the like), so the "About"
        ' content has to be sanitized. The inner call runs first: the Markdown is
        ' transformed into HTML, and the resulting HTML is then sanitized.
        user.About = Trim(Utilities.HtmlSanitizer.Sanitize(MarkDownSharp.Transform(user.About)))

Since MarkdownSharp is open source, I'm sure you could dig into the source code and remove the additional tags that you don't want to see in the preview.

EDIT:

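In my example the Markdown is converted to HTML first and the resulting HTML is then sanitized, so the sanitizer sees both the raw HTML tags typed by the user AND the tags generated from Markdown. To keep <b> or <strong> out of the preview, you can remove them from the HtmlSanitizer whitelist. If you sanitize before converting instead, I think you would have to remove them in both the HtmlSanitizer and MarkdownSharp.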
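For the preview case, where no markup should survive at all, one option is to strip every tag from the already rendered HTML instead of trimming the whitelist tag by tag. Below is a minimal sketch of that idea, written in Python with only the standard library rather than in the VB.NET used above. The names `html_to_snippet`, `_TextExtractor`, and the `max_len` parameter are made up for illustration and are not part of MarkdownSharp or the HTML sanitizer.

```python
from html.parser import HTMLParser

# Tags that should leave a space behind so adjacent blocks do not run together.
_BLOCK_TAGS = {"p", "div", "br", "li", "blockquote", "pre",
               "h1", "h2", "h3", "h4", "h5", "h6"}


class _TextExtractor(HTMLParser):
    """Collects text nodes and drops every tag (hypothetical helper)."""

    def __init__(self):
        super().__init__(convert_charrefs=True)  # decode &amp; and friends
        self._chunks = []
        self._skip_depth = 0  # > 0 while inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1
        elif tag in _BLOCK_TAGS:
            self._chunks.append(" ")

    def handle_endtag(self, tag):
        if tag in ("script", "style"):
            if self._skip_depth:
                self._skip_depth -= 1
        elif tag in _BLOCK_TAGS:
            self._chunks.append(" ")

    def handle_data(self, data):
        if not self._skip_depth:
            self._chunks.append(data)

    def text(self):
        # Collapse whitespace runs so the snippet reads as a single line.
        return " ".join("".join(self._chunks).split())


def html_to_snippet(html, max_len=200):
    """Turn rendered HTML into a plain-text preview of at most max_len chars."""
    parser = _TextExtractor()
    parser.feed(html)
    parser.close()
    text = parser.text()
    if len(text) <= max_len:
        return text
    return text[:max_len].rstrip() + "..."
```

Called on the HTML that is already stored or rendered, for example `html_to_snippet("<p><strong>Hello</strong> world</p>")`, this yields text with no bold markers or tags. If the snippet is later written into a page, it still needs to be HTML-encoded on output, because the entities have been decoded.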

Chase Florell
  • In this question I made the word "Hello" bold. If you go back and browse for this question, "Hello" will not be bold. This is what I need. – Luke101 Jul 07 '10 at 03:44
  • You would do this by "sanitizing" the `<b>` and the `<strong>` tags out of the markdown. – Chase Florell Jul 07 '10 at 03:45
  • May I ask why you would want to use Markdown if you are only showing plain text? Why not just use a TextArea? – Chase Florell Jul 07 '10 at 03:45
  • I think I'm understanding you now. You show the HTML in the full view, but not in the preview... is that correct? If so, again, you would use the "Html Sanitizer" link I posted above, but with two methods, `SanitizeHtmlForDisplay` and `SanitizeHtmlForPreview`, where the preview one would have FEWER whitelist rules. – Chase Florell Jul 07 '10 at 03:50
  • Luke. What do your whitelist rules look like in order to achieve this? I'd like to see what you got for results. – Chase Florell Dec 09 '10 at 17:38
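
The two-method idea from the comments above could look roughly like the sketch below. It is written in Python with only the standard library, not in VB.NET, and it is not the HTML sanitizer linked in the answer. The whitelist contents are assumptions, and the function names simply mirror `SanitizeHtmlForDisplay` and `SanitizeHtmlForPreview`. Treat it as an illustration of the two-whitelist design, not as a security-grade sanitizer.

```python
from html import escape
from html.parser import HTMLParser

# Hypothetical whitelists; the real ones depend on what the site allows.
DISPLAY_TAGS = frozenset({"p", "b", "strong", "i", "em", "code", "pre",
                          "ul", "ol", "li", "blockquote"})
PREVIEW_TAGS = frozenset()  # previews keep no markup at all


class _WhitelistFilter(HTMLParser):
    """Re-emits only whitelisted tags, without attributes."""

    def __init__(self, allowed):
        super().__init__(convert_charrefs=True)
        self._allowed = allowed
        self._out = []
        self._skip_depth = 0  # > 0 while inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1
        elif tag in self._allowed:
            # Attributes are always dropped, so no href or onclick survives.
            self._out.append(f"<{tag}>")

    def handle_endtag(self, tag):
        if tag in ("script", "style"):
            if self._skip_depth:
                self._skip_depth -= 1
        elif tag in self._allowed:
            self._out.append(f"</{tag}>")

    def handle_data(self, data):
        if not self._skip_depth:
            # Re-escape text so a stray < or & cannot form new markup.
            self._out.append(escape(data, quote=False))

    def result(self):
        return "".join(self._out)


def _sanitize(html, allowed):
    parser = _WhitelistFilter(allowed)
    parser.feed(html)
    parser.close()
    return parser.result()


def sanitize_html_for_display(html):
    """Full view: keep basic formatting tags."""
    return _sanitize(html, DISPLAY_TAGS)


def sanitize_html_for_preview(html):
    """Question list: same filter with an empty whitelist."""
    return _sanitize(html, PREVIEW_TAGS)
```

Both functions share one filter and differ only in the set of allowed tags, so tightening the preview rules is a matter of editing `PREVIEW_TAGS`.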
0

Another solution is to use a Markdown XSLT file.

For example: HTML To Markdown Text

LeMoussel