
Rather than using dangerouslySetInnerHTML, I'm trying to manually parse a small subset of Markdown and transform it into React components. I need to do this because I have some custom components that also have to be rendered in the message field, so I will be turning parts of the message into React components anyway. It also avoids the possibility of an XSS attack.

My original idea was just to split the message on space and conditionally turn each token into a React component, similar to this:

matchMarkdown(part) {
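  // No "g" flag here: with "g", match() returns only the full matches, so match[3] would be undefined
  let match = part.match(/(^|[^\\])(\*)(.*)(\*)/); // match on *asdf* but not \*asdf*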

  if (match !== null) {
     return <strong> {match[3]}</strong>;
  }

  match = part.match(/(^|[^\\])(_)(.*)(_)/); // match on _qwer_ but not \_qwer_

  if (match !== null) {
     return <em> {match[3]}</em>;
  }

  return " " + part;
}

convertMarkdownToComponents() {
  let parts = this.state.body.split(" ");

  return (
    <div>
      {parts.map(this.matchMarkdown)}
    </div>
  );
}

And this almost works, except for the problem that it is splitting on spaces only. For example, it will work on this message:

the _quick_ *brown* fox

but not on this message:

the _quick_*brown* fox

because there is no space separating the tokens. I'd like that message to turn into something like this, with "quick" in italics and "brown" in bold:

the quick brown fox

I'd like it to work even without spaces, but I'm not sure how. Also, the current solution seems pretty brittle in how it prepends a space to everything. Any advice?

Ryan Peschel
  • Have you looked at https://github.com/rexxars/react-markdown? I'd suggest not rolling your own in this case. But if you really feel like you want to, use it as an example to dig through for ideas. – shadymoses Sep 16 '19 at 06:24
  • Thanks for the link. Unfortunately I need to roll my own because my cases are not the same as theirs. For example, one of the token types I need to parse with a regex is turned into an entirely custom React component. Luckily I only need to support 3 Markdown cases (bold, italics, underline), so it shouldn't be terribly hard. – Ryan Peschel Sep 16 '19 at 06:28
  • Try this one then: https://github.com/probablyup/markdown-to-jsx. From the demo site it claims: _You can even include custom React components if you declare them in the "overrides" option._ – shadymoses Sep 16 '19 at 06:36
  • Honestly, I'm not going to use an external library for this. My use-case is too small to warrant it. I'll check the link for inspiration though, thanks. – Ryan Peschel Sep 16 '19 at 06:42

1 Answer


Parsing Markdown with a regex is never going to be fun, or complete, because you can't parse arbitrary Markdown with a regex, for the same reason that you can't parse arbitrary HTML with a regex.

See this canonical answer for illumination.

You can write a regex that will parse some sufficiently simple Markdown/HTML, but you will need to account for things like whitespace that may or may not exist, nested elements, and any other complexities that you allow in your inputs. There's no getting around it.
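If you do go the regex route for a small subset, one option is to scan the whole message with a single global pattern instead of splitting on spaces first. The following is only a rough sketch under some assumptions: the function name tokenizeInline and the token shape { type, text } are made up for illustration, it uses the question's convention (*...* for bold, _..._ for italics), and it does not handle nesting or escaped markers inside a delimited run.

```javascript
// Hypothetical helper (not from any library): splits a message into
// plain-text, bold, and italic tokens without relying on spaces.
function tokenizeInline(message) {
  // Alternation order matters: escaped markers are tried first,
  // then *bold* runs, then _italic_ runs.
  const pattern = /\\([*_])|\*([^*]+)\*|_([^_]+)_/g;
  const tokens = [];
  let text = "";     // plain text accumulated since the last emitted token
  let lastIndex = 0; // end of the previous match in the message

  const flushText = () => {
    if (text !== "") {
      tokens.push({ type: "text", text });
      text = "";
    }
  };

  for (const match of message.matchAll(pattern)) {
    // Keep whatever sits between the previous match and this one.
    text += message.slice(lastIndex, match.index);
    lastIndex = match.index + match[0].length;

    if (match[1] !== undefined) {
      text += match[1]; // "\*" or "\_" becomes a literal marker
    } else if (match[2] !== undefined) {
      flushText();
      tokens.push({ type: "strong", text: match[2] });
    } else {
      flushText();
      tokens.push({ type: "em", text: match[3] });
    }
  }

  // Trailing plain text after the last match.
  text += message.slice(lastIndex);
  flushText();
  return tokens;
}

console.log(tokenizeInline("the _quick_*brown* fox"));
```

Each token can then be mapped to a React element in render (a "strong" token to a strong element, an "em" token to an em element, a "text" token to a plain string), which keeps dangerouslySetInnerHTML out of the picture. Every extra rule you add (nesting, underline, custom components) makes the pattern harder to get right, which is the point where a real parser starts to pay off.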

Use a Markdown parser if you need to parse it properly. A quick Google search reveals many, for example:

https://github.com/evilstreak/markdown-js
https://github.com/markedjs/marked
https://github.com/showdownjs/showdown

Matt