How to parse a small subset of Markdown into React components?

Question

I have a very small subset of Markdown along with some custom HTML that I would like to parse into React components. For example, I would like to turn the following string:

hello *asdf* *how* _are_ you !doing! today

Into the following array:

[ "hello ", <strong>asdf</strong>, " ", <strong>how</strong>, " ", <em>are</em>, " you ", <MyComponent onClick={this.action}>doing</MyComponent>, " today" ]

and then return it from a React render function (React will render the array properly as formatted HTML)

Basically, I want to give users the option to use a very limited set of Markdown to turn their text into styled components (and in some cases my own components!)

It is unwise to dangerouslySetInnerHTML, and I do not want to bring in an external dependency, because they are all very heavy, and I only need very basic functionality.

I'm currently doing something like this, but it is very brittle, and doesn't work for all cases. I was wondering if there were a better way:

function matchStrong(result, i) {
  let match = result[i].match(/(^|[^\\])\*(.*)\*/);
  if (match) { result[i] = <strong key={"ms" + i}>{match[2]}</strong>; }
  return match;
}

function matchItalics(result, i) {
  let match = result[i].match(/(^|[^\\])_(.*)_/); // Ignores \_asdf_ but not _asdf_
  if (match) { result[i] = <em key={"mi" + i}>{match[2]}</em>; }
  return match;
}

function matchCode(result, i) {
  let match = result[i].match(/(^|[^\\])```\n?([\s\S]+)\n?```/);
  if (match) { result[i] = <code key={"mc" + i}>{match[2]}</code>; }
  return match;
}

// Very brittle and inefficient
export function convertMarkdownToComponents(message) {
  let result = message.match(/(\\?([!*_`+-]{1,3})([\s\S]+?)\2)|\s|([^\\!*_`+-]+)/g);

  if (result == null) { return message; }

  for (let i = 0; i < result.length; i++) {
    if (matchCode(result, i)) { continue; }
    if (matchStrong(result, i)) { continue; }
    if (matchItalics(result, i)) { continue; }
  }

  return result;
}

Here is my previous question which led to this one.

What if the input has nested items, like `font _italic *and bold* then only italic_ and normal`? What would be the expected result? Or will it never be nested? — trincot, Dec 05 '19 at 21:13
No need to worry about nesting. It's just very basic markdown for users to use. Whatever is easiest to implement is fine with me. In your example, it'd be completely fine if the inner bolding didn't work. But if it's easier to implement nesting than to not have it then that's alright too. — Ryan Peschel, Dec 05 '19 at 21:15
It's probably easiest to just use an off-the-shelf solution like https://www.npmjs.com/package/react-markdown-it — mb21, Dec 08 '19 at 18:50
I'm not using markdown though. It's just a very similar / small subset of it (which supports a couple custom components, along with non-nested bold, italics, code, underline). The snippets I posted somewhat work, but don't seem very ideal, and fail in some trivial cases (like you can't type a single asterisk like this: `asdf*` without it disappearing) — Ryan Peschel, Dec 09 '19 at 16:43
well... parsing markdown or something like markdown is not exactly an easy task... regexes don't cut it... for a similar question regarding html, see https://stackoverflow.com/questions/1732348/regex-match-open-tags-except-xhtml-self-contained-tags — mb21, Dec 09 '19 at 16:59

score 5 · Answer 1 · answered Dec 09 '19 at 16:57

It looks like you are looking for a small very basic solution. Not "super-monsters" like react-markdown-it :)

I would like to recommend you https://github.com/developit/snarkdown which looks pretty lightweight and nice! Just 1kb and extremely simple, you can use it & extend it if you need any other syntax features.

Supported tags list https://github.com/developit/snarkdown/blob/master/src/index.js#L1

Update

I just noticed the part about React components; I missed it at first. In that case, I believe you can take the library as an example and implement your own required components, so you get it done without setting HTML dangerously. The library is pretty small and clear. Have fun with it! :)

Simon · Answer 2 · 2019-12-11T03:20:56.670

var table = {
  "*":{
    "begin":"<strong>",
    "end":"</strong>"
    },
  "_":{
    "begin":"<em>",
    "end":"</em>"
    },
  "!":{
    "begin":"<MyComponent onClick={this.action}>",
    "end":"</MyComponent>"
    },

  };

var myMarkdown = "hello *asdf* *how* _are_ you !doing! today";
// Inside a character class "|" is a literal, so list the tag characters directly.
var tagFinder = /(?<item>(?<tag_begin>[*!_])(?<content>\w+)(?<tag_end>\k<tag_begin>))/gm;

//Use case 1: direct string replacement
var replaced = myMarkdown.replace(tagFinder, replacer);
function replacer(match, whole, tag_begin, content, tag_end, offset, string) {
  return table[tag_begin]["begin"] + content + table[tag_begin]["end"];
}
alert(replaced);

//Use case 2: React components
var pieces = [];
var lastMatchedPosition = 0;
myMarkdown.replace(tagFinder, breaker);
function breaker(match, whole, tag_begin, content, tag_end, offset, string) {
  var piece;
  if (lastMatchedPosition < offset)
  {
    piece = string.substring(lastMatchedPosition, offset);
    pieces.push("\"" + piece + "\"");
  }
  piece = table[tag_begin]["begin"] + content + table[tag_begin]["end"];
  pieces.push(piece);
  lastMatchedPosition = offset + match.length;

}
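// Push any trailing text that follows the last match.
if (lastMatchedPosition < myMarkdown.length) {
  pieces.push("\"" + myMarkdown.substring(lastMatchedPosition) + "\"");
}
alert(pieces);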


Explanation:

/(?<item>(?<tag_begin>[*!_])(?<content>\w+)(?<tag_end>\k<tag_begin>))/

You define your tag characters in the character class [*!_] (no | separators are needed inside a character class; a | there would be matched literally). Once one of them is matched, it is captured as a group named "tag_begin".
Then (?<content>\w+) captures the content wrapped by the tag.
The ending tag must be the same as the previously matched one, so the pattern uses the backreference \k<tag_begin>. If that matches, it is captured as a group named "tag_end"; that's what (?<tag_end>\k<tag_begin>) is saying.
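To see what the backreference buys you, here is a small sketch (not part of the original answer) that runs a trimmed-down variant of the pattern through String.prototype.matchAll. The helper name findTags is made up for illustration, and the variant drops the item and tag_end groups and writes the character class as [*!_].

```javascript
// Trimmed-down variant of the pattern: one named group for the opening tag,
// one for the content, and a backreference that demands the same closing tag.
const tagFinder = /(?<tag_begin>[*!_])(?<content>\w+)\k<tag_begin>/g;

// Hypothetical helper: list every tagged span with its position.
function findTags(text) {
  // matchAll requires the "g" flag and yields one match object per hit.
  return Array.from(text.matchAll(tagFinder), (m) => ({
    tag: m.groups.tag_begin,
    content: m.groups.content,
    index: m.index,
  }));
}

// "*oops_" opens with "*" but closes with "_", so it is not reported.
console.log(findTags("hello *asdf* _are_ you !doing! *oops_"));
```

Because \w also matches an underscore, something like *a_b* is reported as a single "*" span with content a_b, which may or may not be what you want.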

In the JS you've set up a table like this:

var table = {
  "*":{
    "begin":"<strong>",
    "end":"</strong>"
    },
  "_":{
    "begin":"<em>",
    "end":"</em>"
    },
  "!":{
    "begin":"<MyComponent onClick={this.action}>",
    "end":"</MyComponent>"
    },

  };

Use this table to replace the matched tags.

String.replace has an overload String.replace(regexp, function) whose callback receives the captured groups as its parameters. We use these captured items to look up the table and generate the replacement string.
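If you need nodes rather than an HTML string, the same callback mechanism can collect pieces into an array. What follows is only a sketch under assumptions: toNodes and plainObjectFactory are made-up names, and h is a hypothetical createElement-style factory parameter. In a React app you would presumably pass React.createElement, and the table would hold the component itself (not a name string) for custom components.

```javascript
// Maps each tag character to an element type (names are illustrative only).
const table = { "*": "strong", "_": "em", "!": "MyComponent" };
const tagFinder = /([*!_])(\w+)\1/g;

// Hypothetical helper: split text into plain strings and created nodes.
// `h(type, props, ...children)` mirrors the shape of React.createElement.
function toNodes(text, h) {
  const nodes = [];
  let last = 0;
  // With two capture groups the callback receives (match, p1, p2, offset, ...).
  text.replace(tagFinder, (match, tag, content, offset) => {
    if (last < offset) nodes.push(text.slice(last, offset));
    nodes.push(h(table[tag], { key: offset }, content));
    last = offset + match.length;
    return match; // the returned string is ignored; only `nodes` matters
  });
  // Keep whatever text follows the final match.
  if (last < text.length) nodes.push(text.slice(last));
  return nodes;
}

// Stand-in factory so the sketch runs without React.
const plainObjectFactory = (type, props, ...children) => ({ type, props, children });
console.log(toNodes("hello *asdf* _are_ you !doing! today", plainObjectFactory));
```

Since the user's text only ever ends up as string children, nothing is interpreted as HTML, so dangerouslySetInnerHTML is not needed with this approach.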

[Update]
I have updated the code, I kept the first one in case someone else doesn't need react components, and you can see there is little difference between them.

Unfortunately I'm not sure if this works. Because I need the actual React components and elements themselves, not strings of them. If you look in my original post you'll see that I'm adding the actual elements themselves to an array, not strings of them. And using dangerouslySetInnerHTML is dangerous as the user could input malicious strings. — Ryan Peschel, Dec 10 '19 at 17:37
Fortunately it's very simple to convert the string replacement to React components, I have updated the code. — Simon, Dec 11 '19 at 03:22
Hm? I must be missing something, because they're still strings on my end. I even made a fiddle with your code. If you read the `console.log` output you'll see the array is full of strings, not actual React components: https://jsfiddle.net/xftswh41/ — Ryan Peschel, Dec 11 '19 at 23:17
Honestly I don't know React, so I can't make everything perfectly followed by your needs, but I think the information about how to resolve your question is enough, you need to put them to your React machine and it just can go. — Simon, Dec 12 '19 at 02:17
The reason why this thread exists is because it seems to be significantly harder to parse them into React components (hence the thread title specifying that exact need). Parsing them into strings is fairly trivial and you can just use the string replace function. The strings are not an ideal solution because they're slow and susceptible to XSS due to having to call dangerouslySetInnerHTML — Ryan Peschel, Dec 12 '19 at 13:22

LuDanin · Accepted Answer · 2019-12-16T22:45:53.287

How it works?

It works by reading a string chunk by chunk, which might not be the best solution for really long strings.

Whenever the parser detects a critical chunk is being read, i.e. '*' or any other markdown tag, it starts parsing chunks of this element until the parser finds its closing tag.

It works on multi-line strings, see the code for example.

Caveats

You haven't specified (or I may have misunderstood your needs) whether text ever has to be both bold and italic at once; my current solution might not work in that case.

If you need, however, to work with the above conditions just comment here and I'll tweak the code.

First update: tweaks how markdown tags are treated

Tags are no longer hardcoded, instead they are a map where you can easily extend to fit your needs.

Fixed the bugs you've mentioned in the comments, thanks for pointing out these issues =p

Second update: multi-length markdown tags

Easiest way of achieving this: replacing multi-length tags with a rarely used Unicode character

Though the method parseMarkdown does not yet support multi-length tags, we can easily replace those multi-length tags with a simple string.replace when sending our rawMarkdown prop.

To see an example of this in practice, look at the ReactDOM.render, located at the end of the code.

Even if your application supports multiple languages, there are code points that never represent real text but that JavaScript can still store and compare. For example, "\uFFFF" is a Unicode noncharacter (it is permanently reserved and never assigned to a character), yet "\uFFFF" === "\uFFFF" is still true.

It might seem hack-y at first but, depending on your use case, I don't see any major issues with this route.
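For what it's worth, the substitution step can be sketched as a small preprocessing function. This is an illustration only: substituteMultiCharTags is a made-up name, and picking sentinels from the Private Use Area (starting at U+E000) instead of "\uFFFF" is my own assumption, made so that several multi-character tags can each get their own sentinel.

```javascript
// Hypothetical helper: replace every multi-character tag with a single
// sentinel character so a character-by-character parser can handle it.
function substituteMultiCharTags(source, multiCharTags) {
  const sentinels = new Map(); // sentinel character -> original tag
  let codePoint = 0xe000; // assumption: start of the BMP Private Use Area
  let text = source;

  // Longest tags first, so "```" is replaced before a shorter tag like "``".
  const ordered = [...multiCharTags].sort((a, b) => b.length - a.length);
  for (const tag of ordered) {
    let sentinel = String.fromCharCode(codePoint++);
    // Skip any candidate that already occurs in the user's text.
    while (source.includes(sentinel)) {
      sentinel = String.fromCharCode(codePoint++);
    }
    sentinels.set(sentinel, tag);
    text = text.split(tag).join(sentinel);
  }
  return { text, sentinels };
}

const { text, sentinels } = substituteMultiCharTags(
  "a ```code``` b **bold**",
  ["**", "```"]
);
console.log(JSON.stringify(text), sentinels);
```

The returned sentinels map tells the parser which single characters to register as tags, and it lets you translate a sentinel back to its original tag when a tag turns out to be unclosed.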

Another way of achieving this

Well, we could easily track the last N (where N corresponds to the length of the longest multi-length tag) chunks.

There would be some tweaks to be made to the way the loop inside method parseMarkdown behaves, i.e. checking if current chunk is part of a multi-length tag, if it is use it as a tag; otherwise, in cases like ``k, we'd need to mark it as notMultiLength or something similar and push that chunk as content.
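One way to read that idea, sketched here without any sentinel characters: at each position, check whether one of the known tags starts there (longest first), and fall back to plain content if it does not or if no closing tag follows. This is not the code from this answer; tokenize and the token shape are assumptions for illustration, and the closing-tag search deliberately ignores escapes inside the tagged span to keep it short.

```javascript
// Tag table in the spirit of the answer's Map, but with a real multi-character key.
const tags = new Map([
  ["```", "pre"],
  ["*", "strong"],
  ["_", "em"],
  ["!", "button"],
]);
// Try longer symbols first so "```" wins over any shorter symbol.
const symbols = [...tags.keys()].sort((a, b) => b.length - a.length);

// Hypothetical tokenizer: returns [{ type, value }], where type is "text" or a tag name.
function tokenize(source) {
  const tokens = [];
  let text = "";
  const flush = () => {
    if (text !== "") tokens.push({ type: "text", value: text });
    text = "";
  };

  let i = 0;
  while (i < source.length) {
    // A backslash makes the next character literal.
    if (source[i] === "\\" && i + 1 < source.length) {
      text += source[i + 1];
      i += 2;
      continue;
    }
    const symbol = symbols.find((s) => source.startsWith(s, i));
    const close = symbol ? source.indexOf(symbol, i + symbol.length) : -1;
    if (symbol && close > i + symbol.length) {
      // Opening and closing tag found with non-empty content in between.
      flush();
      tokens.push({ type: tags.get(symbol), value: source.slice(i + symbol.length, close) });
      i = close + symbol.length;
    } else {
      // Not a tag (e.g. "``k") or an unclosed one (e.g. "asdf*"): keep it as content.
      text += source[i];
      i += 1;
    }
  }
  flush();
  return tokens;
}

console.log(tokenize("a *b* ```c d``` e\\*f asdf*"));
```

Each token could then be turned into an element with React.createElement(token.type, null, token.value), with the "text" tokens passed through as plain strings.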

Code

// Instead of creating hardcoded variables, we can make the code more extendable
// by storing all the possible tags we'll work with in a Map. Thus, creating
// more tags will not require additional logic in our code.
const tags = new Map(Object.entries({
  "*": "strong", // bold
  "!": "button", // action
  "_": "em", // emphasis
  "\uFFFF": "pre", // Just use a very unlikely to happen unicode character,
                   // We'll replace our multi-length symbols with that one.
}));
// Might be useful if we need to discover the symbol of a tag
const tagSymbols = new Map();
tags.forEach((v, k) => { tagSymbols.set(v, k ); })

const rawMarkdown = `
  This must be *bold*,

  This also must be *bo_ld*,

  this _entire block must be
  emphasized even if it's comprised of multiple lines_,

  This is an !action! it should be a button,

  \`\`\`
beep, boop, this is code
  \`\`\`

  This is an asterisk\\*
`;

class App extends React.Component {
  parseMarkdown(source) {
    let currentTag = "";
    let currentContent = "";

    const parsedMarkdown = [];

    // We create this variable to track possible escape characters, eg. "\"
    let before = "";

    const pushContent = (
      content,
      tagValue,
      props,
    ) => {
      let children = undefined;

      // There's the need to parse for empty lines
      if (content.indexOf("\n\n") >= 0) {
        let before = "";
        const contentJSX = [];

        let chunk = "";
        for (let i = 0; i < content.length; i++) {
          if (i !== 0) before = content[i - 1];

          chunk += content[i];

          if (before === "\n" && content[i] === "\n") {
            contentJSX.push(chunk);
            contentJSX.push(<br />);
            chunk = "";
          }

          if (chunk !== "" && i === content.length - 1) {
            contentJSX.push(chunk);
          }
        }

        children = contentJSX;
      } else {
        children = [content];
      }
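      // Spread the children and give each top-level element a key so React
      // does not warn about arrays of unkeyed elements.
      parsedMarkdown.push(
        React.createElement(tagValue, { key: parsedMarkdown.length, ...props }, ...children)
      );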
    };

    for (let i = 0; i < source.length; i++) {
      const chunk = source[i];
      if (i !== 0) {
        before = source[i - 1];
      }

      // Does the current chunk need to be treated as an escaped char?
      const escaped = before === "\\";

      // Detect if we need to start/finish parsing our tags

      // We are not parsing anything, however, that could change at current
      // chunk
      if (currentTag === "" && escaped === false) {
        // If our tags array has the chunk, this means a markdown tag has
        // just been found. We'll change our current state to reflect this.
        if (tags.has(chunk)) {
          currentTag = tags.get(chunk);

          // We have simple content to push
          if (currentContent !== "") {
            pushContent(currentContent, "span");
          }

          currentContent = "";
        }
      } else if (currentTag !== "" && escaped === false) {
        // We'll look if we can finish parsing our tag
        if (tags.has(chunk)) {
          const symbolValue = tags.get(chunk);

          // Just because the current chunk is a symbol it doesn't mean we
          // can already finish our currentTag.
          //
          // We'll need to see if the symbol's value corresponds to the
          // value of our currentTag. In case it does, we'll finish parsing it.
          if (symbolValue === currentTag) {
            pushContent(
              currentContent,
              currentTag,
              undefined, // you could pass props here
            );

            currentTag = "";
            currentContent = "";
          }
        }
      }

      // Increment our currentContent
      //
      // Ideally, we don't want our rendered markdown to contain any '\'
      // or undesired '*' or '_' or '!'.
      //
      // Users can still escape '*', '_', '!' by prefixing them with '\'
      if (tags.has(chunk) === false || escaped) {
        if (chunk !== "\\" || escaped) {
          currentContent += chunk;
        }
      }

      // In case an erroneous, i.e. unfinished, tag is present and we've
      // reached the end of our source (rawMarkdown), we want to make sure
      // all our currentContent is pushed as a simple string
      if (currentContent !== "" && i === source.length - 1) {
        pushContent(
          currentContent,
          "span",
          undefined,
        );
      }
    }

    return parsedMarkdown;
  }

  render() {
    return (
      <div className="App">
        <div>{this.parseMarkdown(this.props.rawMarkdown)}</div>
      </div>
    );
  }
}

ReactDOM.render(<App rawMarkdown={rawMarkdown.replace(/```/g, "\uFFFF")} />, document.getElementById('app'));

Link to the code (TypeScript) https://codepen.io/ludanin/pen/GRgNWPv

Link to the code (vanilla/babel) https://codepen.io/ludanin/pen/eYmBvXw

I feel like this solution is on the right track, but it seems to have issues with putting other markdown characters inside of other ones. For example, try replacing `This must be *bold*` with `This must be *bo_ld*`. It causes the resulting HTML to be malformed — Ryan Peschel, Dec 16 '19 at 19:56
Lack of proper testing produced this =p, my bad. I'm already fixing it and going to post the result here, seems like a simple problem to fix. — LuDanin, Dec 16 '19 at 19:58
Yeah, thanks. I really do like this solution though. It seems very robust and clean. I think it can be refactored a bit though for even more elegance. I might try messing around with it a bit. — Ryan Peschel, Dec 16 '19 at 19:58
Done, by the way, I've tweaked the code to support a much more flexible way of defining markdown tags and their respective JSX values. — LuDanin, Dec 16 '19 at 21:53
Hey thanks this looks great. Just one last thing and I think it'll be perfect. In my original post I have a function for code snippets too (that involve triple backticks). Would it be possible to have support for that as well? So that the tags could optionally be multiple characters? Another reply added support by replacing instances of ``` with a rarely used character. That would be an easy way to do it, but not sure if that's ideal. — Ryan Peschel, Dec 16 '19 at 22:09
Done, keep in mind that I've only updated the code to support multi-length tags by replacing them with a rarely used unicode character. If you could provide me examples for why it wouldn't be ideal I'd love to work in any workaround to the issue. — LuDanin, Dec 16 '19 at 22:50

Jatin Parmar · Answer 4 · 2019-12-16T12:36:00.867

You can do it like this:

// inside your component

  mapData(myMarkdown) {
    // Split on spaces but keep them (capture group) so word spacing survives.
    return myMarkdown.split(/( )/).map((w, i) => {
      if (w.length < 3) return w;
      const inner = w.slice(1, -1);

      if (w.startsWith('*') && w.endsWith('*')) {
        return <strong key={i}>{inner}</strong>;
      }
      if (w.startsWith('_') && w.endsWith('_')) {
        return <em key={i}>{inner}</em>;
      }
      if (w.startsWith('!') && w.endsWith('!')) {
        return <YourComponent key={i} onClick={this.action}>{inner}</YourComponent>;
      }
      return w;
    });
  }

  render() {
    const content = this.mapData('hello *asdf* *how* _are_ you !doing! today');
    // render() must return an element or an array, not an object literal like {content}.
    return <div>{content}</div>;
  }

score 0 · Answer 5 · edited Jun 20 '20 at 09:12

A working solution using only JavaScript and React, without dangerouslySetInnerHTML.

Approach

Character by character search for the markdown elements. As soon as one is encountered, search for the ending tag for the same and then convert it into html.

Tags supported in the snippet

bold
italics
em
pre


JsFiddle: https://jsfiddle.net/sunil12738/wg7emcz1/58/

Code:

const preTag = "đ"
const map = {
      "*": "b",
      "!": "i",
      "_": "em",
      [preTag]: "pre"
    }

class App extends React.Component {
    constructor(){
      super()
      this.getData = this.getData.bind(this)
    }

    state = {
      data: []
    }
    getData() {
      let str = document.getElementById("ta1").value
      //If any tag contains more than one char, replace it with some char which is less frequently used and use it
      str = str.replace(/```/gi, preTag)
      const tempArr = []
      const tagsArr = Object.keys(map)
      let strIndexOf = 0;
      for (let i = 0; i < str.length; ++i) {
        strIndexOf = tagsArr.indexOf(str[i])
        if (strIndexOf >= 0 && str[i-1] !== "\\") {
          tempArr.push(str.substring(0, i).split("\\").join("").split(preTag).join(""))
          str = str.substr(i + 1);
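          // Reset to -1: the outer loop's ++i then restarts at index 0 of the
          // trimmed string instead of skipping its first character.
          i = -1;
          for (let j = 0; j < str.length; ++j) {
            strIndexOf = tagsArr.indexOf(str[j])
            if (strIndexOf >= 0 && str[j-1] !== "\\") {
              const Tag = map[str[j]];
              tempArr.push(<Tag key={tempArr.length}>{str.substring(0, j).split("\\").join("")}</Tag>)
              str = str.substr(j + 1);
              i = -1;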
              break
             }
          }
        }
      }
      tempArr.push(str.split("\\").join(""))
      this.setState({
        data: tempArr,
      })
    }
    render() {
      return (
        <div>
          <textarea rows = "10"
            cols = "40"
           id = "ta1"
          /><br/>
          <button onClick={this.getData}>Render it</button><br/> 
          {this.state.data.map(x => x)} 
        </div>
      )
    }
  }

ReactDOM.render(
  <App/>,
  document.getElementById('root')
);

<body>
  <script src="https://cdnjs.cloudflare.com/ajax/libs/react/16.2.0/umd/react.production.min.js"></script>
  <script src="https://cdnjs.cloudflare.com/ajax/libs/react-dom/16.2.0/umd/react-dom.production.min.js"></script>
  <div id="root"></div>
</body>

Detailed explanation (with example):

Suppose the string is How are *you* doing? Keep a mapping of symbols to tags:

map = {
 "*": "b"
}

Loop until you find the first *; the text before it is a normal string.
Push that into the array. The array becomes ["How are "]. Then start the inner loop and run it until you find the next *.
The text between * and * needs to be bold, so we convert it into an HTML element and push it directly into the array, where Tag = b from the map. If you write <Tag>text</Tag>, React internally converts it into <b>text</b>. Now the array is ["How are ", <b>you</b>]. Break from the inner loop.
Now we resume the outer loop from there; no more tags are found, so push the remainder into the array. The array becomes ["How are ", <b>you</b>, " doing?"].
Render on the UI: How are you doing? (with "you" in bold)
Note: <b>you</b> is an HTML element and not text.

Note: Nesting is also possible. We need to call the above logic in recursion
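A rough sketch of what that recursion could look like, kept separate from the snippet above: when a tagged span is found, its inner text is parsed again by the same function. The names parseNested and tagMap and the { tag, children } node shape are assumptions for illustration, escaping is left out for brevity, and a delimiter cannot nest inside itself because the first matching closer always wins.

```javascript
// Symbol-to-tag table in the spirit of the answer's `map`.
const tagMap = new Map([
  ["*", "b"],
  ["_", "em"],
  ["!", "i"],
]);

// Hypothetical recursive parser: returns an array of strings and
// { tag, children } nodes, where children has the same shape.
function parseNested(str) {
  const out = [];
  let text = "";
  let i = 0;
  while (i < str.length) {
    const ch = str[i];
    // Look for the matching closer of the same symbol.
    const close = tagMap.has(ch) ? str.indexOf(ch, i + 1) : -1;
    if (close > i + 1) {
      if (text !== "") {
        out.push(text);
        text = "";
      }
      // Recurse into the text between the opening and closing symbol.
      out.push({ tag: tagMap.get(ch), children: parseNested(str.slice(i + 1, close)) });
      i = close + 1;
    } else {
      // Plain character, an unclosed symbol, or an empty pair: keep it as text.
      text += ch;
      i += 1;
    }
  }
  if (text !== "") out.push(text);
  return out;
}

console.log(JSON.stringify(parseNested("a _b *c* d_ e")));
```

To render the result, each node could be mapped to React.createElement(node.tag, null, ...children), recursing through children in the same way.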

To Add New tags support

If they are one character like * or !, add them in map object with key as character and value as corresponding tag
If they are more than one character, such as ```, create a one-to-one mapping to some rarely used character and insert that instead. (Reason: the current approach is a character-by-character search, so tags longer than one character break it. That could also be handled by improving the logic.)

Does it support nesting? No
Does it support all use cases mentioned by OP? Yes

Hope it helps.

Hi, looking over this now. Is this possible to use with triple backtick support as well? So \```asdf\``` would work as well for code blocks? — Ryan Peschel, Dec 16 '19 at 16:39
It will, but some modifications might be needed. Currently, only single-character matching is there, for * or !. That needs to be modified a little bit. Code blocks basically means ```asdf``` will be rendered as `<pre>asdf</pre>` with a dark background, right? Let me know and I will see. You can even try it now. A simple approach: in the above solution, replace the ``` in the text with a special character such as ^ or ~ and map it to the pre tag. Then it will work fine. The other approach needs some more work — Sunil Chaudhary, Dec 16 '19 at 16:50
@RyanPeschel Hi! Have added the `pre` tag support as well. Let me know if it works — Sunil Chaudhary, Dec 16 '19 at 18:36
Interesting solution (using the rare character). One issue I still see though is the lack of support for escaping (such that \\*asdf* is not bolded), which I included support for in the code in my original post (also mentioned it in my linked elaboration at the end of the post). Would that be very hard to add? — Ryan Peschel, Dec 16 '19 at 19:46
Yes, it is possible and not that hard. I have added it. Though, now I realize that my solution is becoming complicated (and many work arounds). I will try to refactor/rewrite if I have time. — Sunil Chaudhary, Dec 16 '19 at 21:15