
I have a description stored as HTML that I want to render in my component. Before it can be rendered, I need to replace parts of the description with JSX components. Unlike other questions I've seen on this topic, I need to replace more than one kind of thing in the description, which requires multiple regex replacements. Take the following description as an example:

<div style="white-space: pre-line;">
    This is my video.

    0:00 Intro
    4:12 Point 1
    9:12 Point 2
    14:12 Closing Point

    Check out my website at https://example.com

    #tag #tag2 #tag3
</div>

In this description, all links need to be wrapped in a link element, timestamps need to be converted into links that change the video time, and hashtags need to be converted into links that take the user to the search page.

This is how I formatted the description when I was using jQuery:

<script src="https://cdnjs.cloudflare.com/ajax/libs/jquery/3.3.1/jquery.min.js"></script>
<div id="description" style="white-space: pre-line;"></div>
<script>
    var description = `
This is my video.

0:00 Intro
4:12 Point 1
9:12 Point 2 
14:12 Closing Point

Check out my website at https://example.com

#tag #tag2 #tag3
    `;
    $('#description').html(createLinks(createHashtagLinks(formatTimestamps(description))));
    
    function createLinks(text) {
        return text.replace(/(https?:\/\/[^\s]+)/g, '<a href="$1" target="_blank" rel="noopener noreferrer">$1</a>')
    }
    function createHashtagLinks(text) {
        return text.replace(/#\w*[a-zA-Z]\w*/g, '<a href="/videos?search=$&">$&</a>');
    }
    function formatTimestamps(text) {
        return text.replace(/^[0-5]?\d(?::(?:[0-5]?\d)){1,2}/gm, function (match) {
            var timeArray = match.split(':').reverse();
            var seconds = 0;
            var i = 1;
            for (let unit of timeArray) {
                seconds += unit * i;
                i *= 60;
            }
            return `<a data-seconds="${seconds}" href="#">${match}</a>`
        });
    }
</script>
Since I am using React Router, instead of replacing matches in the description HTML with <a> elements, I need to replace them with <Link> components. How can I do this in React?
No stupid questions

1 Answer

Consider the following approach:

  1. Escape any stray &<>'" characters as XML character references, so that markup already present in the text is treated as plain text.
  2. Wrap the relevant spans with XML tags.
  3. Parse to XML DOM.
  4. Replace XML nodes with React components and render them.

For example:

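const customParse = (rawStr) => {
  // Step 1: escape markup characters as numeric character references.
  const str = rawStr.replace(/[&<>'"]/g, (m) => `&#${m.codePointAt(0)};`);

  // Step 2: wrap URLs and hashtags in placeholder XML elements.
  const wrapped = str
    .replace(/https?:\/\/\S+/g, (m) => `<Link>${m}</Link>`)
    .replace(/#\w*[a-zA-Z]\w*/g, (m) => `<Tag>${m}</Tag>`);

  // Step 3: parse as XML; the <root> wrapper gives the document a single root element.
  const dom = new DOMParser().parseFromString(
    `<root>${wrapped}</root>`,
    "application/xml"
  );

  return [...dom.documentElement.childNodes];
};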
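
To see what steps 1 and 2 hand to the parser, the string half of `customParse` can be run on its own, since it does not need a browser. This is only a sketch that reuses the regexes above; the helper name `escapeAndWrap` is made up for illustration.

```javascript
// Steps 1 and 2 of customParse in isolation (escapeAndWrap is a hypothetical name).
const escapeAndWrap = (rawStr) =>
  rawStr
    // Step 1: numeric character references, so user-supplied markup stays text.
    .replace(/[&<>'"]/g, (m) => `&#${m.codePointAt(0)};`)
    // Step 2: wrap URLs and hashtags in placeholder XML elements.
    .replace(/https?:\/\/\S+/g, (m) => `<Link>${m}</Link>`)
    .replace(/#\w*[a-zA-Z]\w*/g, (m) => `<Tag>${m}</Tag>`);

console.log(escapeAndWrap("<b>hi</b> https://example.com #tag"));
// → &#60;b&#62;hi&#60;/b&#62; <Link>https://example.com</Link> <Tag>#tag</Tag>
```

The `#` inside references such as `&#60;` is not picked up by the hashtag regex, because that regex needs at least one letter among the word characters directly after the `#`, and the digits of a reference are followed by `;`.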

import React from "react";

// Link and Tag are custom components (not shown) that receive the matched text as a prop.
// Step 4: map each parsed node to plain text or to the matching component.
const RenderedOutput = ({ text }) => (
  <pre>
    {customParse(text).map((node, idx) => {
      if (node.nodeType === Node.TEXT_NODE) {
        return <React.Fragment key={idx}>{node.data}</React.Fragment>;
      } else {
        switch (node.nodeName) {
          case "Link":
            return <Link key={idx} url={node.textContent} />;
          case "Tag":
            return <Tag key={idx} tag={node.textContent} />;
          default:
            throw new Error("not implemented");
        }
      }
    })}
  </pre>
);

CodeSandbox demo

You could implement additional custom logic as needed if you also want to create and read from attribute lists and so on.
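
As one illustration of the attribute idea, the timestamps from the question could be wrapped in an element that carries the computed offset. This is a sketch under assumptions: the `Timestamp` element name, its `seconds` attribute, and the helper names `toSeconds` and `wrapTimestamps` are invented here, and the regex is copied from the question's `formatTimestamps`.

```javascript
// Convert "m:ss" or "h:mm:ss" into a number of seconds.
const toSeconds = (stamp) =>
  stamp.split(":").reduce((total, unit) => total * 60 + Number(unit), 0);

// Wrap line-leading timestamps in a placeholder element with a seconds attribute.
// Run this on text that has already been escaped (after step 1), otherwise
// the attribute quotes would be escaped as well.
const wrapTimestamps = (escapedStr) =>
  escapedStr.replace(
    /^[0-5]?\d(?::(?:[0-5]?\d)){1,2}/gm,
    (m) => `<Timestamp seconds="${toSeconds(m)}">${m}</Timestamp>`
  );

console.log(wrapTimestamps("0:00 Intro\n4:12 Point 1"));
// → <Timestamp seconds="0">0:00</Timestamp> Intro
//   <Timestamp seconds="252">4:12</Timestamp> Point 1
```

In the `switch`, a `case "Timestamp"` branch could then read the value back with `node.getAttribute("seconds")` and pass it to whichever component seeks the video.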

Lionel Rowe
  • Is this approach a safe way to handle this? Theoretically if somebody had valid XML in their description would it not also get parsed? – No stupid questions Sep 11 '20 at 02:18
  • @Nostupidquestions Preventing that is what step 1 is for. In the demo, you can see that markup in the input is rendered as plain text. – Lionel Rowe Sep 11 '20 at 07:08
  • Sorry to ask something after I accepted the answer, but I have realized that it is possible for some descriptions to have HTML elements stored in plaintext inside them. These need to be converted into components before parsing the description, otherwise I end up with something like https://imgur.com/a/pvkwoQ3 – No stupid questions Sep 28 '20 at 07:40
  • It's doable with the `DOMParser`-based approach, but exactly how best to do it depends on a couple of things. Firstly, is strict XHTML parsing OK, or do you want loose HTML parsing instead? (E.g. do you want to allow `<br>` or only `<br />`? And do you want malformed syntax to use a best-guess approach, or to completely fail to parse?) And secondly, do you want to allow content editors powerful yet potentially dangerous tools, such as running their own scripts, or do you want to limit their capabilities? If content editing is generally available, you'll want the latter.
    – Lionel Rowe Sep 28 '20 at 10:49
  • The descriptions were downloaded by youtube-dl, so the contents of each vary a lot depending on the site downloaded from (some sites' descriptions include HTML elements, some are just plaintext). Because what is returned is not very standardized, a loose approach would probably be best. There is no need to edit the description. – No stupid questions Sep 28 '20 at 11:01
  • Here's a modified version of the original code sandbox. The custom rendering logic will probably need some tweaking depending on your exact requirements. https://codesandbox.io/s/custom-react-parsing-with-sanitized-html-xjwke – Lionel Rowe Sep 28 '20 at 12:12
  • I am encountering one issue with the CodeSandbox. If a void element tag such as `<br>` is added to `allow.js` and found in the description, React will throw the error `Error: br is a void element tag and must neither have children nor use dangerouslySetInnerHTML. allow.js line: 53`. Nevertheless, thank you for this incredible answer.
    – No stupid questions Sep 28 '20 at 13:28
  • final return statement in mapDomToReact.js (renamed sanitize.js): `return node.textContent ? ( {[...node.childNodes].map((n, i) => mapDomToReact(n, i))} ) : ( )` – Lionel Rowe Sep 28 '20 at 13:52