Converting markdown to html with javascript in rich text editor

Question

I am developing a rich text editor for my website. If the user writes something in HTML syntax, I would like to convert it to HTML, just like the text editor on Stack Overflow does.

I would like it to:

split the text on each tag, keeping each tag that was written as its own array element
unescape &lt; and &gt; back to < and >, except when they are inside PRE and CODE tags

So far I have tried splitting the HTML with a regex I found here, but when I run the code below I get:

['Lorem ipsum dolor', 'sit amet', 'consectetur', 'adipiscing', 'elit.', 'Sed erat odio, fringilla in lorem eu.'], which is definitely not what I want. I would like something like:

['Lorem ipsum dolor', '<h1>', 'sit amet', '</h1>', '<h6>', 'consectetur', '<b>', 'adipiscing', '</b>', '</h6>', 'elit.', '<br>', 'Sed erat odio, fringilla in lorem eu.']

Then I would just do the following:

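function splitHTML(str) {
    // Note: this regex has no capturing group, so split() drops the tags themselves
    return str.split(/<(?:"[^"]*"['"]*|'[^']*'['"]*|[^'">])+>/g);
}

function isHTML(str) {
    // RegExp has no match() method; use test(). No g flag, so lastIndex is not carried over between calls
    return /<(?:"[^"]*"['"]*|'[^']*'['"]*|[^'">])+>/.test(str);
}

const arr = splitHTML("Lorem ipsum dolor <h1>sit amet</h1>, <h6>consectetur <b>adipiscing</b> </h6>elit. <br>Sed erat odio, fringilla in lorem eu.");

// Reassigning the loop variable does not change the array, so build a new one with map()
const converted = arr.map(element => {
    if (isHTML(element)) {
        element = element.replaceAll('&lt;', '<');
        element = element.replaceAll('&gt;', '>');
    }
    return element;
});

// join() defaults to a "," separator; join('') concatenates the parts directly
converted.join('');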

My question is:

How can I split a string so that the separators (the tags) are included in the result?

I would also like to know how to check whether a piece of text is inside pre or code tags.

So what do you want? Please include your expected output in the question. — Wais Kamal, Dec 18 '20 at 19:38
He wants to have both the text and the HTML in the resulting array. Not just the text as he showed. — TimonNetherlands, Dec 18 '20 at 19:41
Tricky because you can nest html elements, that doesn't translate well to your flat array model. — James, Dec 18 '20 at 19:52
You should check [this](https://stackoverflow.com/a/1732454/485337) post out... — Adam Arold, Dec 18 '20 at 19:54
@James um, you are right, but maybe it could return something like ['Lorem ipsum dolor', '<h1>', 'sit amet', '</h1>', '<h6>', 'consecte', '<b>', 'tur', '</b>', 'adipiscing', '</h6>', 'elit.', '<br>', 'Sed erat odio, fringilla in lorem eu.'], so that when it's done I can check each element with the same regexp to see whether it is HTML and then replace all. — PoliPau, Dec 18 '20 at 20:41
What if HTML attributes are used? What if those attribute values look like tags? What if HTML comments are used? What if CDATA is used? What if the content uses html entities? What if I tell you that parsing HTML with regex is never going to work right for all possible input? What Adam Arold said. — trincot, Dec 18 '20 at 21:28
@trincot damn it you are right, most of the tags will actually have attributes. I don't have the faintest idea of what CDATA is but I get it, it would never work. — PoliPau, Dec 18 '20 at 21:44
Use a DOM parser, like [`DOMParser`](https://developer.mozilla.org/en-US/docs/Web/API/DOMParser) — trincot, Dec 18 '20 at 21:46
Does it have to be an array? Or can it also be another object that allows you to nest the HTML elements? — Danny, Dec 18 '20 at 21:55
@SamuelEbert It does not have to be an array, if it is iterable, it is fine by me. — PoliPau, Dec 19 '20 at 13:58
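Since any iterable is acceptable, the nesting problem James raises can be handled by building a tree instead of a flat array. Below is a minimal sketch under loud assumptions: the tag regex is deliberately naive (no `>` inside attribute values, no comments or CDATA, exactly the cases trincot warns about), only a subset of void elements is recognised, and `toTree` is a hypothetical name. For real input, a DOM parser such as `DOMParser` is the safer choice.

```javascript
// Naive tag pattern (assumption): no ">" inside attribute values, no comments/CDATA.
const TAG_SPLIT = /(<\/?[a-zA-Z][^>]*>)/;
const TAG_PARTS = /^<(\/?)([a-zA-Z][a-zA-Z0-9]*)[^>]*>$/;

// A subset of elements that never have a closing tag, so they are not pushed on the stack.
const VOID_ELEMENTS = new Set(["br", "hr", "img", "input", "meta", "link"]);

// Hypothetical helper: builds a nested { tag, children } tree from an HTML-like string.
function toTree(html) {
  const root = { tag: null, children: [] };
  const stack = [root];
  for (const part of html.split(TAG_SPLIT)) {
    if (part === "") continue; // split() yields "" between adjacent tags
    const current = stack[stack.length - 1];
    const m = part.match(TAG_PARTS);
    if (!m) {
      current.children.push(part); // plain text belongs to the innermost open element
      continue;
    }
    const [, slash, rawName] = m;
    const name = rawName.toLowerCase();
    if (slash) {
      // Closing tag: pop back to the matching open element; ignore stray closing tags.
      const idx = stack.map(node => node.tag).lastIndexOf(name);
      if (idx > 0) stack.length = idx;
    } else {
      const node = { tag: name, children: [] };
      current.children.push(node);
      if (!VOID_ELEMENTS.has(name)) stack.push(node);
    }
  }
  return root.children;
}

const input = "Lorem ipsum dolor <h1>sit amet</h1>, <h6>consectetur <b>adipiscing</b> </h6>elit. <br>Sed erat odio, fringilla in lorem eu.";
console.log(JSON.stringify(toTree(input), null, 2));
```

Each text node stays a plain string, so walking the tree and checking whether an ancestor is `pre` or `code` becomes a simple recursive traversal.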

Danny · Answer 1 · 2021-02-12T21:59:38.710

You do not have to iterate through an object to display the HTML. You can do something as simple as:

// Create a new iframe HTML element
const preview = document.createElement("iframe");

// Set a unique id so it is easier to reference in code later on (you can also use the id in CSS)
preview.id = "preview";

// Set the iframe's content according to your HTML string
preview.srcdoc = yourHtmlString;

// Add the iframe to the page's body (or whatever element you want)
document.body.append(preview);

If for whatever reason you need to iterate through the HTML elements, you can add the following code:

function forEachChild(element) {
  for (const child of element.children) {
    // Recurse first, so a child's descendants are visited before the child itself is handled below
    forEachChild(child);

    // Whatever you want to do for each element, write it here

    // Please note that replacing "&lt;" and "&gt;" is not necessary with the above code
    // snippet. However, if there is some other tag-specific code, here is how to add it:
    switch (child.tagName.toLowerCase()) {
      case "pre":
      case "code":
        // If there is something specific you want to do with a pre/code tag, add it here
        break;
    }
  }
}

// srcdoc is loaded asynchronously, so wait for the iframe's load event before walking it
preview.addEventListener("load", () => {
  forEachChild(preview.contentDocument.body);
});

score 0 · Answer 2 · answered Dec 28 '20 at 02:54

Best to use an HTML parser, such as https://www.npmjs.com/package/node-html-parser. It is possible to use regex, but it is not that robust.

I do not understand why you want to unescape &lt; and &gt; only outside <code> and <pre> tags, but if you want to go the regex route, you can use this code:

const input = "Lorem ipsum dolor <h1>sit amet</h1>, <h6>consectetur <b>adipiscing</b> </h6>elit. <br>Sed erat odio, &lt;fringilla&gt; in lorem eu. <pre>pre text with &lt;tag&gt</pre>. Back to &lt;normal&gt; text";
const tagRegex = /(<(?:"[^"]*"['"]*|'[^']*'['"]*|[^'">])+>)/;
let inPreOrCode = false;
let result = input.split(tagRegex).map(str => {
  if(tagRegex.test(str)) {
    // is tag
    if(str.match(/^<(code|pre)\b/i)) {
      inPreOrCode = true;
    } else if(str.match(/^<\/(code|pre)\b/i)) {
      inPreOrCode = false;
    }
  } else if(!inPreOrCode) {
    str = str.replace(/&lt;/g, '<').replace(/&gt;/g, '>')
  }
  return str;
}).join('');
console.log('Input:  ' + input);
console.log('Result: ' + result);

Output:

Input:  Lorem ipsum dolor <h1>sit amet</h1>, <h6>consectetur <b>adipiscing</b> </h6>elit. <br>Sed erat odio, &lt;fringilla&gt; in lorem eu. <pre>pre text with &lt;tag&gt</pre>. Back to &lt;normal&gt; text
Result: Lorem ipsum dolor <h1>sit amet</h1>, <h6>consectetur <b>adipiscing</b> </h6>elit. <br>Sed erat odio, <fringilla> in lorem eu. <pre>pre text with &lt;tag&gt</pre>. Back to <normal> text

Explanation:

wrap the whole tagRegex in parentheses (a capturing group); split() then includes the matched tags in the resulting array
map over the array and set/clear the inPreOrCode flag when entering/leaving those tags
if the flag is not set, unescape &lt; and &gt;
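One limitation of the single boolean flag: in `<pre><code>x</code> &lt;y&gt;</pre>` the first `</code>` clears `inPreOrCode`, so `&lt;y&gt;` gets unescaped even though it is still inside `<pre>`. Counting open pre/code elements avoids that. This is a minimal sketch reusing the answer's `tagRegex`; `unescapeOutsidePreCode` is a hypothetical name, and the same regex caveats apply.

```javascript
// Same tag regex as in the answer, wrapped in a capturing group so split() keeps the tags.
const tagRegex = /(<(?:"[^"]*"['"]*|'[^']*'['"]*|[^'">])+>)/;

// Hypothetical helper: unescape &lt; / &gt; only in text that is not inside any <pre> or <code>.
function unescapeOutsidePreCode(input) {
  let depth = 0; // number of <pre>/<code> elements currently open
  return input
    .split(tagRegex)
    .map(str => {
      if (tagRegex.test(str)) {
        if (/^<(code|pre)\b/i.test(str)) depth++;
        else if (/^<\/(code|pre)\b/i.test(str)) depth = Math.max(0, depth - 1);
        return str; // tags are passed through unchanged
      }
      return depth === 0 ? str.replace(/&lt;/g, "<").replace(/&gt;/g, ">") : str;
    })
    .join("");
}

console.log(unescapeOutsidePreCode("<pre><code>&lt;a&gt;</code> &lt;b&gt;</pre> &lt;c&gt;"));
// → "<pre><code>&lt;a&gt;</code> &lt;b&gt;</pre> <c>"
```

Clamping the counter at zero keeps a stray closing tag from making later text count as "inside" forever.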

score -1 · Answer 3 · answered Dec 18 '20 at 22:09


This post can help you with capturing delimiters: https://stackoverflow.com/a/1732454/485337

For checking tag enclosure, you are in the territory of https://stackoverflow.com/a/1732454/485337, as noted in comments.
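Capturing delimiters comes down to one rule of `String.prototype.split`: if the separator regex contains capturing parentheses, the captured text is included in the result array. A small illustration with a deliberately simple tag pattern (`/<[^>]+>/`, an assumption that breaks on attribute values containing `>`):

```javascript
const text = "Lorem <b>ipsum</b> dolor<br>sit";

// Without a capturing group, the tags are dropped:
const withoutTags = text.split(/<[^>]+>/);
// ["Lorem ", "ipsum", " dolor", "sit"]

// With a capturing group, each matched tag is kept as its own array element:
const withTags = text.split(/(<[^>]+>)/);
// ["Lorem ", "<b>", "ipsum", "</b>", " dolor", "<br>", "sit"]

// If the string starts or ends with a tag, split() also yields "" at that end,
// so filter out empty strings when they are not wanted.
const tokens = "<h1>title</h1>".split(/(<[^>]+>)/).filter(s => s !== "");
// ["<h1>", "title", "</h1>"]

console.log(withoutTags, withTags, tokens);
```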

answered by chaos


This answer seems to suggest the question should be closed as duplicate with those two references. So why then answer? – trincot Dec 18 '20 at 22:12
