How does it work?
It reads the string chunk by chunk (one character at a time), which might
not be the best solution for really long strings.
Whenever the parser detects that a critical chunk is being read, i.e. '*' or
any other markdown tag, it starts collecting the chunks of that element until
it finds the matching closing tag.
It works on multi-line strings; see the code for an example.
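To make the idea easier to follow without React in the way, here is a minimal sketch of the same chunk-by-chunk scan. The names tokenize and demoTags are made up for this illustration and are not part of the code below; escaping is left out on purpose, and the sketch returns plain { tag, content } records instead of React elements.

```javascript
// Hypothetical, React-free illustration of the scan described above.
// demoTags and tokenize are assumed names, not part of the answer's code.
const demoTags = new Map([
  ["*", "strong"],
  ["_", "em"],
]);

function tokenize(source) {
  const tokens = [];
  let currentTag = "";     // "" means we are not inside any tag
  let currentContent = "";

  for (const chunk of source) {
    const tag = demoTags.get(chunk);

    // Ordinary character: just accumulate it.
    if (tag === undefined) {
      currentContent += chunk;
      continue;
    }

    if (currentTag === "") {
      // Opening symbol: flush the plain text collected so far.
      if (currentContent !== "") {
        tokens.push({ tag: "span", content: currentContent });
      }
      currentTag = tag;
      currentContent = "";
    } else if (tag === currentTag) {
      // Matching closing symbol: emit the tagged content.
      tokens.push({ tag: currentTag, content: currentContent });
      currentTag = "";
      currentContent = "";
    }
    // A different symbol inside an open tag is silently dropped,
    // which mirrors how the "bo_ld" example is handled.
  }

  // Unfinished tag or trailing text: emit whatever is left as plain text.
  if (currentContent !== "") {
    tokens.push({ tag: "span", content: currentContent });
  }
  return tokens;
}

console.log(tokenize("This must be *bold*, this _spans\ntwo lines_."));
```

For that input the sketch yields five records: a span, a strong ("bold"), a span, an em (containing the line break) and a final span for the trailing period.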
Caveats
You haven't specified (or I may have misunderstood your needs) whether
tags that are both bold and italic need to be parsed. My current
solution might not work in that case.
If you do need to handle that case, just comment here and I'll tweak
the code.
First update: tweaks how markdown tags are treated
Tags are no longer hardcoded; instead they are stored in a map that you can
easily extend to fit your needs.
Fixed the bugs you've mentioned in the comments, thanks for pointing out these issues =p
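As a quick illustration of what "easily extend" means here, adding a tag is a single set call on the map. The "~" symbol and the "del" element are hypothetical choices for this sketch, not something the answer's code ships with.

```javascript
// Same shape as the tags map used in the answer's code.
const tags = new Map(Object.entries({
  "*": "strong", // bold
  "!": "button", // action
  "_": "em",     // emphasis
}));

// Hypothetical extension: treat ~text~ as deleted text.
tags.set("~", "del");

// The reverse lookup (element name -> symbol) is built the same way as before,
// so it picks up the new entry without any extra logic.
const tagSymbols = new Map();
tags.forEach((element, symbol) => { tagSymbols.set(element, symbol); });

console.log(tags.get("~"));
console.log(tagSymbols.get("del"));
```

Since the parser only ever asks the map whether a chunk is a symbol, no parsing logic has to change for the new tag.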
Second update: multi-length markdown tags
Easiest way of achieving this: replacing multi-length chars with a rarely used unicode character
Though the method parseMarkdown does not yet support multi-length tags,
we can easily replace those multi-length tags with a simple string.replace
when sending our rawMarkdown prop.
To see an example of this in practice, look at the ReactDOM.render call
located at the end of the code.
Even if your application does support multiple languages, there are unicode
code points that never represent a real character but that JavaScript still
handles like any other. E.g. "\uFFFF" is a unicode noncharacter (a code point
reserved so that it is never assigned to a character), yet JS is still able to
compare it ("\uFFFF" === "\uFFFF" evaluates to true).
It might seem hack-y at first but, depending on your use case, I don't see
any major issues with going this route.
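A minimal sketch of that replacement step on its own, assuming "\uFFFF" never shows up in real input. PLACEHOLDER, FENCE and collapseMultiLengthTags are names invented for this example; the pattern matches three consecutive backticks, the same thing the regex in the ReactDOM.render call matches.

```javascript
// Assumption: this code point never occurs in the markdown we receive.
const PLACEHOLDER = "\uFFFF";

// Three consecutive backticks, built with repeat() to keep this example tidy.
const FENCE = "`".repeat(3);

// Hypothetical helper: swap every three-backtick tag for the single-character
// placeholder so a one-character-at-a-time parser can treat it as a normal tag.
function collapseMultiLengthTags(markdown) {
  return markdown.replace(/`{3}/g, PLACEHOLDER);
}

const raw = `text\n${FENCE}\nbeep, boop, this is code\n${FENCE}\n`;
const collapsed = collapseMultiLengthTags(raw);

console.log(collapsed.includes(FENCE));               // false
console.log(collapsed.split(PLACEHOLDER).length - 1); // 2
```

The trade-off is that the placeholder has to be registered in the tags map (as "pre" in the code below), and any input that really contains it would be misread as a tag.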
Another way of achieving this
Well, we could easily track the last N chunks (where N corresponds to the
length of the longest multi-length tag).
There would be some tweaks to be made to the way the loop inside the
parseMarkdown method behaves, i.e. checking whether the current chunk is part
of a multi-length tag. If it is, use it as a tag; otherwise, in cases like ``k,
we'd need to mark it as notMultiLength or something similar and push that
chunk as content.
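Here is a rough sketch of how such a loop could look. Note that it takes a slightly different route to the same goal: instead of buffering the last N chunks, it looks ahead from the current position and tries the longest symbol first. Everything in it (multiTags, matchSymbolAt, tokenizeMulti) is a made-up name for this illustration, escaping is left out, and it returns plain records rather than React elements.

```javascript
// Hypothetical tag table that includes one multi-length symbol.
const FENCE = "`".repeat(3); // three consecutive backticks
const multiTags = new Map([
  [FENCE, "pre"],
  ["*", "strong"],
  ["_", "em"],
]);

// Longest symbols first, so a multi-length tag wins over any shorter one.
const symbolsByLength = [...multiTags.keys()].sort((a, b) => b.length - a.length);

// Returns the symbol that starts at `index`, or undefined if there is none.
function matchSymbolAt(source, index) {
  return symbolsByLength.find((symbol) => source.startsWith(symbol, index));
}

function tokenizeMulti(source) {
  const tokens = [];
  let currentTag = "";
  let currentContent = "";

  // Emit the collected content (if any) under the given tag.
  const flush = (tag) => {
    if (currentContent !== "") tokens.push({ tag, content: currentContent });
    currentContent = "";
  };

  let i = 0;
  while (i < source.length) {
    const symbol = matchSymbolAt(source, i);

    // Not a (complete) symbol, e.g. the two backticks in ``k: plain content.
    if (symbol === undefined) {
      currentContent += source[i];
      i += 1;
      continue;
    }

    const tag = multiTags.get(symbol);
    if (currentTag === "") {
      flush("span");       // text before the opening symbol
      currentTag = tag;
    } else if (tag === currentTag) {
      flush(currentTag);   // matching closing symbol
      currentTag = "";
    }
    i += symbol.length;    // skip the whole symbol, not just one chunk
  }

  flush("span");           // trailing text or an unfinished tag
  return tokens;
}

console.log(tokenizeMulti(FENCE + "code" + FENCE + " ``k *b*"));
```

With this approach the ``k case needs no special notMultiLength flag: two backticks never match the three-backtick symbol, so they fall through as ordinary content.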
Code
// Instead of creating hardcoded variables, we can make the code more extendable
// by storing all the possible tags we'll work with in a Map. Thus, creating
// more tags will not require additional logic in our code.
const tags = new Map(Object.entries({
  "*": "strong", // bold
  "!": "button", // action
  "_": "em", // emphasis
  "\uFFFF": "pre", // Just use a very unlikely to happen unicode character,
                   // We'll replace our multi-length symbols with that one.
}));
// Might be useful if we need to discover the symbol of a tag
const tagSymbols = new Map();
tags.forEach((v, k) => { tagSymbols.set(v, k); });
const rawMarkdown = `
This must be *bold*,
This also must be *bo_ld*,
this _entire block must be
emphasized even if it's comprised of multiple lines_,
This is an !action! it should be a button,
\`\`\`
beep, boop, this is code
\`\`\`
This is an asterisk\\*
`;
class App extends React.Component {
  parseMarkdown(source) {
    let currentTag = "";
    let currentContent = "";
    const parsedMarkdown = [];
    // We create this variable to track possible escape characters, e.g. "\"
    let before = "";
    const pushContent = (
      content,
      tagValue,
      props,
    ) => {
      let children = undefined;
      // Content that contains empty lines gets a <br /> after each of them
      if (content.indexOf("\n\n") >= 0) {
        let before = "";
        const contentJSX = [];
        let chunk = "";
        for (let i = 0; i < content.length; i++) {
          if (i !== 0) before = content[i - 1];
          chunk += content[i];
          if (before === "\n" && content[i] === "\n") {
            contentJSX.push(chunk);
            contentJSX.push(<br />);
            chunk = "";
          }
          if (chunk !== "" && i === content.length - 1) {
            contentJSX.push(chunk);
          }
        }
        children = contentJSX;
      } else {
        children = [content];
      }
      // Spreading the children and keying each element by its position
      // avoids React's "unique key" warnings for array children.
      parsedMarkdown.push(
        React.createElement(
          tagValue,
          { ...props, key: parsedMarkdown.length },
          ...children,
        ),
      );
    };
    for (let i = 0; i < source.length; i++) {
      const chunk = source[i];
      if (i !== 0) {
        before = source[i - 1];
      }
      // Does our current chunk need to be treated as an escaped char?
      const escaped = before === "\\";
      // Detect if we need to start/finish parsing our tags
      // We are not parsing anything, however, that could change at current
      // chunk
      if (currentTag === "" && escaped === false) {
        // If our tags map has the chunk, this means a markdown tag has
        // just been found. We'll change our current state to reflect this.
        if (tags.has(chunk)) {
          currentTag = tags.get(chunk);
          // We have simple content to push
          if (currentContent !== "") {
            pushContent(currentContent, "span");
          }
          currentContent = "";
        }
      } else if (currentTag !== "" && escaped === false) {
        // We'll look if we can finish parsing our tag
        if (tags.has(chunk)) {
          const symbolValue = tags.get(chunk);
          // Just because the current chunk is a symbol it doesn't mean we
          // can already finish our currentTag.
          //
          // We'll need to see if the symbol's value corresponds to the
          // value of our currentTag. In case it does, we'll finish parsing it.
          if (symbolValue === currentTag) {
            pushContent(
              currentContent,
              currentTag,
              undefined, // you could pass props here
            );
            currentTag = "";
            currentContent = "";
          }
        }
      }
      // Increment our currentContent
      //
      // Ideally, we don't want our rendered markdown to contain any '\'
      // or undesired '*' or '_' or '!'.
      //
      // Users can still escape '*', '_', '!' by prefixing them with '\'
      if (tags.has(chunk) === false || escaped) {
        if (chunk !== "\\" || escaped) {
          currentContent += chunk;
        }
      }
      // In case an erroneous (i.e. unfinished) tag is present and we've
      // reached the end of our source (rawMarkdown), we want to make sure
      // all our currentContent is pushed as a simple string
      if (currentContent !== "" && i === source.length - 1) {
        pushContent(
          currentContent,
          "span",
          undefined,
        );
      }
    }
    return parsedMarkdown;
  }
  render() {
    return (
      <div className="App">
        <div>{this.parseMarkdown(this.props.rawMarkdown)}</div>
      </div>
    );
  }
}
ReactDOM.render(<App rawMarkdown={rawMarkdown.replace(/```/g, "\uFFFF")} />, document.getElementById('app'));
Link to the code (TypeScript) https://codepen.io/ludanin/pen/GRgNWPv
Link to the code (vanilla/babel) https://codepen.io/ludanin/pen/eYmBvXw