
I'm looking to match bolded Markdown. Here are some examples:

qwer *asdf* zxcv matches *asdf*

qwer*asdf*zxcv matches *asdf*

qwer \*asdf* zxcv does not match

*qwer* asdf zxcv matches *qwer*

A negative lookbehind like (?<!\\)\*(.*)\* works.

However, Firefox does not support lookbehind, so I cannot use it.

Similarly, I can get very close with (^|[^\\])\*(.*)\*

The issue is that there are two capture groups and I need the index of the second one, but JavaScript only reports the index of the full match, which begins at the first capture group. I can work around it in this case by adding 1, but in other cases that hack will not work.

My reasoning for doing this is that I'm trying to replace a small subset of Markdown with React components. As an example, I'm trying to convert this string:

qwer *asdf* zxcv *123*

Into this array:

[ "qwer ", <strong>asdf</strong>, " zxcv ", <strong>123</strong> ]

Where the second and fourth elements are created via JSX and included as array elements.

Ryan Peschel

2 Answers


This should do the trick:

/(?:^|[^\\])(\*[^*]+[^\\]\*)/

The only capturing group there is the string surrounded by *'s.
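
If the position of the bold span is needed as well, one option raised in the comments below is to make the leading group capturing and add its length to the match index. Here is a minimal sketch of that idea; the helper name `findBold` and the shape of its return value are my own assumptions, not part of the original answer:

```javascript
// Capturing variant of the pattern above:
//   group 1 = the character before the opening asterisk ("" at start of string)
//   group 2 = the bold span, asterisks included
const boldWithPrefix = /(^|[^\\])(\*[^*]+[^\\]\*)/;

// Hypothetical helper: returns where the bold span really starts, or null.
function findBold(text) {
  const m = boldWithPrefix.exec(text);
  if (m === null) return null;
  // m.index points at the start of the full match, which may include
  // one leading character, so skip over group 1 to reach the asterisk.
  return { start: m.index + m[1].length, text: m[2] };
}

console.log(findBold("qwer *asdf* zxcv"));
console.log(findBold(String.raw`qwer \*asdf* zxcv`));
```

Because group 1 is either empty or exactly one character long, this avoids hard-coding the "add 1" adjustment.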

IceMetalPunk
  • Ah, didn't know non-capture groups existed like that. Thanks! – Ryan Peschel Dec 05 '19 at 19:30
  • Yep! If you start a group with `?:` it makes it non-capturing. – IceMetalPunk Dec 05 '19 at 19:32
  • Ah darn, there's still the original problem I had with my regex in the OP in that the `index` of the JavaScript match result refers to the full match, which includes the space. So it's still off-by-one. – Ryan Peschel Dec 05 '19 at 19:34
  • What are you using the index for? The `match` function will return the matching capture group for you. Perhaps there's a better way to do whatever you're doing with that index. *EDIT* If you really need the index, you can make that first group capturing by removing the `?:` and then take the index plus the length of the first capture. – IceMetalPunk Dec 05 '19 at 19:35
  • I'm trying to replace that item in the string with a React component. An alternative approach I suppose could be something like `(.*)(?:^|[^\\])(\*[^*]+[^\\]\*)(.*)`, and then I can do `[ group(1), group(2), group(3) ]`, but that one still has the problem where it's missing that one character so it won't reconstruct properly. – Ryan Peschel Dec 05 '19 at 19:37
  • `/(^|[^\\])(\*[^*]+[^\\]\*)/ ` -- group 1 (match index 0) will contain the character before the bold asterisk or an empty string if the string starts with the bold asterisk; group 2 (match index 1) will contain just the bold part. You can combine them as needed, or grab the length of the preceding character for index math. – IceMetalPunk Dec 05 '19 at 19:39

You will also need to take into account that a backslash before an asterisk may itself be escaped by another backslash, in which case the asterisk should still be treated as the start of bold markup. That holds unless the escaping backslash is in turn preceded by yet another backslash, and so on. In other words, an asterisk is escaped only when it is preceded by an odd number of backslashes.

So I would suggest this regular expression:

((?:^|[^\\])(?:\\.)*)\*((\\.|[^*])*)\*
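
To make the odd/even backslash rule concrete, here is a small sketch that runs the pattern against zero to three leading backslashes. It uses the variant with a non-capturing inner group, and the helper name `boldContent` is only an assumption for illustration:

```javascript
// Same pattern as above, with the inner alternation made non-capturing
// so that group 2 is the only group holding the bold text.
const bold = /((?:^|[^\\])(?:\\.)*)\*((?:\\.|[^*])*)\*/;

// Hypothetical helper: returns the text between the asterisks, or null.
function boldContent(text) {
  const m = bold.exec(text);
  return m ? m[2] : null;
}

// String.raw keeps each backslash literal in these samples.
const samples = [
  String.raw`*a*`,    // no backslash: bold
  String.raw`\*a*`,   // one backslash: the asterisk is escaped
  String.raw`\\*a*`,  // two backslashes: an escaped backslash, then bold
  String.raw`\\\*a*`, // three: an escaped backslash, then an escaped asterisk
];

for (const sample of samples) {
  console.log(sample, "->", boldContent(sample));
}
```

The `(?:\\.)*` part consumes backslashes in pairs (a backslash plus whatever it escapes), which is what leaves an unescaped asterisk only after an even count.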

If the purpose is to replace these with tags, like <strong> ... </strong>, then just use JavaScript's replace as follows:

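// String.raw keeps every backslash in the sample literal.
let s = String.raw`now *this is bold*, and \\*this too\\*, but \\\*this\* not`;
console.log(s);

// $1 puts back the character(s) consumed before the opening asterisk;
// $2 is the text between the asterisks.
let regex = /((?:^|[^\\])(?:\\.)*)\*((\\.|[^*])*)\*/g;
let res = s.replace(regex, "$1<strong>$2</strong>");
console.log(res);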

If the bolded words should be converted to a React component and stored in an array with the other pieces of plain text, then you could use split and map:

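import React from "react";

// String.raw keeps every backslash in the sample literal.
let s = String.raw`now *this is bold*, and \\*this too\\*, but \\\*this\* not`;
console.log(s);

// Both groups capture, so split() interleaves them with the plain text:
// [text, prefix, bold, text, prefix, bold, ..., text]
let regex = /((?:^|[^\\])(?:\\.)*)\*((?:\\.|[^*])*)\*/g;
let res = s.split(regex).map((part, i) =>
    // Every third piece is bold text; wrap it in a <strong> element.
    i % 3 === 2 ? React.createElement("strong", { key: i }, part) : part
);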

Since there are two capture groups in the "delimiter" for the split call, one having the preceding character(s) and the second the word itself, every third item in the split result is a word to be bolded, hence the i%3 expression.
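
To see the layout that split produces without pulling in React, here is a sketch that swaps the element factory for a plain object. It also merges adjacent text pieces and drops empty ones, so the result has the same shape as the array in the question. The names `splitBold` and `toyStrong` are assumptions made up for this example:

```javascript
const BOLD = /((?:^|[^\\])(?:\\.)*)\*((?:\\.|[^*])*)\*/g;

// Hypothetical helper: splits text into plain strings and bold items.
// makeStrong stands in for something like React.createElement("strong", ...).
function splitBold(text, makeStrong) {
  const out = [];
  text.split(BOLD).forEach((piece, i) => {
    if (i % 3 === 2) {
      // Group 2: the text between the asterisks.
      out.push(makeStrong(piece));
    } else if (piece !== "") {
      // Plain text or group 1 (the prefix): glue it onto the previous
      // string if there is one, so the prefix is not a separate item.
      if (typeof out[out.length - 1] === "string") {
        out[out.length - 1] += piece;
      } else {
        out.push(piece);
      }
    }
  });
  return out;
}

// Plain-object stand-in for a React element, just for inspection.
const toyStrong = (s) => ({ type: "strong", children: s });

const parts = splitBold("qwer *asdf* zxcv *123*", toyStrong);
console.log(JSON.stringify(parts));
```

In a real component, `makeStrong` would presumably return a keyed `<strong>` element instead of the plain object used here.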

trincot
  • Unfortunately I cannot use string replace, because the items have to be real JSX items in an array. This is because I replace some markup with custom React components. – Ryan Peschel Dec 05 '19 at 20:10
  • `replace` takes a callback as second argument in which you can inject whatever you want. – trincot Dec 05 '19 at 20:14
  • Yeah but the return value of the callback function has to be a string, no? As far as I know I can't inject literal React components or arrays as replacements for the string. – Ryan Peschel Dec 05 '19 at 20:19
  • If you edit your question and show your react code as you currently have it, I can possibly propose how to align it with this. – trincot Dec 05 '19 at 20:21
  • Sure, I'll do that now. One moment – Ryan Peschel Dec 05 '19 at 20:22
  • See addition to answer. – trincot Dec 05 '19 at 20:35
  • This is insanely impressive! Thank you!! – Ryan Peschel Dec 05 '19 at 20:42
  • The only issue is that I also want to then run an italics regex on it (after I run the bold one), but now it is no longer a string.. But I'll try to figure something out. – Ryan Peschel Dec 05 '19 at 20:45
  • That's going to be tricky, since that could also be nested inside a bold markup. But I suggest you have a go at it, and if you bump into an issue, make it a new question. – trincot Dec 05 '19 at 20:47
  • Okie dokie I'll work on it for another 30 minutes or so and if I'm still stuck I'll make another question. – Ryan Peschel Dec 05 '19 at 20:50
  • I posted a new question here: https://stackoverflow.com/questions/59203236/how-to-parse-a-small-subset-of-markdown-into-react-components – Ryan Peschel Dec 05 '19 at 21:04