Regex pattern for Pilcrow (¶) or Partial Differential (∂) character

Question

I need to find find/replace or convert pilcrow / partial differential characters in a string as they currently show as �.

What I thought would work but doesn't:

const value = 'Javascript Regex pattern for Pilcrow (¶) or Partial Differential (∂) character';
const matches = value.match(/\u2029/gmi);
console.log(matches);

But returns empty.

To be honest, I'm not even sure how to achieve what I need to do.

`/¶/gmi`? (Although it might be easier to fix the character encoding.) — Ivar, Aug 19 '20 at 13:32
@SeanDelaney [this answer](https://stackoverflow.com/a/46436290/4092887) could be useful? — Mauricio Arias Olave, Aug 19 '20 at 13:34
rather than replacing those specifically, could you use something generic to remove everything that isn't in the characters you want to keep? https://stackoverflow.com/questions/20864893/replace-all-non-alpha-numeric-characters-new-lines-and-multiple-white-space-wi — martincarlin87, Aug 19 '20 at 13:35
If it displays as `�` then it's very likely that it isn't properly encoded to begin with. In other words, you won't find ¶ or ∂ because they aren't there. — Álvaro González, Aug 19 '20 at 13:35
Also weird the u2022 is not the second char you are looking for — mplungjan, Aug 19 '20 at 13:35
They aren't the same character. (U+2029) is a PARAGRAPH SEPARATOR, but ¶ (U+00B6) is the PILCROW SIGN. — Wyck, Aug 19 '20 at 13:36

score 5 · Accepted Answer · answered Aug 19 '20 at 14:16

The correct Unicode code points are U+00B6 and U+2202, not U+2029. You'll also want to use a [] character range in your expression:

const value = 'Javascript Regex pattern for Pilcrow (¶) or Partial Differential (∂) character';
const matches = value.match(/[\u00B6\u2202]/gmi);
console.log(matches);

Of course, you don't really need \u escapes in the first place:

const value = 'Javascript Regex pattern for Pilcrow (¶) or Partial Differential (∂) character';
const matches = value.match(/[¶∂]/gmi);
console.log(matches);

Last but not least, you say:

they currently show as �.

If that's the case, it's very likely that it isn't properly encoded to begin with. In other words, you won't find ¶ or ∂ because they aren't there. I suggest you address this first.

Kunal Mukherjee · Answer 2 · 2020-08-19T13:43:46.280

3

Use String.prototype.codePointAt to extract the unicode UTF-16 code point and convert it into hex digits sequence.

const toUnicodeCodePointHex = (str) => {
    const codePoint = str.codePointAt(0).toString(16);
    return '\\u' + '0000'.substring(0, 4 - codePoint.length) + codePoint;
};

const value = 'Javascript Regex pattern for Pilcrow (¶) or Partial Differential (∂) character';

const re = new RegExp(['¶', '∂'].map((item) => toUnicodeCodePointHex(item)).join('|'), 'ig');

const matches = value.match(re);
console.log(matches);

See this very nice article by Mathias Bynens.

edited Aug 19 '20 at 13:43

answered Aug 19 '20 at 13:34

Kunal Mukherjee

5,775
3
25
53

2

Bynens' article is a classic and a recommended reading. But all the [characters involved in this question](https://apps.timwhitlock.info/unicode/inspect?s=%EF%BF%BD%C2%B6%E2%88%82) have a 2-byte encoding in UTF-16. Could you elaborate on what your code tries to accomplish? – Álvaro González Aug 19 '20 at 13:49
@ÁlvaroGonzález Taking the code point at the first position and escaping, as all String prototype's methods work on UTF-16 strings. – Kunal Mukherjee Aug 19 '20 at 14:22

Jordan Stubblefield · Answer 3 · 2020-08-19T14:01:21.710

1

You can find them by hex or octal value:

const matches = value.match(/\u00B6|\u2202/g);

Regex for each:

Pilcrow: \u00B6 or \xB6 or \266

Partial Differential: \u2202

edited Aug 19 '20 at 14:01

answered Aug 19 '20 at 13:55

Jordan Stubblefield

523
4
14

Regex pattern for Pilcrow (¶) or Partial Differential (∂) character

3 Answers3