8

I need to find/replace or convert pilcrow / partial differential characters in a string, as they currently show as �.

What I thought would work but doesn't:

const value = 'Javascript Regex pattern for Pilcrow (¶) or Partial Differential (∂) character';
const matches = value.match(/\u2029/gmi);
console.log(matches);

But it returns no matches.

To be honest, I'm not even sure how to achieve what I need to do.

hgb123
Sean Delaney

3 Answers

5

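The correct Unicode code points are U+00B6 (pilcrow) and U+2202 (partial differential), not U+2029, which is the paragraph separator. You'll also want to use a [] character class in your expression so that either character matches: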

const value = 'Javascript Regex pattern for Pilcrow (¶) or Partial Differential (∂) character';
const matches = value.match(/[\u00B6\u2202]/gmi);
console.log(matches);

Of course, you don't really need \u escapes in the first place:

const value = 'Javascript Regex pattern for Pilcrow (¶) or Partial Differential (∂) character';
const matches = value.match(/[¶∂]/gmi);
console.log(matches);
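Since the question also asks about find/replace, here is a minimal sketch of the same character class used with String.prototype.replace. The replacement strings in the `replacements` map are made up for illustration; they are not from the question.

```javascript
const value = 'Javascript Regex pattern for Pilcrow (¶) or Partial Differential (∂) character';

// Hypothetical replacement text; substitute whatever your application needs.
const replacements = {
    '\u00B6': '[pilcrow]',
    '\u2202': '[partial]',
};

// The g flag makes replace() visit every match, not just the first one.
const cleaned = value.replace(/[\u00B6\u2202]/g, (ch) => replacements[ch]);

console.log(cleaned);
// Javascript Regex pattern for Pilcrow ([pilcrow]) or Partial Differential ([partial]) character
```

Using a callback keeps the mapping in one place, so adding another character only means extending the class and the map.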

Last but not least, you say:

they currently show as �.

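If that's the case, it's very likely that the text isn't properly encoded to begin with. In other words, you won't find ¶ or ∂ because they aren't there: � is U+FFFD, the replacement character that decoders emit in place of bytes they cannot interpret. I suggest you address this first.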
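To illustrate the point, here is a small sketch of how a pilcrow can turn into � before your regex ever runs. It assumes the source bytes were single-byte encoded (0xB6 is the pilcrow in ISO-8859-1) and then decoded as UTF-8; your actual data path may differ.

```javascript
// 0xB6 is the pilcrow in ISO-8859-1, but on its own it is not valid UTF-8.
const bytes = new Uint8Array([0xB6]);

// Decoding as UTF-8 replaces the invalid byte with U+FFFD.
const decoded = new TextDecoder('utf-8').decode(bytes);

console.log(decoded === '\uFFFD');            // true
console.log(/[\u00B6\u2202]/.test(decoded));  // false: the pilcrow is gone
console.log(decoded.includes('\uFFFD'));      // true: a quick check for damaged text
```

Once the original character has been replaced by U+FFFD, the information is lost, so the fix belongs where the text is read or decoded, not in the regex.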

Álvaro González
3

Use String.prototype.codePointAt to get the Unicode code point of each character and convert it into a \uXXXX escape built from its hex digits.

// Escape the first code point of str as \uXXXX.
// This form is only valid for code points up to U+FFFF (four hex digits).
const toUnicodeCodePointHex = (str) => {
    const codePoint = str.codePointAt(0).toString(16);
    return '\\u' + codePoint.padStart(4, '0');
};

const value = 'Javascript Regex pattern for Pilcrow (¶) or Partial Differential (∂) character';

const re = new RegExp(['¶', '∂'].map((item) => toUnicodeCodePointHex(item)).join('|'), 'ig');

const matches = value.match(re);
console.log(matches);
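The same codePointAt call is also handy for checking which characters a string really contains. Below is a minimal sketch; `listNonAscii` is a hypothetical helper name, not part of any library.

```javascript
// Hypothetical helper: list every non-ASCII character with its U+XXXX code point.
const listNonAscii = (str) =>
    Array.from(str) // iterates by code point, not by UTF-16 code unit
        .filter((ch) => ch.codePointAt(0) > 0x7f)
        .map((ch) => {
            const hex = ch.codePointAt(0).toString(16).toUpperCase().padStart(4, '0');
            return `${ch} U+${hex}`;
        });

console.log(listNonAscii('Pilcrow (¶) or Partial Differential (∂)'));
// [ '¶ U+00B6', '∂ U+2202' ]
```

If the output shows U+FFFD instead of U+00B6 or U+2202, the characters were already lost before the string reached this code.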

See this very nice article by Mathias Bynens.

Kunal Mukherjee
  • 2
    Bynens' article is a classic and a recommended reading. But all the [characters involved in this question](https://apps.timwhitlock.info/unicode/inspect?s=%EF%BF%BD%C2%B6%E2%88%82) have a 2-byte encoding in UTF-16. Could you elaborate on what your code tries to accomplish? – Álvaro González Aug 19 '20 at 13:49
  • @ÁlvaroGonzález Taking the code point at the first position and escaping, as all String prototype's methods work on UTF-16 strings. – Kunal Mukherjee Aug 19 '20 at 14:22
1

You can find them by hex or octal value:

const matches = value.match(/\u00B6|\u2202/g);

Regex for each:

Pilcrow: \u00B6 or \xB6 or \266

Partial Differential: \u2202
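A quick sketch to confirm that the three pilcrow escapes are equivalent. One caveat worth knowing: the octal form is a legacy escape and is rejected when the regex uses the u flag, so \u00B6 or \xB6 is the safer choice.

```javascript
const pilcrow = '\u00B6';

console.log(/\u00B6/.test(pilcrow));            // true
console.log(/\xB6/.test(pilcrow));              // true
console.log(new RegExp('\\266').test(pilcrow)); // true (legacy octal escape)

// With the u flag, the octal escape is a syntax error.
let octalAllowedWithU = true;
try {
    new RegExp('\\266', 'u');
} catch (e) {
    octalAllowedWithU = false;
}
console.log(octalAllowedWithU); // false
```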