Regex to replace all superscript numbers

Question

I'm struggling to figure out a reasonable solution to this. I need to replace the following characters: ⁰¹²³⁴⁵⁶⁷⁸⁹ using a regex replace. I would think that you would just do this:

item = item.replace(/[⁰¹²³⁴⁵⁶⁷⁸⁹]/g, '');

However, when I try to do that, Notepad++ converts symbols 5-9 into regular (non-superscript) digits. I realize this probably relates to the encoding format I am using, which I see is set to ANSI.

I've never really understood the difference between the various encoding formats. But I'm wondering if there is any easy fix for this issue?
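On the encoding part of the question: "ANSI" in Notepad++ means the system's legacy code page, which on a Western Windows setup is usually Windows-1252. The sketch below assumes that code page. It checks which of the ten superscript digits fall in the U+00A0 to U+00FF range that Windows-1252 shares with Latin-1. Anything outside that range cannot be stored in such a file, so the editor has to substitute something else. The function name `partitionByLatin1` is hypothetical.

```javascript
// Sketch, assuming an "ANSI" file means Windows-1252. For these ten characters
// a simple "code point <= 0xFF" check is enough, because none of them fall in
// the 0x80-0x9F range where Windows-1252 and Latin-1 differ.
function partitionByLatin1(chars) {
  const fits = [];
  const doesNotFit = [];
  for (const ch of chars) {
    const cp = ch.codePointAt(0);
    const label = ch + " U+" + cp.toString(16).toUpperCase().padStart(4, "0");
    (cp <= 0xff ? fits : doesNotFit).push(label);
  }
  return { fits, doesNotFit };
}

const { fits, doesNotFit } = partitionByLatin1("⁰¹²³⁴⁵⁶⁷⁸⁹");
console.log("Representable in Windows-1252:", fits.join(", "));
console.log("Not representable:", doesNotFit.join(", "));
```

Only ¹, ² and ³ have single-byte equivalents. Saving the file as UTF-8 avoids the problem entirely.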

Also, you have to wrap that up `/[⁰¹²³⁴⁵⁶⁷⁸⁹]/g` properly, you're missing the starting bracket — adeneo, Mar 13 '16 at 23:01
You really have to know the difference between the various character encodings. It is *essential.* This should help start your journey. http://kunststube.net/encoding/ — Jeremy J Starcher, Mar 13 '16 at 23:01
Works just fine if you correct the regex *(and jsFiddle is using UTF8)* -> **https://jsfiddle.net/x010mpdp/** — adeneo, Mar 13 '16 at 23:15
You could try ECMAScript 2015 [*unicode escape sequences*](https://mathiasbynens.be/notes/es6-unicode-regex), but support might be lacking… — RobG, Mar 13 '16 at 23:22

score 6 · Answer 1 · Richard Hamilton · 2016-03-13T23:34:41.520

Here is a simple regex that matches superscript numbers (be aware that it also matches many other numeric characters; see the comments):

/\p{No}/gu

Breakdown:

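\p{No} matches any character in the Unicode "Other_Number" (No) category: superscript and subscript digits, but also vulgar fractions such as ½, circled numbers such as ①, and hundreds of other numeric characters that are not decimal digits [0-9]
u modifier: Unicode. The pattern is treated as a sequence of Unicode code points rather than UTF-16 code units, and property escapes such as \p{No} are only recognized with this flag
g modifier: global. Find all matches (don't stop after the first match)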

https://regex101.com/r/zA8sJ4/1
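To see how much more than superscripts `\p{No}` removes, here is a small sketch. It assumes an engine with Unicode property escapes (ECMAScript 2018 or later, e.g. a current Node.js or browser), and the sample string is made up for illustration.

```javascript
// \p{No} is the Unicode "Other_Number" general category; it requires the u flag.
const otherNumber = /\p{No}/gu;

// Made-up sample: an ASCII digit, a superscript two, a fraction, a circled one.
const mathText = "3x² + ½ + ①";

// The ASCII digit 3 is category Nd, so it survives; everything in No is removed,
// including the fraction and the circled digit, not just the superscript.
const stripped = mathText.replace(otherNumber, "");
const matched = mathText.match(otherNumber);

console.log(JSON.stringify(stripped)); // "3x +  + "
console.log(matched);                  // [ '²', '½', '①' ]
```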

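At the time of writing (2016), most browsers still had no built-in support for Unicode property escapes like \p{No} in regex; they were only standardized later, in ECMAScript 2018. For environments without them, I would recommend the XRegExp library: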

XRegExp provides augmented (and extensible) JavaScript regular expressions. You get new modern syntax and flags beyond what browsers support natively. XRegExp is also a regex utility belt with tools to make your client-side grepping and parsing easier, while freeing you from worrying about pesky aspects of JavaScript regexes like cross-browser inconsistencies or manually manipulating lastIndex.

http://xregexp.com/

HTML Solution

HTML has a <sup> tag for representing superscript text.

The <sup> tag defines superscript text. Superscript text appears half a character above the normal line, and is sometimes rendered in a smaller font. Superscript text can be used for footnotes, like WWW[1].

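If the superscript numbers come from an HTML page, they are most likely written as <sup> elements rather than as Unicode superscript characters, so you can remove those elements instead: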

var math = document.getElementById("math");

// Remove every <sup>...</sup> element that contains one or more digits
math.innerHTML = math.innerHTML.replace(/<sup>\d+<\/sup>/g, "");

<p id="math">4<sup>2</sup>+ 3<sup>2</sup></p>
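As Oriol's comment below points out, regexes are a fragile way to process HTML. A DOM-based alternative removes the <sup> elements directly. The sketch below assumes a browser-like environment where the root element supports `querySelectorAll()` and each element supports `remove()`. The helper name `removeSuperscripts` is hypothetical. It also avoids reassigning `innerHTML`, which recreates the child nodes and drops any event listeners attached to them.

```javascript
// Sketch: delete all <sup> descendants of `root` through the DOM.
// querySelectorAll() returns a static NodeList, so removing elements
// while looping over it is safe.
function removeSuperscripts(root) {
  const sups = root.querySelectorAll("sup");
  for (const sup of sups) {
    sup.remove();
  }
  return sups.length; // number of <sup> elements removed
}

// Usage in a browser, with the markup from the answer:
// removeSuperscripts(document.getElementById("math"));
```

For the example paragraph this leaves the text "4+ 3".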

edited Mar 13 '16 at 23:34

answered Mar 13 '16 at 22:59


I don't think that's a valid regex in javascript, the unicode flag is not supported – adeneo Mar 13 '16 at 23:19
@adeneo—Unicode escape sequences (and the u flag) are supported in ECMAScript 2015, however not many browsers seem to have implemented them yet. – RobG Mar 13 '16 at 23:25
`\p{No}` also matches around [600 other characters](http://unicode.org/cldr/utility/list-unicodeset.jsp?a=[:General_Category=Other_Number:]) that aren't superscript numbers. – 一二三 Mar 13 '16 at 23:28
@RobG - indeed, didn't know that. I can find it in the spec, but not much about browser support, seems it's not really supported anywhere yet. However, the OP's regex works just fine. – adeneo Mar 13 '16 at 23:29

[You can't parse (X)HTML with regex.](http://stackoverflow.com/a/1732454/1529630) – Oriol Mar 13 '16 at 23:37
your regex is not supported by javascript. – Saleem Mar 13 '16 at 23:48
Even `\p{No}`, the Other_Number property, matches 807 code points in Unicode 11, of which the 10 superscript digits are only a subset. So you wouldn't use this to find superscripts; it matches too much. – Dec 07 '18 at 18:53

score 3 · Answer 2 · answered Mar 13 '16 at 23:47


Use UTF-8. If for some reason you can't, a workaround is to escape the characters:

var rg = new RegExp(
  "[\u2070\u00b9\u00b2\u00b3\u2074\u2075\u2076\u2077\u2078\u2079]",
  "g"
);
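The same escaped character class can also be written as a regex literal. The sketch below is one way to wrap it. The `\uXXXX` escapes keep the whole file ASCII-only, so it behaves the same whether it is saved as ANSI or UTF-8. The constant and function names are hypothetical.

```javascript
// Superscript 0, 1, 2, 3, then 4 through 9 as a contiguous range (U+2074..U+2079).
const SUPERSCRIPT_DIGITS = /[\u2070\u00b9\u00b2\u00b3\u2074-\u2079]/g;

// Removes only the ten superscript digits; fractions, degree signs,
// subscripts and ordinary digits are left alone.
function stripSuperscriptDigits(text) {
  return text.replace(SUPERSCRIPT_DIGITS, "");
}

// The escapes are superscript two, three and four; prints "x + y = z".
console.log(stripSuperscriptDigits("x\u00b2 + y\u00b3 = z\u2074"));
```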


Oriol


score 2 · Answer 3 · answered Mar 14 '16 at 00:31

I'd suggest trying the following regex:

/[\u2070-\u209f\u00b0-\u00be]+/g

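Note that these ranges match more than the ten superscript digits: U+2070 to U+209F is the whole Superscripts and Subscripts block (including subscript digits and superscript signs and letters), and U+00B0 to U+00BE also covers characters such as °, ±, µ and ½. The code will look like this: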

var re = /[\u2070-\u209f\u00b0-\u00be]+/g;
var str = '⁰¹²³⁴⁵⁶⁷⁸⁹';
var subst = '';

// Replace every run of matched characters with the substitution string
var result = str.replace(re, subst);

After a successful run, result will be an empty string, because every character in str is matched and removed.
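To make the caveat about the broad ranges concrete, here is a sketch comparing this answer's pattern with the explicit ten-character class from Answer 2. It runs both on made-up text that contains no superscript digits at all.

```javascript
// Answer 3's pattern: two broad code point ranges.
const broadPattern = /[\u2070-\u209f\u00b0-\u00be]+/g;

// The explicit class of the ten superscript digits (as in Answer 2).
const exactPattern = /[\u2070\u00b9\u00b2\u00b3\u2074-\u2079]/g;

// Made-up sample with a degree sign, plus-minus, a subscript two and a fraction.
const recipeText = "25°C ± 2, H₂O, ½ cup";

const broadResult = recipeText.replace(broadPattern, "");
const exactResult = recipeText.replace(exactPattern, "");

console.log(broadResult); // 25C  2, HO,  cup
console.log(exactResult); // 25°C ± 2, H₂O, ½ cup (unchanged)
```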

