46

I want to be able to write a function which receives a number in scientific notation as a string and splits out of it the coefficient and the exponent as separate items. I could just use a regular expression, but the incoming number may not be normalised and I'd prefer to be able to normalise and then break the parts out.

A colleague has got part of the way to a solution using VB6, but it's not quite there, as the transcript below shows.

cliVe> a = 1e6
cliVe> ? "coeff: " & o.spt(a) & " exponent: " & o.ept(a)
coeff: 10 exponent: 5 

should have been 1 and 6

cliVe> a = 1.1e6
cliVe> ? "coeff: " & o.spt(a) & " exponent: " & o.ept(a)
coeff: 1.1 exponent: 6

correct

cliVe> a = 123345.6e-7
cliVe> ? "coeff: " & o.spt(a) & " exponent: " & o.ept(a)
coeff: 1.233456 exponent: -2

correct

cliVe> a = -123345.6e-7
cliVe> ? "coeff: " & o.spt(a) & " exponent: " & o.ept(a)
coeff: 1.233456 exponent: -2

should be -1.233456 and -2

cliVe> a = -123345.6e+7
cliVe> ? "coeff: " & o.spt(a) & " exponent: " & o.ept(a)
coeff: 1.233456 exponent: 12

should be -1.233456 and 12

Any ideas? By the way, Clive is a CLI based on VBScript and can be found on my weblog.

Chad Birch
bugmagnet

4 Answers

85

Googling "scientific notation regexp" turns up a number of matches, including this one (don't use it!), which uses

*** warning: questionable ***
/[-+]?[0-9]*\.?[0-9]+([eE][-+]?[0-9]+)?/

which matches cases such as -.5e7 and +00000e33 (both of which you may not want to allow).

Instead, I would highly recommend you use the syntax on Doug Crockford's JSON website which explicitly documents what constitutes a number in JSON. Here's the corresponding syntax diagram taken from that page:

[Image: JSON number syntax diagram (source: json.org)]

If you look at line 456 of his json2.js script (safe conversion to/from JSON in JavaScript), you'll see this portion of a regexp:

/-?\d+(?:\.\d*)?(?:[eE][+\-]?\d+)?/

which, ironically, doesn't match his syntax diagram (looks like I should file a bug). I believe a regexp that does implement that syntax diagram is this one:

/-?(?:0|[1-9]\d*)(?:\.\d+)?(?:[eE][+\-]?\d+)?/

and if you want to allow an initial + as well, you get:

/[+\-]?(?:0|[1-9]\d*)(?:\.\d+)?(?:[eE][+\-]?\d+)?/

Add capturing parentheses to your liking.
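As a rough illustration of what "add capturing parentheses" could look like, here is a small sketch in Python (chosen only for illustration; the same groups work in a JavaScript regexp). The group names and the `split_number` helper are made up for this example and do not come from json.org or json2.js.

```python
import re

# The sign-tolerant regexp from above, with named capturing groups.
# Group names are illustrative assumptions.
NUMBER = re.compile(
    r"(?P<sign>[+\-]?)"
    r"(?P<int>0|[1-9]\d*)"
    r"(?:\.(?P<frac>\d+))?"
    r"(?:[eE](?P<exp>[+\-]?\d+))?"
)

def split_number(text):
    """Return (sign, integer digits, fraction digits, exponent), or None if no match."""
    m = NUMBER.fullmatch(text)  # fullmatch() plays the role of ^ and $
    if m is None:
        return None
    return (m["sign"], m["int"], m["frac"] or "", int(m["exp"] or 0))

print(split_number("-4.70e+9"))  # ('-', '4', '70', 9)
print(split_number("37.e88"))    # None
```

Note that this only splits the text as written; it does not normalise the coefficient.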

I would also highly recommend you flesh out a bunch of test cases, to ensure you include those possibilities you want to include (or not include), such as:

allowed:
+3
3.2e23
-4.70e+9
-.2E-4
-7.6603

not allowed:
+0003   (leading zeros)
37.e88  (dot before the e)
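A quick way to check such a list is to run every case through the regexp and compare the result with what you intended. The sketch below does that in Python for the sign-tolerant regexp above; the `accepted` helper is a made-up name. One thing it shows is that -.2E-4 is not matched by that regexp as written, because the syntax diagram requires a digit before the dot, so accepting it would mean making the integer part optional.

```python
import re

# Sign-tolerant JSON-diagram regexp from above; fullmatch() stands in for ^ and $.
NUMBER = re.compile(r"[+\-]?(?:0|[1-9]\d*)(?:\.\d+)?(?:[eE][+\-]?\d+)?")

CASES = ["+3", "3.2e23", "-4.70e+9", "-.2E-4", "-7.6603", "+0003", "37.e88"]

def accepted(text):
    """True if the whole string is a number according to NUMBER."""
    return NUMBER.fullmatch(text) is not None

for case in CASES:
    print(f"{case!r:>12} -> {'match' if accepted(case) else 'no match'}")
```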

Good luck!

Jason S
  • ...? Just use the regexp/diagram shown in the JSON website. – Jason S Jun 13 '14 at 19:34
  • Then why don't you try the previous regex, the one before the statement "and if you want to allow an initial + as well"? – Jason S Jun 13 '14 at 23:17
  • I know this is a very old forum but wanted to point something out. It looks like your pattern allows for this type of entry 'e324ewfg' which obviously is not a valid number. – Marin Petkov Jan 28 '17 at 08:56
  • the regexps posted do not include `^` at the beginning or `$` at the end which would prevent those, and should be used if the match is *only* a number; but some uses of regexps are in larger patterns. – Jason S Jan 28 '17 at 17:04
  • haha... arg... i thought this would be simpler. this is for the most general case tho. – Trevor Boyd Smith Apr 10 '19 at 11:35
  • Nice - However it *does* allow for the . before the e, or even ending on a . The \d* after the . needs to be a \d+. Basically if there is a . there has to be a digit. – Gerard ONeill Jun 12 '21 at 01:10
  • @GerardONeill huh -- you are correct; I wonder why it took 12 years for someone to catch my mistake in transcribing the syntax diagram into regexp notation :-) – Jason S Jun 16 '21 at 20:08
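The anchoring point raised in the comments can be seen with a short Python sketch: an unanchored search finds a number inside 'e324ewfg', while requiring the whole string to match rejects it. The variable names are illustrative only.

```python
import re

# Sign-tolerant regexp from the answer, without ^ or $.
NUMBER = re.compile(r"[+\-]?(?:0|[1-9]\d*)(?:\.\d+)?(?:[eE][+\-]?\d+)?")

text = "e324ewfg"
found = NUMBER.search(text)      # unanchored: looks for a number anywhere in the text
whole = NUMBER.fullmatch(text)   # anchored: the entire string must be a number

print(found.group(0) if found else None)  # 324
print(whole)                              # None
```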
5

Building off of the highest rated answer, I modified the regex slightly to be /^[+\-]?(?=.)(?:0|[1-9]\d*)?(?:\.\d*)?(?:\d[eE][+\-]?\d+)?$/.

The benefits this provides are:

  1. allows matching numbers like .9 (I made the (?:0|[1-9]\d*) optional with ?)
  2. prevents matching a lone sign at the beginning and prevents matching zero-length strings (uses lookahead, (?=.))
  3. prevents matching e9 because it requires the \d before the scientific notation

My goal is to use it to capture significant figures and do significant-figure arithmetic. So I'm also going to slice it up with capturing groups like so: /^[+\-]?(?=.)(0|[1-9]\d*)?(\.\d*)?(?:(\d)[eE][+\-]?\d+)?$/.

An explanation of how to get significant figures from this:

  1. The entire capture is the number you can hand to parseFloat()
  2. Capture groups 1-3 will show up as undefined or strings, so concatenating them (replacing each undefined with '') should give the coefficient as written, without its sign or exponent, from which significant figures can be extracted.
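The second step can be sketched as follows. This is a Python illustration rather than the JavaScript the answer has in mind, so unmatched groups come back as None instead of undefined; the `coefficient_digits` helper is a made-up name.

```python
import re

# Capturing version of the pattern above; fullmatch() stands in for ^ and $.
PATTERN = re.compile(r"[+\-]?(?=.)(0|[1-9]\d*)?(\.\d*)?(?:(\d)[eE][+\-]?\d+)?")

def coefficient_digits(text):
    """Join capture groups 1-3, treating unmatched groups (None) as ''."""
    m = PATTERN.fullmatch(text)
    if m is None:
        return None
    return "".join(g or "" for g in m.groups())

print(coefficient_digits("12.50e3"))  # 12.50
print(coefficient_digits(".9"))       # .9
```

Counting the significant figures in the returned string is left out here, since the rules for leading and trailing zeros depend on the convention you follow.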

This regex also rejects left-padded zeros in plain decimal input such as 007, which JavaScript sometimes accepts but which I have seen cause issues and which add nothing to significant figures, so I see rejecting them as a benefit (especially in forms). Note that an input such as 01e5 still slips through, because the \d in front of the exponent can absorb the second digit. I'm sure the regex could also be modified to gobble up left-padded zeros instead.

Another problem I see with this regex is it won't match 90.e9 or other such numbers. However, I find this or similar matches highly unlikely as it is the convention in scientific notation to avoid such numbers. Though you can enter it in JavaScript, you can just as easily enter 9.0e10 and achieve the same significant figures.

UPDATE

In my testing, I also caught the error that it could match '.'. So the look-ahead should be modified to (?=\.\d|\d) which leads to the final regex:

/^[+\-]?(?=\.\d|\d)(?:0|[1-9]\d*)?(?:\.\d*)?(?:\d[eE][+\-]?\d+)?$/
Troy Weber
3

Building on @Troy Weber's answer, I would suggest

/^[+\-]?(?=\.\d|\d)(?:0|[1-9]\d*)?(?:\.\d+)?(?:(?<=\d)(?:[eE][+\-]?\d+))?$/

to avoid matching 3., per @Jason S's rules
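To see where the two patterns differ, the sketch below runs both against a few inputs in Python. The names `TROY`, `DREAKOR` and `verdicts` are labels invented for this example. Besides "3.", the patterns also disagree on a padded exponent form such as 01e5.

```python
import re

# Both patterns copied from the answers above; fullmatch() stands in for ^ and $.
TROY = re.compile(r"[+\-]?(?=\.\d|\d)(?:0|[1-9]\d*)?(?:\.\d*)?(?:\d[eE][+\-]?\d+)?")
DREAKOR = re.compile(
    r"[+\-]?(?=\.\d|\d)(?:0|[1-9]\d*)?(?:\.\d+)?(?:(?<=\d)(?:[eE][+\-]?\d+))?"
)

def verdicts(text):
    """Return (troy_matches, dreakor_matches) for one input string."""
    return (TROY.fullmatch(text) is not None, DREAKOR.fullmatch(text) is not None)

for sample in ["3.", ".9", "1e5", "1.5e3", "01e5", "e9", "."]:
    print(sample, verdicts(sample))
```

The lookbehind `(?<=\d)` checks for a digit before the exponent without consuming it, which is why the second pattern no longer lets a stray leading zero through.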

dreakor
2

Here is some Perl code I just hacked together quickly.

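use strict;
use warnings;
use feature 'say';

my $str = '123345.6e-7';   # example input

# Capture: optional sign, integer digits, optional fraction (with its dot), exponent.
my ($sign, $coeffl, $coeffr, $exp) =
  $str =~ /^\s*([-+])?(\d+)(\.\d*)?e([-+]?\d+)\s*$/i
  or die "not in scientific notation: $str";

# Number of places the decimal point moves left so that one digit remains in front of it.
my $shift = length($coeffl) - 1;

# First integer digit, then the remaining integer digits after a dot.
my $coeff = substr( $coeffl, 0, 1 );
if( $shift || $coeffr ){
  $coeff .= '.' . substr( $coeffl, 1 );
}

# Append the original fraction digits (without their leading dot).
$coeff .= substr( $coeffr, 1 ) if $coeffr;

$coeff = $sign . $coeff if $sign;

# Compensate for the moved decimal point.
$exp += $shift;

say "coeff: $coeff exponent: $exp";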
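For comparison, the same normalise-then-split idea can be sketched in Python with the standard `decimal` module, which avoids manual string slicing. The `split_scientific` name is made up for this example, and the sketch assumes finite input that fits within the default 28-digit context precision; anything that is not a number raises `decimal.InvalidOperation`.

```python
from decimal import Decimal

def split_scientific(text):
    """Return (coefficient, exponent) with a single digit before the decimal point."""
    # normalize() strips trailing zeros, so '10e5' becomes 1E+6.
    norm = Decimal(text).normalize()
    sign, digits, _ = norm.as_tuple()
    lead = str(digits[0])
    rest = "".join(str(d) for d in digits[1:])
    coeff = ("-" if sign else "") + lead + ("." + rest if rest else "")
    # adjusted() is the exponent once only the leading digit is left of the point.
    return coeff, norm.adjusted()

for s in ["1e6", "10e5", "1.1e6", "123345.6e-7", "-123345.6e-7", "-123345.6e+7"]:
    print(s, split_scientific(s))
```

The inputs are the ones from the question's transcript, plus 10e5 as an unnormalised spelling of 1e6.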
Brad Gilbert