Regex for numbers in scientific notation?

Question

I'm loading a .obj file that has lines like

vn 8.67548e-017 1 -1.55211e-016

for the vertex normals. How can I detect them and convert them to double?

score 18 · Answer 1 · edited May 23 '17 at 12:01

A regex that would work pretty well would be:

-?[\d.]+(?:e-?\d+)?

Converting to a number can be done as described in "String in scientific notation C++ to double conversion", I guess.

The regex is

-?      # an optional -
[\d.]+  # a series of digits or dots (see *1)
(?:     # start non capturing group
  e     # "e"
  -?    # an optional -
  \d+   # digits
)?      # end non-capturing group, make optional

*1) This is not 100% correct: technically there can be only one dot, and before it only one (or no) digit. But in practice, violations should not occur in valid data, so the regex is a good approximation and false positives should be very unlikely. Feel free to make the regex more specific.

Community


answered Dec 18 '10 at 18:34

Tomalak

I'd suggest breaking the `[\d.]+` up into `\d+\.\d+` to prevent false matches. Use `\d*` if you don't require digits before and/or after the decimal point. – moinudin Dec 18 '10 at 18:38
Is that not going to match `5.5.5.5.5.5.5.5.5.5.5.5.5`? – Martin York Dec 18 '10 at 18:39
@Martin: Yes, it is. The question is: Is such a value likely to happen? If yes, the regex can easily be made more specific. My guess would be that the `5.5.5.5` scenario is rather unlikely in these circumstances. – Tomalak Dec 18 '10 at 18:43
(I understand this is a really old comment, but nonetheless) is there any reason why you wouldn't opt for the more valid option, given it would be fairly trivial to implement? I can't imagine it would take much of a toll on performance, and would be more reliable...? – XtraSimplicity Jan 06 '16 at 22:34
Performance is not even a consideration here. It simply makes no sense to prepare a regex for a case that is not going to happen with valid data. One can't anticipate every form of invalid data anyway, so making the regex more complex "just because" is a waste of time. If 5.5.5.5.5.5 is expected to happen in the input (or if it makes you feel better), by all means, adapt the regex. ;) – Tomalak Jan 06 '16 at 23:25
`-?[\d.]+(?:[e|E|d|D]-?\d+)?` can match almost all. – MathArt Sep 30 '20 at 11:27

XtraSimplicity · Answer 2 · 2016-01-07T05:54:14.650

6

I tried a number of the other solutions to no avail, so I came up with this.

       ^(-?\d+)\.?\d+(e-|e\+|e|\d+)\d+$

Anything that matches is considered to be valid Scientific Notation.

Please note: This accepts e+, e- and e; if you don't want to accept e, use this: ^(-?\d+)\.?\d+(e-|e\+|\d+)\d+$

I'm not sure if it works for C++, but in C# you can add (?i) between the ^ and (- in the regex to toggle inline case-insensitivity. Without it, exponents written like 1.05E+10 will fail to be recognised.

Edit: My previous regex was a little buggy, so I've replaced it with the one above.

edited Jan 07 '16 at 05:54

answered Jan 06 '16 at 22:46

XtraSimplicity

I adapted this answer and came up with `^(?:-?\d*)\.?\d+[eE][-\+]?\d+$` -- allows for cases like `.1e5` which are valid in JS – Jacob Jul 13 '20 at 23:05
Why did you end the second capturing group with `\d+`? It makes your regex catch non-scientific notation numbers like `3.1415`. – Paul Razvan Berg May 02 '21 at 10:40

Asha · Answer 3 · 2010-12-18T18:44:41.167

6

You can identify values written in scientific notation using the regex -?\d*\.?\d+e[+-]?\d+

edited Dec 18 '10 at 18:44

answered Dec 18 '10 at 18:38

Asha

Never use `{0,1}`—use `?` instead. The former is longer, no clearer, and has an identical effect. – Antal Spector-Zabusky Dec 18 '10 at 18:39
`{0,1}` can be replaced with `?`. But why would you want the decimal point to be optional? And this doesn't allow for negative numbers. It also falsely matches `.0` which is probably not desired. – moinudin Dec 18 '10 at 18:41
@marcog: Probably because according to the example data, the decimal point IS optional. The third field is simply "1". – Ben Voigt Dec 18 '10 at 18:45

score 3 · Accepted Answer · answered Dec 18 '10 at 18:47

The standard library function strtod handles the exponential component just fine (so does atof, but strtod allows you to differentiate between a failed parse and parsing the value zero).

Ben Voigt

score 2 · Answer 5 · answered Dec 18 '10 at 19:47

If you can be sure that the format of the double is scientific, you can try something like the following:

  string inp("8.67548e-017");
  istringstream str(inp);
  double v;
  str >> scientific >> v;
  cout << "v: " << v << endl;

If you want to detect whether there is a floating point number of that format, then the regexes above will do the trick.

EDIT: the scientific manipulator is actually not needed. When you stream in a double, the stream handles the format for you automatically (whether it's fixed or scientific).

Nim

I think this is the way to go for C++. Fiddling with regexes that sometimes work and sometimes don't wouldn't be the ideal way for me. Instead, this delegates the rough part to the STL's stringstream. This is the higher-level version of checking for a valid scientific format. – Martin Wirth Apr 10 '15 at 20:04

score 0 · Answer 6 · answered Dec 18 '10 at 20:38

Well, this is not exactly what you asked for, since it isn't Perl (gak) and it is a regular definition, not a regular expression, but it's what I use to recognize an extension of C floating point literals (the extension permits "_" in digit strings). I'm sure you can convert it to an unreadable regexp if you want:

/* floats: Follows ISO C89, except that we allow underscores */
let decimal_string = digit (underscore? digit) *
let hexadecimal_string = hexdigit (underscore? hexdigit) *

let decimal_fractional_constant =
  decimal_string '.' decimal_string?
  | '.' decimal_string

let hexadecimal_fractional_constant =
  ("0x" |"0X")
  (hexadecimal_string '.' hexadecimal_string?
  | '.' hexadecimal_string)

let decimal_exponent = ('E'|'e') ('+'|'-')? decimal_string
let binary_exponent = ('P'|'p') ('+'|'-')? decimal_string

let floating_suffix = 'L' | 'l' | 'F' | 'f' | 'D' | 'd'
let floating_literal =
  (
    decimal_fractional_constant decimal_exponent? |
    hexadecimal_fractional_constant binary_exponent?
  )
  floating_suffix?

The C format is designed for programming languages, not data, so it may support things your input does not require.

2b-t · Answer 7 · 2022-05-30T19:34:12.667

For extracting numbers in scientific notation in C++ with std::regex I normally use

((\\+|-)?[[:digit:]]+)(\\.(([[:digit:]]+)?))?((e|E)((\\+|-)?)[[:digit:]]+)?

which corresponds to

((\+|-)?\d+)(\.((\d+)?))?((e|E)((\+|-)?)\d+)?

This will match any number of the form +12.3456e-78 where

the sign can be either + or - and is optional
the decimal point as well as the digits after it are optional
the exponent is optional and can be written with a lower- or upper-case letter

A corresponding code for parsing might look like this:

#include <iostream>
#include <regex>
#include <string>

int main() {
  // Optional sign, digits, optional fraction, optional exponent.
  std::regex const scientific_regex {"((\\+|-)?[[:digit:]]+)(\\.(([[:digit:]]+)?))?((e|E)((\\+|-)?)[[:digit:]]+)?"};
  std::string const str {"8.67548e-017 1 -1.55211e-016"};

  for (auto it = std::sregex_iterator(str.begin(), str.end(), scientific_regex); it != std::sregex_iterator(); ++it) {
    std::string const match {it->str()};
    std::cout << match << std::endl;
  }
}

If you want to convert the matched substrings to double, std::stod should handle the conversion correctly; it is built on strtod, which Ben Voigt already pointed out.
