I am looking for a regex that matches only integers (numbers with no fractional part), taking exponents into account. I have tried a lot before asking here.

The regex should

  • match 1.23E4, 1.2334576E34, 122E3, 123, 456, etc.
  • not match 1.234E2 (since it expands to 123.4).
  • not match 1.22, and so on.

My try was

^[+-]?([0-9]*\\.?[0-9]+|[0-9]+\\.?[0-9]*)([eE][+]?[0-9]+)?$

However, as you can see, this does not take the exponent into account, so it cannot tell whether a value X still contains a fractional part after it is expanded.

Is there any way to extract the number of digits after the decimal point and compare it with the exponent, so that I can be sure the expanded value will not contain a fractional part?

Note that only a regex that can be evaluated at runtime will work for me.

Please help me guys...

Ro Yo Mi
Akhtar
  • 3
    What language? Different languages have different regex engines. – Madara's Ghost Jul 29 '13 at 17:14
  • 5
    You can't do that without a callback function that compares the exponent with the number of decimal places. – Casimir et Hippolyte Jul 29 '13 at 17:14
  • 3
    Regular expressions can't do arithmetic comparisons. You should do that in the calling application. – Barmar Jul 29 '13 at 17:15
  • a better question is why you would *want* to do this with a regex? talk about hard to debug and maintain (even if you could). – user428517 Jul 29 '13 at 17:51
  • 3
    This is much easier to solve without regex than it is with regex. Is there any reason you _must_ use regex? – Shaz Jul 29 '13 at 17:55
  • 2
    you can do this (for some arbitrary but fixed upper limit to the number of digits to the left of E) but as RyanWH says it is much easier to use a different approach. are you sure you cannot retrieve all values and then filter out the ones that are not integers in a separate step, for example? – andrew cooke Jul 29 '13 at 18:16
  • A reason why one might want to do this in regex is for validation of input within some framework that only allows the user to specify a single regex as the validation criterion. Still, as @MadaraUchiha said, the environment would be very useful (in any regex question). – Martin Ender Jul 29 '13 at 21:24
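
The approach suggested in the comments (let the regex split the number into parts and do the comparison in ordinary code) could be sketched as below. This is only an illustration under stated assumptions: the `PARTS` pattern and the `expands_to_integer` helper are hypothetical names, not something posted in the thread, and the sketch assumes plain ASCII input without thousands separators.

```python
import re

# Hypothetical pattern: it only splits the text into integer digits,
# fraction digits and exponent; it does not decide anything by itself.
PARTS = re.compile(r'[+-]?(\d*)(?:\.(\d*))?(?:[eE]([+-]?\d+))?')

def expands_to_integer(text):
    """Return True if text is a number whose expanded value has no fractional part."""
    m = PARTS.fullmatch(text)
    if m is None:
        return False
    int_part = m.group(1)
    frac_part = m.group(2) or ''
    if not int_part and not frac_part:
        return False  # a sign, a lone dot or a bare exponent is not a number
    exponent = int(m.group(3) or 0)
    digits = int_part + frac_part
    # The value equals int(digits) * 10**scale.
    scale = exponent - len(frac_part)
    if scale >= 0:
        return True
    # Negative scale: the digit string must end in enough zeros to absorb it.
    stripped = digits.rstrip('0')
    if not stripped:
        return True  # the value is zero
    return len(digits) - len(stripped) >= -scale

for sample in ['1.23E4', '1.2334576E34', '122E3', '123', '1.234E2', '1.22']:
    print(sample, expands_to_integer(sample))
```

The regex here never compares anything; the comparison between the exponent and the number of decimal places is a single subtraction in the calling code, which is the point the commenters make.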

2 Answers

ok, so this is only if you really need this for some weird regexp-only validation. it's written in python 3 and it makes no attempt to be compact (in python the only limit on the size of a regexp is available memory).

def over(n):
    '''make a regexp for an exponent of n or more'''
    assert n < 100
    # either three or more digits (100 and up), or any explicit value from n to 99
    return r'([1-9]\d{2,}|%s)' % '|'.join(str(i) for i in range(n, 100))

def make_decimal(n_digits, n_decimal):
    '''make a regexp for a number with an "E" with the given number of significant digits and decimal places'''
    assert n_decimal < n_digits
    assert 100 > n_decimal >= 0
    if n_decimal:
        # the dot is escaped so that it matches only a literal decimal point
        return r'\d{%d}\.\d{%d}E%s' % (n_digits-n_decimal, n_decimal, over(n_decimal))
    else:
        # no decimal places, so any non-negative exponent still gives an integer
        return r'\d{%d}E\d+' % n_digits

def make_e(n_digits):
    '''make a regexp for an integer with an "E" with the given number of significant digits'''
    return '|'.join(make_decimal(n_digits, i) for i in range(n_digits))

def make_regexp(max_digits):
    '''make a regexp for a decimal integer with up to the given number of significant digits'''
    assert max_digits < 100
    # start at 1: zero digits would add an empty alternative that matches ''
    return r'(\d+|%s)' % '|'.join(make_e(i) for i in range(1, max_digits+1))

here's some test code.

from re import compile

rx = make_regexp(8)
m = compile('^%s$' % rx)
for n in ['1.23E4', '1.2334576E34', '122E3', '123', '456']:
    assert m.match(n), n
for n in ['1.234E2', '1.22']:
    assert not m.match(n), n

for up to 8 significant digits (to the left of E), which seems a reasonable limit, the regexp generated is 8774 characters long. you could reduce this significantly (for example, see https://stackoverflow.com/a/17840228/181772), but what's the need (the regular expression engine is capable of generating a much smaller internal automaton from this)?
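
to see the shape of one generated alternative in isolation, here is a minimal sketch for a single fixed case: one digit before the point and exactly two decimal places. the helper name `exponent_at_least` and its `limit` parameter are made up for this illustration (they mirror the idea of listing every allowed exponent explicitly) and are not part of the code above.

```python
import re

def exponent_at_least(n, limit=100):
    """Alternation for any exponent >= n: n..limit-1 listed one by one, plus any 3+ digit value."""
    explicit = '|'.join(str(i) for i in range(n, limit))
    return r'(?:[1-9]\d{2,}|%s)' % explicit

# One digit, a literal dot, two decimals, then an exponent of at least 2.
two_decimals = re.compile(r'\d\.\d{2}E' + exponent_at_least(2))

for sample in ['1.23E4', '1.23E2', '1.23E1', '1.23E100']:
    print(sample, bool(two_decimals.fullmatch(sample)))
```

the "comparison" is done by enumeration: for every count of decimal places, the pattern spells out each exponent that is large enough, which is why the full regexp grows so quickly.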

andrew cooke

Description

It's not impossible, but rather difficult and the expression will really start to get out of hand. Take this 2831 character monster which:

  • validates that a number with an exponent will expand to an integer
  • requires the number to be in a format such as 123.456e7890 or 1234.678e1,234,567
  • if the exponent contains commas, they must appear in the correct comma-delimited three-digit groupings
  • supports only numbers with up to 99 places after the decimal point

As written here it requires the x option, which makes the engine ignore white space and comments in the pattern. The expression could be shortened to about 2041 characters by replacing [eE] with e and using the i option, and by replacing [0-9] with \d. However, this may slightly reduce performance, because in Unicode-aware engines the \d class matches every Unicode decimal digit and not just 0-9.
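
As a small illustration of the two points above, here is a sketch in Python. The answer does not say which engine it targets, so Python's `re` module is an assumption made for the example: `re.VERBOSE` plays the role of the x option, and for `str` patterns `\d` is Unicode-aware unless `re.ASCII` is given. The tiny pattern is made up for the demonstration and is not the expression from this answer.

```python
import re

ARABIC_INDIC_THREE = '\u0663'  # a decimal digit outside the ASCII range

# \d accepts the non-ASCII digit, [0-9] does not, and re.ASCII restricts \d to 0-9.
unicode_digit = re.fullmatch(r'\d', ARABIC_INDIC_THREE)
ascii_class = re.fullmatch(r'[0-9]', ARABIC_INDIC_THREE)
ascii_flag = re.fullmatch(r'\d', ARABIC_INDIC_THREE, re.ASCII)
print(bool(unicode_digit), bool(ascii_class), bool(ascii_flag))

# re.VERBOSE ignores white space and # comments inside the pattern.
verbose = re.compile(r'''
    ^ [0-9]+ (?: \. [0-9]+ )?   # mantissa with an optional fraction
    [eE] [0-9]+ $               # mandatory exponent
''', re.VERBOSE)
print(bool(verbose.match('1.23E4')), bool(verbose.match('1.23')))
```

Other engines differ: some treat \d as ASCII-only unless a Unicode mode is switched on, so the performance note applies only where \d is Unicode-aware.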

^
(?=.*?[eE][0-9]{1,3}(?:,[0-9]{3})*|[0-9]*$)  # validate commas are in the correct order
(?=[0-9]+\.   # match the integer portion of a real number
(?=
[0-9]{1,99}[eE][1-9](?:,?[0-9]){2,}

|[0-9]{1,9}[eE][1-9],?[0-9]
|[0-9]{10,19}[eE][2-9],?[0-9]
|[0-9]{20,29}[eE][3-9],?[0-9]
|[0-9]{30,39}[eE][4-9],?[0-9]
|[0-9]{40,49}[eE][5-9],?[0-9]
|[0-9]{50,59}[eE][6-9],?[0-9]
|[0-9]{60,69}[eE][7-9],?[0-9]
|[0-9]{70,79}[eE][89],?[0-9]
|[0-9]{80,89}[eE][9],?[0-9]
|[0-9]{90,99}[eE][1-9],?[0-9]

|(?=[0-9]{90}(?=.*?[eE]9)(?:[eE].,?[0-9]|[0-9]{1}[eE].,?[1-9]|[0-9]{2}[eE].,?[2-9]|[0-9]{3}[eE].,?[3-9]|[0-9]{4}[eE].,?[4-9]|[0-9]{5}[eE].,?[5-9]|[0-9]{6}[eE].,?[6-9]|[0-9]{7}[eE].,?[7-9]|[0-9]{8}[eE].,?[89]|[0-9]{9}[eE].,?9))
|(?=[0-9]{80}(?=.*?[eE]8)(?:[eE].,?[0-9]|[0-9]{1}[eE].,?[1-9]|[0-9]{2}[eE].,?[2-9]|[0-9]{3}[eE].,?[3-9]|[0-9]{4}[eE].,?[4-9]|[0-9]{5}[eE].,?[5-9]|[0-9]{6}[eE].,?[6-9]|[0-9]{7}[eE].,?[7-9]|[0-9]{8}[eE].,?[89]|[0-9]{9}[eE].,?9))
|(?=[0-9]{70}(?=.*?[eE]7)(?:[eE].,?[0-9]|[0-9]{1}[eE].,?[1-9]|[0-9]{2}[eE].,?[2-9]|[0-9]{3}[eE].,?[3-9]|[0-9]{4}[eE].,?[4-9]|[0-9]{5}[eE].,?[5-9]|[0-9]{6}[eE].,?[6-9]|[0-9]{7}[eE].,?[7-9]|[0-9]{8}[eE].,?[89]|[0-9]{9}[eE].,?9))
|(?=[0-9]{60}(?=.*?[eE]6)(?:[eE].,?[0-9]|[0-9]{1}[eE].,?[1-9]|[0-9]{2}[eE].,?[2-9]|[0-9]{3}[eE].,?[3-9]|[0-9]{4}[eE].,?[4-9]|[0-9]{5}[eE].,?[5-9]|[0-9]{6}[eE].,?[6-9]|[0-9]{7}[eE].,?[7-9]|[0-9]{8}[eE].,?[89]|[0-9]{9}[eE].,?9))
|(?=[0-9]{50}(?=.*?[eE]5)(?:[eE].,?[0-9]|[0-9]{1}[eE].,?[1-9]|[0-9]{2}[eE].,?[2-9]|[0-9]{3}[eE].,?[3-9]|[0-9]{4}[eE].,?[4-9]|[0-9]{5}[eE].,?[5-9]|[0-9]{6}[eE].,?[6-9]|[0-9]{7}[eE].,?[7-9]|[0-9]{8}[eE].,?[89]|[0-9]{9}[eE].,?9))
|(?=[0-9]{40}(?=.*?[eE]4)(?:[eE].,?[0-9]|[0-9]{1}[eE].,?[1-9]|[0-9]{2}[eE].,?[2-9]|[0-9]{3}[eE].,?[3-9]|[0-9]{4}[eE].,?[4-9]|[0-9]{5}[eE].,?[5-9]|[0-9]{6}[eE].,?[6-9]|[0-9]{7}[eE].,?[7-9]|[0-9]{8}[eE].,?[89]|[0-9]{9}[eE].,?9))
|(?=[0-9]{30}(?=.*?[eE]3)(?:[eE].,?[0-9]|[0-9]{1}[eE].,?[1-9]|[0-9]{2}[eE].,?[2-9]|[0-9]{3}[eE].,?[3-9]|[0-9]{4}[eE].,?[4-9]|[0-9]{5}[eE].,?[5-9]|[0-9]{6}[eE].,?[6-9]|[0-9]{7}[eE].,?[7-9]|[0-9]{8}[eE].,?[89]|[0-9]{9}[eE].,?9))
|(?=[0-9]{20}(?=.*?[eE]2)(?:[eE].,?[0-9]|[0-9]{1}[eE].,?[1-9]|[0-9]{2}[eE].,?[2-9]|[0-9]{3}[eE].,?[3-9]|[0-9]{4}[eE].,?[4-9]|[0-9]{5}[eE].,?[5-9]|[0-9]{6}[eE].,?[6-9]|[0-9]{7}[eE].,?[7-9]|[0-9]{8}[eE].,?[89]|[0-9]{9}[eE].,?9))
|(?=[0-9]{10}(?=.*?[eE]1)(?:[eE].,?[0-9]|[0-9]{1}[eE].,?[1-9]|[0-9]{2}[eE].,?[2-9]|[0-9]{3}[eE].,?[3-9]|[0-9]{4}[eE].,?[4-9]|[0-9]{5}[eE].,?[5-9]|[0-9]{6}[eE].,?[6-9]|[0-9]{7}[eE].,?[7-9]|[0-9]{8}[eE].,?[89]|[0-9]{9}[eE].,?9))

|(?:[eE][0-9]|[0-9]{1}[eE][1-9]|[0-9]{2}[eE][2-9]|[0-9]{3}[eE][3-9]|[0-9]{4}[eE][4-9]|[0-9]{5}[eE][5-9]|[0-9]{6}[eE][6-9]|[0-9]{7}[eE][7-9]|[0-9]{8}[eE][89]|[0-9]{9}[eE]9)
)
|(?=[0-9]+[eE])   # integers
)
[+-]?
([0-9]*\.?[0-9]+|[0-9]+\.?[0-9]*)
[eE][+]?((?:,?[0-9]+)+)

As written here the expression uses the x option which ignores white space

Example

Sample Text

1.2334576E34
1.23E4
1.2334576E34
122E3,123,456
1.234
1.234E2

Matches

[0] => 1.2334576E34
[1] => 1.23E4
[2] => 1.2334576E34
[3] => 122E3,123,456


Ro Yo Mi