
I need an RE2 regular expression to validate whether a string (from tokenized text) is a positive number, ideally accepting any string that would work with, for example, scanf("%f", &num) in C. That includes exponential floating-point representations like 1e9.

I am using an unpublished framework, one not of my choosing, and I cannot change the framework code. The only method available in the framework to do the required validation is to provide it an RE2 regular expression.

This similar question has several non-comprehensive answers and one really great answer from tchrist that appears to be a PCRE regular expression. Unfortunately it is not RE2-compatible. My tools require RE2.
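Because the sticking point is PCRE versus RE2, it helps to see which constructs RE2 refuses. Go's regexp package implements RE2 syntax, so the sketch below simply tries to compile a few patterns. It assumes that the usual PCRE-only features (lookaround, backreferences, atomic groups, possessive quantifiers) are what make a pattern incompatible; the sample patterns and the helper name compiles are illustrative and are not taken from tchrist's answer.

```go
package main

import (
	"fmt"
	"regexp"
)

// compiles (hypothetical helper) reports whether Go's regexp package,
// which uses RE2 syntax, accepts pattern.
func compiles(pattern string) bool {
	_, err := regexp.Compile(pattern)
	return err == nil
}

func main() {
	// Illustrative PCRE features that RE2 rejects at compile time.
	pcreOnly := []string{
		`(?=[0-9])[0-9]+`, // lookahead
		`(?<=\$)[0-9]+`,   // lookbehind
		`([0-9])\1`,       // backreference
		`(?>[0-9]+)`,      // atomic group
		`[0-9]++`,         // possessive quantifier
	}
	for _, p := range pcreOnly {
		fmt.Printf("%-18s compiles=%v\n", p, compiles(p))
	}

	// Anchors, character classes, non-capturing groups and ?, *, + are RE2-safe,
	// so a pattern built only from them (like the one in the answer) compiles.
	fmt.Println(compiles(`^[+]?(?:(?:0*\.0*[1-9][0-9]*)|(?:0*[1-9][0-9]*(?:\.[0-9]*)?))(?:[Ee][-+]?[0-9]+)?$`))
}
```

In practice, rewriting a PCRE pattern for RE2 usually means replacing each of these constructs with plain alternation and character classes.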

system PAUSE
  • You might want a language-specific answer, as different high-level languages implement different versions of regex. – Richard Critten Mar 16 '23 at 18:06
  • While the reference engine of RE2 (https://github.com/google/re2) is in C++, it is possible to use RE2 in many other languages: C, Erlang, JS, Go, Ruby, and even Perl. This question is not specific to any programming language. RE2 should be pretty much the same in all languages. – system PAUSE Mar 16 '23 at 18:15

1 Answer


This seems to work for ordinary decimal and exponential notation:

^[+]?(?:(?:0*\.0*[1-9][0-9]*)|(?:0*[1-9][0-9]*(?:\.[0-9]*)?))(?:[Ee][-+]?[0-9]+)?$

It anchors the match to the entire string with ^ and $, so it would need modification to match a substring. Depending on the language or environment, \ and other characters may need to be escaped.
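Both of those points can be seen in a small Go sketch (the constant and variable names are made up for this example). A Go raw string literal (backquotes) holds the pattern exactly as written above, while an interpreted literal needs each \ doubled. Simply deleting ^ and $ to search inside longer text has a side effect: a match can begin in the middle of a token, so "-2" yields "2".

```go
package main

import (
	"fmt"
	"regexp"
)

// The answer's pattern as a raw string literal: backslashes are taken literally.
const anchoredRaw = `^[+]?(?:(?:0*\.0*[1-9][0-9]*)|(?:0*[1-9][0-9]*(?:\.[0-9]*)?))(?:[Ee][-+]?[0-9]+)?$`

// The same pattern as an interpreted string literal: each \ must be written as \\.
const anchoredInterpreted = "^[+]?(?:(?:0*\\.0*[1-9][0-9]*)|(?:0*[1-9][0-9]*(?:\\.[0-9]*)?))(?:[Ee][-+]?[0-9]+)?$"

// anchored matches only when the whole input is a positive number.
var anchored = regexp.MustCompile(anchoredRaw)

// unanchored drops the leading ^ and trailing $ so matches can occur inside longer text.
var unanchored = regexp.MustCompile(anchoredRaw[1 : len(anchoredRaw)-1])

func main() {
	fmt.Println(anchoredRaw == anchoredInterpreted) // true
	fmt.Println(anchored.MatchString("6.02e23"))     // true
	fmt.Println(anchored.MatchString("x 6.02e23 y")) // false: ^ and $ require a whole-string match

	// Without anchors, a match may start mid-token: the "2" in "-2" is found.
	fmt.Println(unanchored.FindAllString("x 6.02e23 y -2", -1)) // [6.02e23 2]
}
```

Since RE2 has no lookbehind, one RE2-friendly way to require a token boundary is to prefix the unanchored pattern with (?:^|\s) and wrap the number in a capturing group, then read the group rather than the whole match.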

Here's a Go program that builds up the regex and tests it:

https://play.golang.com/p/ENHu09Axug4

package main

import (
    "fmt"
    "regexp"
    "strings"
)

// build up regexp string
const (
    optLeadingPlus            string = "[+]?"
    mantissaLessthanOne       string = "(?:0*\\.0*[1-9][0-9]*)"
    mantissaGreaterorequalOne string = "(?:0*[1-9][0-9]*(?:\\.[0-9]*)?)"
    positiveFloatMantissa     string = "(?:" +
        mantissaLessthanOne +
        "|" +
        mantissaGreaterorequalOne +
        ")"
    optExponent         string = "(?:[Ee][-+]?[0-9]+)?"
    regexpPositiveFloat string = "^" + optLeadingPlus + positiveFloatMantissa + optExponent + "$"
)

// compile the regexp during package initialization, before main runs;
// MustCompile panics if the pattern is invalid
var rePosFloat = regexp.MustCompile(regexpPositiveFloat)

// test a single token and show the result
func test(s string, expected bool) {
    r := rePosFloat.MatchString(s)
    msg := "passed:"
    if r != expected {
        msg = "FAILED>"
    }
    fmt.Printf("%v s=%v, match=%v\n", msg, s, r)
}

func main() {
    fmt.Printf("%v\n", regexpPositiveFloat)

    tnan := "5.13.7 +0-1 McGillicuddy"
    tzero := "0 +0 -0 .0 0. 0.0 00.00 0.000 .000 +00 -00 0000. " +
        ".0000000000000000000000000000000000000000000000000000000000000000000 " +
        "0e50 0e-50 0e0 .0E000 +0e-1"
    tneg := "-2 -1 -3.2 -0.001 -.00001 -0.0000001E-99 " +
        "-.00000000000000000000000000000000000000000000000000000000000000000000000001 "
    tpos := "2 +2 2. +1 1. 1.3 +365.2425 .03 0.3 3.0 6.02e23 0.00001 .01 .00001 " +
        ".00000000000000000000000000000000000000000000000000000000000000000000000001 " +
        "+.01 +.000000000000000000000000000000000000001 1e50 1e+50 1e-50 1E0 +.05E-9"

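    // strings.Fields splits on runs of whitespace and drops empty strings,
    // so the trailing space in tneg does not create an empty test token
    for _, t := range strings.Fields(tnan) {
        test(t, false)
    }
    for _, t := range strings.Fields(tzero) {
        test(t, false)
    }
    for _, t := range strings.Fields(tneg) {
        test(t, false)
    }
    for _, t := range strings.Fields(tpos) {
        test(t, true)
    }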
}
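The hand-picked test lists above can be complemented by comparing the regex against a real float parser. C's scanf is not directly available from Go, so this sketch substitutes Go's strconv.ParseFloat as the reference; the two are close but not identical, and the helper name parsesPositive is hypothetical. The comparison surfaces forms the regex does not cover: ParseFloat, like C99 strtod (whose subject-sequence format scanf's %f follows), also accepts infinity and hexadecimal floating-point strings such as Inf and 0x1p-2.

```go
package main

import (
	"fmt"
	"regexp"
	"strconv"
)

// Same pattern as in the answer, written as a raw string literal.
var rePosFloat = regexp.MustCompile(`^[+]?(?:(?:0*\.0*[1-9][0-9]*)|(?:0*[1-9][0-9]*(?:\.[0-9]*)?))(?:[Ee][-+]?[0-9]+)?$`)

// parsesPositive (hypothetical helper) reports whether strconv.ParseFloat accepts s
// and yields a value greater than zero. ParseFloat stands in for C's scanf("%f");
// their accepted inputs overlap but are not identical.
func parsesPositive(s string) bool {
	f, err := strconv.ParseFloat(s, 64)
	return err == nil && f > 0
}

func main() {
	tokens := []string{"2", "+.05E-9", "1.", "0.0", "-3.2", "6.02e23", "Inf", "+Infinity", "0x1p-2"}
	for _, t := range tokens {
		r, p := rePosFloat.MatchString(t), parsesPositive(t)
		note := ""
		if r != p {
			note = "  <- differs"
		}
		fmt.Printf("%-10s regex=%-5v ParseFloat>0=%-5v%s\n", t, r, p, note)
	}
}
```

Whether infinity or hexadecimal input should count as a positive number is a policy choice for the caller; the regex as written rejects both.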

Thanks to tchrist both for that PCRE version and for the excellent test cases.

system PAUSE