
I am trying to find all floating-point numbers (possibly in exponential form, with or without a -/+ prefix). For example, the following are valid formats: -1.2 +1.2 .2 -3 3E4 -3e5 e-5

The source text contains several numbers separated by spaces or commas. I need to use a regular expression to

  1. tell whether there is any invalid number (e.g. in "1.2 3.2 s3", s3 is not a valid one)
  2. list every single valid number

I have no idea how to get (1) done, but for (2) I am using boost::regex and the following code:

wstring strre(L"[-+]?\\b[0-9]*\\.?[0-9]+(?:[eE][-+]?[0-9]+)?\\b");
wstring src(L"1.2 -3.4 3.2 3 2 1e-3 3e3");
boost::wregex regexp(strre);
boost::match_results<std::wstring::const_iterator> what; 
regex_search(src, what, regexp, boost::match_continuous);
wcout << "RE: " << strre << endl << endl;
wcout << "SOURCE: [" << src << "]" << endl;

for (int i=0; i<what.size(); i++)
  wcout << "OUTPUT: [" << wstring(what[i].first, what[i].second) << "]"<< endl;

But this code only shows me the first number (1.2). I also tried boost::match_all and boost::match_default, with the same result.

ADDITIONAL INFO: Hi all, let's not worry about the double-backslash issue; it is expressed correctly in my code (in my test code I read the string from a text file rather than from an explicit string literal). Anyway, I modified the code as follows:

wstring strre(L"[-+]?\\b[0-9]*\\.?[0-9]+(?:[eE][-+]?[0-9]+)?\\b");
boost::wregex regexp(strre);
boost::match_results<std::wstring::const_iterator> what; 
wcout << "RE: " << strre << endl << endl;
while (src.length()>0)
{
  wcout << "SOURCE: [" << src << "]" << endl;
  if (!regex_search(src, what, regexp, boost::match_default))
    break;  // nothing left to match; what[0] would not be valid here
  wcout << "OUTPUT: [" << wstring(what[0].first, what[0].second) << "]" << endl;
  src = wstring(what[0].second, src.end());  // continue after the last match
}

Now it correctly shows every single number, but I have to run regex_search several times because it only gives me one number at a time. I just don't understand why regex_search won't give me all the results at once. Is there any way to run the search once and get all the results back?

user1285419

1 Answer

You normally have to double-escape backslashes in a C++ string literal. So your "\." turns into just ".", and you would need to write "\\." instead. Similarly, your "\b" becomes not a word boundary but rather a literal backspace! Fix it the same way: "\\b".

Also, where’s the doc for that strre class? Are you sure it understands the language you are using?

Apparently the new C++ standard (C++11) has raw string literals. These work like `backticked` strings in Go, or like 'single-quoted' strings or /patterns/ in Perl. See this answer for details.
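To make the escaping point concrete, here is a minimal sketch that writes the question's pattern twice, once with doubled backslashes and once as a raw string literal. It assumes a C++11 compiler; the names escaped, raw and check_patterns_match are made up for illustration.

```cpp
#include <cassert>
#include <string>

// The question's pattern with every backslash doubled, as an ordinary
// wide string literal requires.
const std::wstring escaped =
    L"[-+]?\\b[0-9]*\\.?[0-9]+(?:[eE][-+]?[0-9]+)?\\b";

// The same pattern as a C++11 raw wide string literal: everything between
// LR"( and )" is taken verbatim, so backslashes are not escape characters.
const std::wstring raw =
    LR"([-+]?\b[0-9]*\.?[0-9]+(?:[eE][-+]?[0-9]+)?\b)";

// Hypothetical self-check: both spellings produce the same characters.
void check_patterns_match() {
    assert(escaped == raw);
}
```

Either string can be handed to the regex constructor; the raw form is simply easier to compare against a pattern written outside C++.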

EDIT

Here’s a somewhat fancier pattern for detecting floating-point literals, but which uses no backslashes:

 [+-]?(?=[.]?[0-9])[0-9]*(?:[.][0-9]*)?(?:[Ee][+-]?[0-9]+)?

Note that it does require lookaheads, which EREs don’t support. You should probably use the PCRE library, which does. Broken down, that’s

[+-]?                   # optional leading sign
(?=[.]?[0-9])           # lookahead for a digit, maybe with an intervening dot
[0-9]*                  # maybe some digits
(?:[.][0-9]*)?          # maybe a (dot plus maybe some digits)
(?:[Ee][+-]?[0-9]+)?    # maybe an exponent, which may have a sign and must have digits

Pattern courtesy of Perl’s Regexp::Common library.
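As a rough sketch of how the pattern could be applied to both parts of the question, the code below walks over every match with a regex iterator and, separately, checks each space- or comma-separated token against the whole pattern. It deliberately uses the C++11 standard library's std::regex with narrow strings instead of boost::wregex, on the assumption that the default ECMAScript grammar (which supports lookahead) is acceptable; the names kNumber, find_numbers, all_valid and self_check are hypothetical. Note that this pattern requires at least one digit before any exponent, so the question's "e-5" is not accepted as a number.

```cpp
#include <cassert>
#include <regex>
#include <string>
#include <vector>

// The pattern above as a raw string literal; it contains no backslashes.
static const char* const kNumber =
    R"([+-]?(?=[.]?[0-9])[0-9]*(?:[.][0-9]*)?(?:[Ee][+-]?[0-9]+)?)";

// Collect every match in `text`; each iterator step is one search that
// resumes where the previous match ended.
std::vector<std::string> find_numbers(const std::string& text) {
    static const std::regex re(kNumber);
    std::vector<std::string> out;
    for (std::sregex_iterator it(text.begin(), text.end(), re), end;
         it != end; ++it) {
        out.push_back(it->str());
    }
    return out;
}

// True only if every space- or comma-separated token is, in full, a number.
bool all_valid(const std::string& text) {
    static const std::regex re(kNumber);
    static const std::regex sep("[ ,]+");
    // Submatch index -1 iterates over the pieces between separators.
    std::sregex_token_iterator it(text.begin(), text.end(), sep, -1), end;
    for (; it != end; ++it) {
        if (it->length() == 0) continue;  // e.g. a leading separator
        if (!std::regex_match(it->first, it->second, re)) return false;
    }
    return true;
}

// Hypothetical self-check using the inputs from the question.
void self_check() {
    assert(find_numbers("1.2 -3.4 3.2 3 2 1e-3 3e3").size() == 7);
    assert(!all_valid("1.2 3.2 s3"));  // "s3" is not a number
}
```

The whole-token check matters because a plain search would happily find the "3" inside "s3". Boost.Regex offers an analogous boost::regex_iterator, so the same loop shape should carry over to the code in the question.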

tchrist
  • Thanks. But with the no-slash version, the code above can still only retrieve the first matching number instead of all of them. – user1285419 Mar 24 '12 at 18:22
  • @user1285419 It doesn’t have no slashes; it has no backslashes. Way different! And it will return all of them provided you call it progressively/iteratively. – tchrist Mar 24 '12 at 18:23