2

I'm trying to write a regular expression to match complex-number input in the following forms (with a, b real numbers and i/I the imaginary unit):

a

a+bi

a-bi

+bi

-bi

a+i

a-i

i

-i

Of course, I also want every number to be readable in exponential form (e.g. 1.23e+45-67.89e-1256i). I've come up with this:

   regex aplusbi ("(([\\+-]?[\\d]+([\\.][\\d]+)?)?([eE][\\+-]?[\\d]+)?)?(([\\+-])?(([\\d]+?([\\.][\\d]+)?)?([eE]?[\\+-]?[\\d]+)?)?[iI])?")

It gets most of them right. However, when I input +bi, -bi, or simply bi, the b part ends up in the real part, and it also accepts this number as valid:

12.418.546i

putting 12.41 into the real part and 8.546 into the imaginary one. How can I correct this? I'm kind of new to C++ and regexes, so any kind of help would be appreciated, thanks!

  • There's a saying about regexes that goes something like this: "I have a problem. I solved it with a regex. Now I have *two* problems." Regular expressions are very powerful, but they are also very complex, and therefore hard to use and get completely right, and in many cases overkill. I suggest you try a more naive method of parsing first. – Some programmer dude May 19 '18 at 12:35
  • I've tried, but I need to handle many input forms, and I couldn't figure this one out. Still, I'll try to stick to other parsing methods in the future, thanks! –  May 19 '18 at 16:56
  • Which regex engine are you using? POSIX or ECMAScript? –  May 19 '18 at 17:52
  • I'm not too sure, but I think ECMAScript. I'm using the g++ compiler on Linux Ubuntu with the default libraries; that's all I can tell. –  May 20 '18 at 12:26

3 Answers

2

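Since the real and imaginary parts are two optional clusters, you have to introduce two assertions to control them: one that keeps the pattern from matching the empty string, and one that separates the real part from the imaginary part.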

Raw: ^(?=[iI.\d+-])([+-]?(?:\d+(?:\.\d*)?|\.\d+)(?:[eE][+-]?\d+)?(?![iI.\d]))?([+-]?(?:(?:\d+(?:\.\d*)?|\.\d+)(?:[eE][+-]?\d+)?)?[iI])?$

Stringed: "^(?=[iI.\\d+-])([+-]?(?:\\d+(?:\\.\\d*)?|\\.\\d+)(?:[eE][+-]?\\d+)?(?![iI.\\d]))?([+-]?(?:(?:\\d+(?:\\.\\d*)?|\\.\\d+)(?:[eE][+-]?\\d+)?)?[iI])?$"

https://regex101.com/r/0JMEZ8/1

Readable version

 ^ 
 (?= [iI.\d+-] )               # Assertion that keeps it from matching empty string
 (                             # (1 start), Real
      [+-]? 
      (?:
           \d+ 
           (?: \. \d* )?
        |  \. \d+ 
      )
      (?: [eE] [+-]? \d+ )?
      (?! [iI.\d] )            # Assertion that separates real/imaginary
 )?                            # (1 end)
 (                             # (2 start), Imaginary
      [+-]? 
      (?:
           (?:
                \d+ 
                (?: \. \d* )?
             |  \. \d+ 
           )
           (?: [eE] [+-]? \d+ )?
      )?
      [iI] 
 )?                            # (2 end)
 $
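
For the C++ side, here is a minimal sketch of how the pattern above might be used with std::regex, assuming the default ECMAScript grammar (which supports both lookahead assertions). The helper name `split_complex` and its out-parameters are made up for illustration; a raw string literal is used so the backslashes need not be doubled as in the stringed version.

```cpp
#include <regex>
#include <string>

// Hypothetical helper: splits `input` into the text captured by
// group 1 (real part) and group 2 (imaginary part, including the i).
// Returns false when the whole input is not a valid complex number.
bool split_complex(const std::string& input,
                   std::string& real, std::string& imaginary) {
    // Same pattern as the "Raw" line above, written as a raw string literal.
    static const std::regex pattern(
        R"(^(?=[iI.\d+-])([+-]?(?:\d+(?:\.\d*)?|\.\d+)(?:[eE][+-]?\d+)?(?![iI.\d]))?([+-]?(?:(?:\d+(?:\.\d*)?|\.\d+)(?:[eE][+-]?\d+)?)?[iI])?$)");

    std::smatch m;
    if (!std::regex_match(input, m, pattern)) {
        return false;  // outputs are left untouched on failure
    }
    real = m[1].str();       // empty when there is no real part
    imaginary = m[2].str();  // empty when there is no imaginary part
    return true;
}
```

Called on `+2i`, this should leave the real part empty and put `+2i` in the imaginary part, while `12.418.546i` is rejected outright. Note that group 2 still carries the trailing i, so it has to be stripped before converting the coefficient to a number.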
1

Regex: ^(?:(?<real>\d+(?:(?:\.\d+)?(?:e[+\-]\d+)?)?)?(?:[+\-]))?(?<imaginary>\d+(?:(?:\.\d+)?(?:e[+\-]\d+)?)?)?[iI]$

Demo

Matt.G
  • Thanks a lot! I've never seen `?<real>` and `?<imaginary>` before. I know they're names, but how do they work? –  May 19 '18 at 16:55
  • It doesn't appear to catch integers nor reals –  May 19 '18 at 17:13
  • `(?<real>...)` and `(?<imaginary>...)` are named capturing groups. You can read more about them [here](https://www.regular-expressions.info/named.html). As you can see in the Match Information section of the demo linked in the answer (on the right side), the capturing groups let you extract the real part and the imaginary part. – Matt.G May 19 '18 at 17:33
  • Within your C++ code, you'll need to escape each `\` character by adding an extra `\`, like `^(?:(?<real>\\d+(?:(?:\\.\\d+)?(?:e[+\\-]\\d+)?)?)?(?:[+\\-]))?(?<imaginary>\\d+(?:(?:\\.\\d+)?(?:e[+\\-]\\d+)?)?)?[iI]$` – Matt.G May 19 '18 at 17:35
  • There is one problem, many sites say I cannot name groups in C++, and my compiler complains if I do not remove them –  May 19 '18 at 17:44
  • Try without the named groups: `^(?:(\\d+(?:(?:\\.\\d+)?(?:e[+\\-]\\d+)?)?)?(?:[+\\-]))?(\\d+(?:(?:\\.\\d+)?(?:e[+\\-]\\d+)?)?)?[iI]$`. See [Demo](https://regex101.com/r/6IW0G4/4) – Matt.G May 19 '18 at 17:46
  • A slight change for capturing imaginary numbers coming from [Go](https://golang.org), which allows spaces before/after the plus sign. I also needed to capture _i_ separately, but the rest is essentially the same as your answer: `(?:(?<real>\d+(?:(?:\.\d+)?(?:e[+\-]\d+)?)?)?(?:\s?[+\-]\s?))?(?<imaginary>\d+(?:(?:\.\d+)?(?:e[+\-]\d+)?)?)?([iI])\s*` – Gwyneth Llewelyn Apr 05 '21 at 12:16
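
On the named-group problem: std::regex's ECMAScript grammar has no `(?<name>...)` syntax, so such a pattern is rejected with a std::regex_error when the regex object is constructed (at run time, not by the compiler proper). A minimal sketch, with hypothetical helper names `compiles` and `split_numbered`, showing that the numbered-group version is accepted and exposes the same parts as groups 1 and 2:

```cpp
#include <regex>
#include <string>

// Pattern from the answer with named groups, as used on regex101.
const char* const kNamed =
    R"(^(?:(?<real>\d+(?:(?:\.\d+)?(?:e[+\-]\d+)?)?)?(?:[+\-]))?(?<imaginary>\d+(?:(?:\.\d+)?(?:e[+\-]\d+)?)?)?[iI]$)";

// Same pattern with plain numbered groups:
// group 1 = real part, group 2 = imaginary coefficient.
const char* const kNumbered =
    R"(^(?:(\d+(?:(?:\.\d+)?(?:e[+\-]\d+)?)?)?(?:[+\-]))?(\d+(?:(?:\.\d+)?(?:e[+\-]\d+)?)?)?[iI]$)";

// Returns true if std::regex accepts the pattern, false if it throws.
bool compiles(const std::string& pattern) {
    try {
        std::regex re(pattern);
        (void)re;  // only the construction matters here
        return true;
    } catch (const std::regex_error&) {
        return false;
    }
}

// Splits `input` using the numbered-group pattern.
bool split_numbered(const std::string& input,
                    std::string& real, std::string& imaginary) {
    static const std::regex re(kNumbered);
    std::smatch m;
    if (!std::regex_match(input, m, re)) {
        return false;
    }
    real = m[1].str();
    imaginary = m[2].str();
    return true;
}
```

As noted in the comments, this pattern requires the trailing i, so a plain integer or real such as `42` does not match. The sign between the two parts sits in a non-capturing group and is therefore not part of either capture.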
1

So the answer above was a good start, but I needed some improvements. For instance, Matlab writes complex numbers to .csv files in the form -0.0232877540359511+-0.00509035792974122i, which that regex could not read because of the leading signs. I also wanted to handle the cases where the exponent character is E instead of e and where the exponent has no sign (positive by default), so here is a small adjustment that handles these additional cases:

^(?:(?<real>[+\-]?\d+(?:(?:\.\d+)?(?:[eE][+\-]?\d+)?)?)?(?:[+\-]))?(?<imaginary>[+\-]?\d+(?:(?:\.\d+)?(?:[eE][+\-]?\d+)?)?)?[iI]$
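
Since the question is about C++, here is a rough sketch of how this adjusted pattern might be turned into a `std::complex<double>`. Several things here are my assumptions rather than part of the answer: the group names are dropped because std::regex does not support named groups; the exponent is written as `[eE][+\-]?\d+` in both parts, as the text describes; the separator sign is captured as its own group so that a minus in `1-2i` is not lost; and a missing coefficient (as in `-i`) is treated as 1. The function name `parse_complex` is hypothetical.

```cpp
#include <complex>
#include <regex>
#include <string>

// Hypothetical helper: parses text such as "-0.02+-0.005i" into `out`.
// Group 1 = real part, group 2 = separator sign, group 3 = imaginary
// coefficient. The pattern still requires the trailing i.
bool parse_complex(const std::string& text, std::complex<double>& out) {
    static const std::regex re(
        R"(^(?:([+\-]?\d+(?:(?:\.\d+)?(?:[eE][+\-]?\d+)?)?)?([+\-]))?([+\-]?\d+(?:(?:\.\d+)?(?:[eE][+\-]?\d+)?)?)?[iI]$)");

    std::smatch m;
    if (!std::regex_match(text, m, re)) {
        return false;
    }
    // Missing real part means 0; missing imaginary coefficient means 1.
    const double real = m[1].matched ? std::stod(m[1].str()) : 0.0;
    const double coeff = m[3].matched ? std::stod(m[3].str()) : 1.0;
    // The separator sign multiplies the coefficient ("+-0.5i" gives -0.5).
    const double sign = (m[2].matched && m[2].str() == "-") ? -1.0 : 1.0;

    out = std::complex<double>(real, sign * coeff);
    return true;
}
```

With these assumptions, the Matlab-style input from above should parse with both parts negative, and `1-2i` should give an imaginary part of -2. Inputs without the imaginary unit, such as `3.5`, are still rejected by this pattern.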