1

As part of a personal project to calculate the Jordan normal form of square matrices, it turns out I need to parse polynomials with complex coefficients in order to ease a lot of the code.

(relevant code at the bottom of the post)

The polynomials I wish to parse are of the following form:

  1. A coefficient may be real, imaginary or complex.
  2. If the coefficient is complex, it would be wrapped with parentheses. If these parentheses are the leading coefficient, they won't be preceded by a + or -.
  3. If the coefficient is real, imaginary, or complex whose real and\or imaginary components are of magnitude 1, the 1 won't appear but only the sign.
  4. Parentheses are to be preceded by + only.
  5. The variable x may have a power (>2), may have a power of 1 and then it appears just as x, or may not appear at all.
  6. There are no more rules concerning the text representation of the polynomial, i.e. powers aren't necessarily ordered in ascending\descending order.

Some examples of properly formatted polynomials:

  • 1
  • -1
  • -2.1x
  • 3i
  • x^2-1
  • -x^3+2x+1
  • (5-5i)x^2-x-1
  • (-1+i)x-5
  • -ix^3-x^2+1

..and some ill-formatted ones:

  • 1x (leading unnecessary 1)
  • +(+1-2i)x (parentheses have a leading +, real component has leading +)
  • (5.1i)x^2 (no need for parenthesis since coefficient is imaginary)
  • -(i-1) (complex coefficient has leading -)

After some reading online (SO, Java tutorials, Java API), I quickly came to the conclusion that a regex would be the easiest approach for parsing, considering all the restraints mentioned above. On the formal side, a regex for this task is possible, since I drew an NFA the accepts only such valid expressions.

I'm doing this TDD (via JUnit 4), and this test fails:

assertEquals("Polynomial parsed incorrectly.", poly07, PolyParser.parse(exp07));

where poly07 looks like this: (5-5i)x^2-x-1.

This is the exception that's being raised:

java.lang.NumberFormatException: For input string: "5-5"
at sun.misc.FloatingDecimal.readJavaFormatString(FloatingDecimal.java:2043)
at sun.misc.FloatingDecimal.parseDouble(FloatingDecimal.java:110)
at java.lang.Double.parseDouble(Double.java:538)
at PolyParser.parse(PolyParser.java:55)
at PolyParserTest.testParse(PolyParserTest.java:59)

I've tried debugging, and saw that the regex captures 5-5i (and later strips the i). It then tries to invoke Double.parseDouble with the argument string 5-5, which causes the exception.

After all the reading, I can't quite figure out what's the required adjustment in the regex in order for this whole show to work. Also, the regex isn't ordered like the representations constraints mentioned above, because I want to see if the coefficient is complex before trying to parse it as real; also encountered problems that real numbers (i.e. with decimal point) are parsed as integers, so that's why the regex treats real numbers first.

The regex:

public static final String POLYNOMIAL_REGEX =
        "([+-])?" +                     // leading plus or minus
        "(\\()?" +                      // parenthesis to denote the beginning of a complex number
        "([+-])?(((\\d+.\\d+)|\\d+)i)?" +      // component of coefficient, imaginary
        "(((-)?\\d+.\\d+)|\\d+)?" +     // component of coefficient, real
        "(\\))?" +                      // parenthesis to denote the end of a complex number
        "(x)?" +                        // variable
        "(?:\\^(\\d+))?";               // power of the variable

I'm not going to post all of the relevant code here because it would clutter stuff up. All of the code is on GitHub, just be sure to switch to the branch PolyParser.

Relevant code is in the files:

  1. PolyParser.java
  2. Polynomial.java
  3. Complex.java

The test unit is in the file PolyParserTest.java.

asafc
  • 357
  • 3
  • 21
  • regular expressions are almost never the right solution for parsing problems. – Henry Sep 29 '15 at 12:49
  • 1
    My advice: don't. Use a context-free parser, if you want a simple one, take a look at [PEP](http://www.coffeeblack.org/). Or you can go the whole hog and use something like Antlr. – biziclop Sep 29 '15 at 12:55
  • 1
    Alternatively, you can try [JEP](http://www.cse.msu.edu/SENS/Software/jep-2.23/doc/website/index.html), which does all this out of the box and you can even evaluate the polynomials straight away. – biziclop Sep 29 '15 at 13:01
  • First you figure out what you need to do. And then you pick the necessary tools to do it. In this case you first picked out the tool... and it's the wrong one – Cedric Mamo Sep 29 '15 at 14:38

1 Answers1

1

Regexes basically cannot parse expressions because they can't keep track of nesting (e.g., parentheses). This is a lesson most people don't know, and they discover it the hard way.

However, expressions are rather easy to parse, using top-down parsing. See my answer on how to do this: https://stackoverflow.com/a/2336769/120163 This answer covers how to do just parsing, and is linked to another answer that talks about how to build an AST to represent your expression.

A first step: write a grammar representing what you expressions will allow. You have an ad hoc description in your question, but a grammar will force you write exactly what it legal and what is not. With that grammar, you can write the recursive descent parser suggested above pretty easily.

Community
  • 1
  • 1
Ira Baxter
  • 93,541
  • 22
  • 172
  • 341
  • Let's assume that the user inputs properly formatted expressions and we ditch the defensive programming. This way, there are _no nested parentheses_. I use the regex and its capturing groups to act as a parser. It just seems like an awful lot of research and work implementing a parser from scratch, when I have in hand a regex that works almost as I want it to, just missing some fine-tuning. Your thoughts? – asafc Sep 29 '15 at 15:19
  • Your users are going to make mistakes typing formulas. They will expect parentheses whether you like it or not, You need a robust parser. – Ira Baxter Sep 29 '15 at 16:09