5

I am trying to split a Math Expression.

String number = "100+500";

String[] split = new String[3];

I want to make

  • split[0] = "100"
  • split[1] = "+"
  • split[2] = "500"

I tried this but I don't know what to write for splitting.

split = number.split(????);
berkc
  • Why do you want to split it? To write a parser or to evaluate the expression? In both cases split is probably not the right tool. – assylias Jan 06 '15 at 22:13
  • @assylias I am making a GUI Calculator for Big Integers. After splitting, I will check which operator used, then I will evaluate it. – berkc Jan 06 '15 at 22:16
  • Split uses regex, so this may be interesting: http://stackoverflow.com/questions/24463048/ruby-find-a-whole-math-expression-in-a-string-using-regex – Christian Tapia Jan 06 '15 at 22:17
  • Ah shoot, I misread this question. Split probably is not the best tool for the job, as it will consume the character it's splitting on. – Makoto Jan 06 '15 at 22:17
  • @Makoto yeah I tried it but I couldn't get the operator. I will just use +,- and * operators so all I have to do is split these 3. – berkc Jan 06 '15 at 22:20
  • This is not as easy as splitting strings - see for example: http://stackoverflow.com/questions/1792261/java-maths-parsing-api – assylias Jan 06 '15 at 22:20
  • If you really want to use split, then use something like this [`"100+500".split("(?<=\\+)|(?=\\+)")`](http://stackoverflow.com/questions/2206378/how-to-split-a-string-but-also-keep-the-delimiters). – Tom Jan 06 '15 at 22:30

6 Answers

10

You want to split between digits and non-digits without consuming any input, so you need lookarounds:

String[] split = number.split("(?<=\\d)(?=\\D)|(?<=\\D)(?=\\d)");

What the heck is that train wreck of a regex?

It's expressing the initial sentence of this answer:

  • (?<=\d) means the previous character is a digit
  • (?=\D) means the next character is a non-digit
  • (?<=\d)(?=\D) together will match between a digit and a non-digit
  • regexA|regexB means either regexA or regexB matches; here the second alternative, (?<=\D)(?=\d), covers the reverse case of a non-digit followed by a digit

An important point is that lookarounds are non-consuming, so the split doesn't gobble up any of the input.


Here's some test code:

import java.util.Arrays;

String number = "100+500-123/456*789";
String[] split = number.split("(?<=\\d)(?=\\D)|(?<=\\D)(?=\\d)");
System.out.println(Arrays.toString(split));

Output:

[100, +, 500, -, 123, /, 456, *, 789]

To work with numbers that may have a decimal point, use this regex:

"(?<=[\\d.])(?=[^\\d.])|(?<=[^\\d.])(?=[\\d.])"

which effectively just adds . to the characters that count as part of a "number".
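
As a quick check of the decimal-aware pattern, here is a minimal sketch. The class name DecimalSplitDemo and the helper tokenize are made up for illustration; only the regex comes from the answer above.

```java
import java.util.Arrays;

// Hypothetical demo class; only the BOUNDARY regex is taken from the answer.
class DecimalSplitDemo {
    // Zero-width boundary between a "number" character (digit or '.')
    // and any other character, in either order.
    static final String BOUNDARY =
            "(?<=[\\d.])(?=[^\\d.])|(?<=[^\\d.])(?=[\\d.])";

    static String[] tokenize(String expression) {
        return expression.split(BOUNDARY);
    }

    public static void main(String[] args) {
        // prints [1.5, +, 2.25, *, 3]
        System.out.println(Arrays.toString(tokenize("1.5+2.25*3")));
        // adjacent operators are not separated: prints [2, *-, 3]
        System.out.println(Arrays.toString(tokenize("2*-3")));
    }
}
```

Note the second call: the pattern only splits where a number character meets a non-number character, so two operators in a row stay together as one token. Handling a unary minus would need extra logic.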

Bohemian
  • It worked... but how? What does "(?<=\\d)(?=\\D)|(?<=\\D)(?=\\d)" mean? – berkc Jan 06 '15 at 22:41
  • @Dosher: Check out http://docs.oracle.com/javase/tutorial/essential/regex/. Predefined Character classes will explain what the \d and \D do. The other methods should explain the rest of the regex. – DivineWolfwood Jan 06 '15 at 22:46
  • @Bohemian That helped me a lot, now I understand it. Thank you so much! – berkc Jan 06 '15 at 23:02
  • @realNameDoesn'tExist see additions to answer – Bohemian Sep 07 '20 at 01:39
  • How would you update this regex to split parenthesis from other types of the string? would you need another barrel condition in there? – Narshe Dec 05 '20 at 19:32
  • @Narshe add this: `(?=[()])|(?<=[()])`, ie `"(?<=[\\d.])(?=[^\\d.])|(?<=[^\\d.])(?=[\\d.])|(?=[()])|(?<=[()])"` to split before and after brackets. – Bohemian Dec 06 '20 at 04:04
3

Offhand, I don't know of any library routine that does this split. A custom splitting routine could look like this:

import java.util.ArrayList;
import java.util.List;

/**
 * Splits the given {@link String} at the operators +, -, * and /
 * 
 * @param string
 *            the {@link String} to be split.
 * @throws NullPointerException
 *             when the given {@link String} is null.
 * @return a {@link List} containing the split string and the operators.
 */
public static List<String> split(String string) {
    if (string == null)
        throw new NullPointerException("the given string is null!");
    List<String> result = new ArrayList<String>();

    // operators to split upon
    String[] operators = new String[] { "+", "-", "*", "/" };

    int index = 0;
    while (index < string.length()) {
        // find the index of the nearest operator
        int minimum = string.length();
        for (String operator : operators) {
            int i = string.indexOf(operator, index);
            if (i > -1)
                minimum = Math.min(minimum, i);
        }

        // if an operator is found, split the string
        if (minimum < string.length()) {
            result.add(string.substring(index, minimum));
            result.add("" + string.charAt(minimum));
            index = minimum + 1;
        } else {
            result.add(string.substring(index));
            break;
        }
    }

    return result;
}

Some test code:

System.out.println(split("100+10*6+3"));
System.out.println(split("100+"));

Output:

[100, +, 10, *, 6, +, 3]
[100, +]
Niels Billen
2

You can also use the Pattern/Matcher classes in Java:

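    import java.util.ArrayList;
    import java.util.List;
    import java.util.regex.Matcher;
    import java.util.regex.Pattern;

    String expression = "100+34";
    // match either a run of digits or a single '+'
    Pattern p = Pattern.compile("(\\d+)|(\\+)");
    Matcher m = p.matcher(expression);
    // groupCount() is the number of capturing groups in the pattern, not the
    // number of matches, so collect the tokens in a list instead of an array
    List<String> elems = new ArrayList<>();

    while (m.find())
    {
        elems.add(m.group());
    }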
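
The capturing groups can also tell you which kind of token matched. The following is a minimal sketch, assuming only +, - and * are needed (as the asker mentions in the comments); the class name GroupTokenizer and the NUM/OP labels are made up for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class, not part of the original answer.
class GroupTokenizer {
    // Group 1 captures a run of digits, group 2 a single +, - or * operator.
    private static final Pattern TOKEN = Pattern.compile("(\\d+)|([+\\-*])");

    static List<String> tokenize(String expression) {
        List<String> tokens = new ArrayList<>();
        Matcher m = TOKEN.matcher(expression);
        while (m.find()) {
            // group(1) is non-null only when the digit alternative matched
            String kind = (m.group(1) != null) ? "NUM" : "OP";
            tokens.add(kind + ":" + m.group());
        }
        return tokens;
    }

    public static void main(String[] args) {
        // prints [NUM:100, OP:+, NUM:34, OP:*, NUM:2, OP:-, NUM:7]
        System.out.println(tokenize("100+34*2-7"));
    }
}
```

Be aware that find() silently skips characters that match neither alternative, so input validation would have to be done separately.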
panagdu
1

You can do something simple instead of insane regex; just pad + with white space:

String number = "100+500";
number = number.replace("+", " + ");

Now you can split it at the white space:

String[] split = number.split(" ");

Now your indices will be set:

split[0] = "100";
split[1] = "+";
split[2] = "500";

To check for all arithmetic symbols, you can use the following method if you wish to avoid regex:

public static String replacing(String s) {
   String[] chars = {"+", "-", "*", "/", "="};

   for (String character : chars) {
      if (s.contains(character)) {
         s = s.replace(character, " " + character + " ");//not exactly elegant, but it works
      }
   }
   return s;
}

//in main method
number = replacing(number);
String[] split = number.split(" ");
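
Another regex-free option, sketched here as an alternative to the padding technique rather than as part of it, is java.util.StringTokenizer with its returnDelims flag set to true. The class name DelimiterTokenizer and the delimiter set "+-*/" are assumptions for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.StringTokenizer;

// Hypothetical helper class, not part of the original answer.
class DelimiterTokenizer {
    static List<String> tokenize(String expression) {
        // The third argument (returnDelims = true) makes the tokenizer return
        // each delimiter character as its own token instead of discarding it.
        StringTokenizer st = new StringTokenizer(expression, "+-*/", true);
        List<String> tokens = new ArrayList<>();
        while (st.hasMoreTokens()) {
            tokens.add(st.nextToken());
        }
        return tokens;
    }

    public static void main(String[] args) {
        // prints [100, +, 500, -, 3, *, 2]
        System.out.println(tokenize("100+500-3*2"));
    }
}
```

The Javadoc describes StringTokenizer as a legacy class whose use is discouraged in new code, so treat this as a convenience for small cases.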
Drew Kennedy
  • That is a great method for me but I also need to split - and *. Will number = number.replace("+", " + " || "-", " - " || "*", " * "); work? --Edit : OK it didn't work. – berkc Jan 06 '15 at 22:39
  • Unfortunately, you would have to include a `.replace()` for every character you wish to pad for this technique to work. A workaround is to create a method that will check for all areas desired to split, and constantly update the String, then return it. Your characters, such as `*` and `-` can be placed in an array to be iterated through. The benefit is you only have to write `replace()` once, so your code will look more elegant. Performance would still be the same though. – Drew Kennedy Jan 06 '15 at 22:45
0

You can split the expression string into plain tokens and also get each token's category. The mXparser library supports this as well as evaluating the expression. See the examples below:

Your very simple example "100+500":

import org.mariuszgromada.math.mxparser.*;
...
...
Expression e = new Expression("100+500");
mXparser.consolePrintTokens( e.getCopyOfInitialTokens() );

Result:

[mXparser-v.4.0.0]  --------------------
[mXparser-v.4.0.0] | Expression tokens: |
[mXparser-v.4.0.0]  ---------------------------------------------------------------------------------------------------------------
[mXparser-v.4.0.0] |    TokenIdx |       Token |        KeyW |     TokenId | TokenTypeId |  TokenLevel |  TokenValue |   LooksLike |
[mXparser-v.4.0.0]  ---------------------------------------------------------------------------------------------------------------
[mXparser-v.4.0.0] |           0 |         100 |       _num_ |           1 |           0 |           0 |       100.0 |             |
[mXparser-v.4.0.0] |           1 |           + |           + |           1 |           1 |           0 |         NaN |             |
[mXparser-v.4.0.0] |           2 |         500 |       _num_ |           1 |           0 |           0 |       500.0 |             |
[mXparser-v.4.0.0]  ---------------------------------------------------------------------------------------------------------------

A more sophisticated example, "2*sin(x)+(3/cos(y)-e^(sin(x)+y))+10":

import org.mariuszgromada.math.mxparser.*;
...
...
Argument x = new Argument("x");
Argument y = new Argument("y");
Expression e = new Expression("2*sin(x)+(3/cos(y)-e^(sin(x)+y))+10", x, y);
mXparser.consolePrintTokens( e.getCopyOfInitialTokens() );

Result:

[mXparser-v.4.0.0]  --------------------
[mXparser-v.4.0.0] | Expression tokens: |
[mXparser-v.4.0.0]  ---------------------------------------------------------------------------------------------------------------
[mXparser-v.4.0.0] |    TokenIdx |       Token |        KeyW |     TokenId | TokenTypeId |  TokenLevel |  TokenValue |   LooksLike |
[mXparser-v.4.0.0]  ---------------------------------------------------------------------------------------------------------------
[mXparser-v.4.0.0] |           0 |           2 |       _num_ |           1 |           0 |           0 |         2.0 |             |
[mXparser-v.4.0.0] |           1 |           * |           * |           3 |           1 |           0 |         NaN |             |
[mXparser-v.4.0.0] |           2 |         sin |         sin |           1 |           4 |           1 |         NaN |             |
[mXparser-v.4.0.0] |           3 |           ( |           ( |           1 |          20 |           2 |         NaN |             |
[mXparser-v.4.0.0] |           4 |           x |           x |           0 |         101 |           2 |         NaN |             |
[mXparser-v.4.0.0] |           5 |           ) |           ) |           2 |          20 |           2 |         NaN |             |
[mXparser-v.4.0.0] |           6 |           + |           + |           1 |           1 |           0 |         NaN |             |
[mXparser-v.4.0.0] |           7 |           ( |           ( |           1 |          20 |           1 |         NaN |             |
[mXparser-v.4.0.0] |           8 |           3 |       _num_ |           1 |           0 |           1 |         3.0 |             |
[mXparser-v.4.0.0] |           9 |           / |           / |           4 |           1 |           1 |         NaN |             |
[mXparser-v.4.0.0] |          10 |         cos |         cos |           2 |           4 |           2 |         NaN |             |
[mXparser-v.4.0.0] |          11 |           ( |           ( |           1 |          20 |           3 |         NaN |             |
[mXparser-v.4.0.0] |          12 |           y |           y |           1 |         101 |           3 |         NaN |             |
[mXparser-v.4.0.0] |          13 |           ) |           ) |           2 |          20 |           3 |         NaN |             |
[mXparser-v.4.0.0] |          14 |           - |           - |           2 |           1 |           1 |         NaN |             |
[mXparser-v.4.0.0] |          15 |           e |           e |           2 |           9 |           1 |         NaN |             |
[mXparser-v.4.0.0] |          16 |           ^ |           ^ |           5 |           1 |           1 |         NaN |             |
[mXparser-v.4.0.0] |          17 |           ( |           ( |           1 |          20 |           2 |         NaN |             |
[mXparser-v.4.0.0] |          18 |         sin |         sin |           1 |           4 |           3 |         NaN |             |
[mXparser-v.4.0.0] |          19 |           ( |           ( |           1 |          20 |           4 |         NaN |             |
[mXparser-v.4.0.0] |          20 |           x |           x |           0 |         101 |           4 |         NaN |             |
[mXparser-v.4.0.0] |          21 |           ) |           ) |           2 |          20 |           4 |         NaN |             |
[mXparser-v.4.0.0] |          22 |           + |           + |           1 |           1 |           2 |         NaN |             |
[mXparser-v.4.0.0] |          23 |           y |           y |           1 |         101 |           2 |         NaN |             |
[mXparser-v.4.0.0] |          24 |           ) |           ) |           2 |          20 |           2 |         NaN |             |
[mXparser-v.4.0.0] |          25 |           ) |           ) |           2 |          20 |           1 |         NaN |             |
[mXparser-v.4.0.0] |          26 |           + |           + |           1 |           1 |           0 |         NaN |             |
[mXparser-v.4.0.0] |          27 |          10 |       _num_ |           1 |           0 |           0 |        10.0 |             |
[mXparser-v.4.0.0]  ---------------------------------------------------------------------------------------------------------------

To understand what Token.tokenId and Token.tokenTypeId mean, refer to the API documentation and its parsertokens section. For instance, the Operator class has:

  1. Operator.TYPE_ID - this corresponds to Token.tokenTypeId if Token is recognized as Operator
  2. Operator.OPERATOR_NAME_ID - this corresponds to Token.tokenId if Token is recognized as particular OPERATOR_NAME.

Please follow the mXparser tutorial for a better understanding.

Best regards

Leroy Kegan
0

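Since split takes a regular expression and + and * are regex metacharacters, you need to put a "\\" before them inside the split call (- is only special inside a character class). Note that this split consumes the operator, so only the numbers are returned: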

String number = "100+500";
String[] numbers = number.split("\\+");
for (String n:numbers) {
  System.out.println(n);
}
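
To keep the operator as its own token, the escaped characters can be wrapped in lookarounds, as suggested in the comments on the question. The sketch below also evaluates a single "number operator number" expression with BigInteger, which is the asker's stated goal. The class name KeepOperatorSplit and both method names are made up for illustration, and only +, - and * are assumed.

```java
import java.math.BigInteger;

// Hypothetical helper class, not part of the original answer.
class KeepOperatorSplit {
    // Zero-width match just after or just before one of +, - or *
    static final String AROUND_OPERATOR = "(?<=[+\\-*])|(?=[+\\-*])";

    static String[] tokenize(String expression) {
        return expression.split(AROUND_OPERATOR);
    }

    // Evaluates exactly "number operator number"; anything else is rejected.
    static BigInteger evaluate(String expression) {
        String[] t = tokenize(expression);
        if (t.length != 3) {
            throw new IllegalArgumentException("expected <number><op><number>: " + expression);
        }
        BigInteger left = new BigInteger(t[0]);
        BigInteger right = new BigInteger(t[2]);
        switch (t[1]) {
            case "+": return left.add(right);
            case "-": return left.subtract(right);
            case "*": return left.multiply(right);
            default: throw new IllegalArgumentException("unsupported operator: " + t[1]);
        }
    }

    public static void main(String[] args) {
        System.out.println(evaluate("100+500"));                 // prints 600
        System.out.println(evaluate("12345678901234567890*10")); // prints 123456789012345678900
    }
}
```

Longer expressions, operator precedence and negative operands are out of scope for this sketch; they need a real parser, as the comments on the question point out.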
Jeroen Heier