
Let's say I have a string: "(2 * 32) + 5 ^ 2"

I'd like to turn this into a String array: [(2, *, 32, ), +, 5, ^, 2]

i.e. I don't want to capture spaces in the original string and I want to split by whitespace characters.

So I tried `string.split("\\s+")`, but the result looks like [(2, *, 32), +, 5, ^, 2].

Can someone explain why it doesn't split "(2" into (,2? Thank you!

ERohan
  • because there is no whitespace between `(` and `2`... – njzk2 Jun 20 '16 at 16:07
  • `\\s+` represents a 1+ sequence of whitespace. Obviously `(2` has no whitespaces in it. – Mena Jun 20 '16 at 16:08
  • In your expected output you are ignoring the '*'; is that desired? I would suggest going over each character in the string and adding it to your array if it is not whitespace. – buczek Jun 20 '16 at 16:08
  • Maybe try `string.trim().toCharArray()` – johmsp Jun 20 '16 at 16:11
  • @njzk2 Oh, I see! How do I split it by whitespace characters and also "(" and ")"? – ERohan Jun 20 '16 at 16:15
  • `string.replaceAll("\\s+","").toCharArray()` if you can work with chars. Otherwise `string.replaceAll("\\s+","").split("")` (assuming the omission of `*` is a typo). If that assumption is correct, this question is a [duplicate](http://stackoverflow.com/questions/1521921/splitting-words-into-letters-in-java), and should be closed. – Ironcache Jun 20 '16 at 16:15
  • @johmsp `trim` only removes whitespace from the beginning and end of the string. – 4castle Jun 20 '16 at 16:15
  • @Ironcache Even better would just be `string.replace(" ", "")` if they don't plan on `\\s` capturing newlines or tab characters. – 4castle Jun 20 '16 at 16:17
  • Do you ever expect numbers to be longer than 1 digit? – 4castle Jun 20 '16 at 16:20
  • @Ironcache That makes sense! Replacing the whitespace characters, then splitting - thank you. I was just curious if there was a way to do it with just split. – ERohan Jun 20 '16 at 16:22
  • @4castle Good point, they could be longer. So Ironcache's `string.replaceAll("\\s+","").split("")` wouldn't work in that case. – ERohan Jun 20 '16 at 16:24
  • It would still work. You can reconstruct them from the given array. It's nearly trivial to turn `[(, 2, 5, +, ...]` into `[(, 25, +, ...]`. However, there are better approaches in this case (as the answers are alluding to). – Ironcache Jun 20 '16 at 16:34
  • What if a number is `3.14`? How would you like that to be split? How about negative numbers, e.g. `5 * -7`? – Andreas Jun 20 '16 at 16:38
  • @Andreas There has to be a line drawn between doing the math, and parsing the string for tokens. I would argue that `-` and `.` are tokens with meaning that shouldn't be implemented in the split (though it would be possible, particularly the `.`). – 4castle Jun 20 '16 at 17:38
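
The two approaches discussed in these comments can be compared side by side. The following is a minimal sketch, assuming Java 8 or later; the class name `CharSplitSketch` and the helper `mergeDigits` are hypothetical and only illustrate how the single-character array could be reassembled into multi-digit numbers, as Ironcache describes.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class CharSplitSketch {

    // Hypothetical helper (not from the thread): glue consecutive
    // single-digit strings back into one multi-digit token.
    static List<String> mergeDigits(String[] chars) {
        List<String> tokens = new ArrayList<>();
        StringBuilder number = new StringBuilder();
        for (String c : chars) {
            if (c.isEmpty()) {
                continue; // Java 7 and earlier put a leading "" in split("") results
            }
            if (c.length() == 1 && Character.isDigit(c.charAt(0))) {
                number.append(c); // still inside a run of digits
                continue;
            }
            if (number.length() > 0) { // a non-digit ends the current number
                tokens.add(number.toString());
                number.setLength(0);
            }
            tokens.add(c);
        }
        if (number.length() > 0) {
            tokens.add(number.toString());
        }
        return tokens;
    }

    public static void main(String[] args) {
        String s = "(2 * 32) + 5 ^ 2";

        // Splitting on whitespace alone leaves "(2" and "32)" intact,
        // because there is no whitespace inside them.
        System.out.println(Arrays.toString(s.split("\\s+")));
        // prints [(2, *, 32), +, 5, ^, 2]

        // Strip whitespace, split into single characters, then rebuild numbers.
        String[] chars = s.replaceAll("\\s+", "").split("");
        System.out.println(mergeDigits(chars));
        // prints [(, 2, *, 32, ), +, 5, ^, 2]
    }
}
```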

1 Answer


This works, and has the added benefit of not splitting when there are numbers longer than 1 digit, and not requiring spaces between tokens.

String str = "(2*32) + 5 ^ 2";
String[] tokens = str.replace(" ", "").split("\\b|(?=\\D)");

Output:

[ (, 2, *, 32, ), +, 5, ^, 2 ]

Ideone Demo
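
To see why the pattern needs both alternatives, it may help to run each half on its own. This is a small sketch, assuming Java 8 or later (where a zero-width match at the start of the input does not produce a leading empty string); the class name `PatternBreakdown` and the method `splitWith` are made up for illustration.

```java
import java.util.Arrays;
import java.util.List;

public class PatternBreakdown {

    // Hypothetical helper: remove spaces, then split with the given regex.
    static List<String> splitWith(String input, String regex) {
        return Arrays.asList(input.replace(" ", "").split(regex));
    }

    public static void main(String[] args) {
        String str = "(2*32) + 5 ^ 2";

        // Word boundaries only: adjacent symbols such as ")+" stay glued together.
        System.out.println(splitWith(str, "\\b"));
        // prints [(, 2, *, 32, )+, 5, ^, 2]

        // Lookahead only: splits before each non-digit, but never after one.
        System.out.println(splitWith(str, "(?=\\D)"));
        // prints [(2, *32, ), +5, ^2]

        // Both together: every symbol and every number becomes its own token.
        System.out.println(splitWith(str, "\\b|(?=\\D)"));
        // prints [(, 2, *, 32, ), +, 5, ^, 2]
    }
}
```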

4castle
  • This is what I needed, thanks! Do you mind explaining how you came up with that pattern "\\b|(?=\\D)" /pointing me to a resource? Newbie and I'd like to learn. – ERohan Jun 20 '16 at 16:38
  • @ERohan First pointer: Read the javadoc of [`Pattern`](https://docs.oracle.com/javase/7/docs/api/java/util/regex/Pattern.html) to learn the various supported regex patterns. Pattern means: Split on a word boundary or if the next character is not a digit. Caveat: Will split off the minus sign of a negative number and will split decimal points, e.g. `-3.14` becomes `[-, 3, ., 14]`, so you'll have to put those together again. – Andreas Jun 20 '16 at 16:41
  • @Ironcache A negative can't be handled properly, because there will be ambiguity with subtraction. Real life language parsers often treat negative numbers as positive ones, and use the unary minus operator to negate afterwards. – 4castle Jun 20 '16 at 16:47
  • you can replace `\D` by `[^0-9.,]` to keep the decimal and grouping separators, but I think these will be matched by `\b`, and the minus will still be separated anyway – njzk2 Jun 20 '16 at 16:49
  • @4castle That's not true. A `-` that is preceded by a mathematical operation (or nothing) is a unary minus (negative), whereas a `-` that is preceded by an evaluation is a minus operation. It can certainly be handled after splitting, but "can't be handled properly" is a stretch. – Ironcache Jun 20 '16 at 16:55
  • @Ironcache What I'm saying is that any functionality beyond this split should be implemented in the logic that consumes this array, including logic that converts numbers to negative or concatenates strings on either side of a `.` into a floating point number. Look up how real life parsers work to see all of this. – 4castle Jun 20 '16 at 16:57
  • Yep, I understand how real life parsers work (or, at the very least, the "real-life" parsers that CS101 teaches everyone). I have no issues with the approach you've proposed, and agree with it; I'm simply saying it **can be handled properly** before tossing it into your parser. – Ironcache Jun 20 '16 at 17:11
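
The post-processing discussed in these comments could look roughly like the sketch below. It is only one possible approach, not code from the thread: the class `MergeSketch` and its methods are hypothetical, Java 8 or later is assumed, and treating `(` as a position where a `-` is unary is an added assumption on top of Ironcache's rule (a `-` preceded by an operator or by nothing is a negative sign).

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class MergeSketch {

    // Tokens after which a "-" is read as a sign rather than subtraction.
    // Including "(" here is an assumption, not something stated in the thread.
    static final Set<String> UNARY_CONTEXT =
            new HashSet<>(Arrays.asList("+", "-", "*", "/", "^", "("));

    // Pass 1: turn [3, ., 14] back into [3.14].
    static List<String> mergeDecimals(List<String> in) {
        List<String> out = new ArrayList<>();
        for (int i = 0; i < in.size(); i++) {
            String t = in.get(i);
            boolean digitsBefore = !out.isEmpty() && out.get(out.size() - 1).matches("\\d+");
            boolean digitsAfter = i + 1 < in.size() && in.get(i + 1).matches("\\d+");
            if (t.equals(".") && digitsBefore && digitsAfter) {
                out.set(out.size() - 1, out.get(out.size() - 1) + "." + in.get(i + 1));
                i++; // the fractional part has been consumed
            } else {
                out.add(t);
            }
        }
        return out;
    }

    // Pass 2: attach a "-" to the following number when it is in a unary position.
    static List<String> mergeUnaryMinus(List<String> in) {
        List<String> out = new ArrayList<>();
        for (int i = 0; i < in.size(); i++) {
            String t = in.get(i);
            boolean unary = out.isEmpty() || UNARY_CONTEXT.contains(out.get(out.size() - 1));
            boolean numberAfter = i + 1 < in.size() && in.get(i + 1).matches("\\d+(\\.\\d+)?");
            if (t.equals("-") && unary && numberAfter) {
                out.add("-" + in.get(i + 1));
                i++; // the number has been consumed
            } else {
                out.add(t);
            }
        }
        return out;
    }

    static List<String> tokenize(String expr) {
        String[] raw = expr.replace(" ", "").split("\\b|(?=\\D)");
        return mergeUnaryMinus(mergeDecimals(Arrays.asList(raw)));
    }

    public static void main(String[] args) {
        System.out.println(tokenize("-3.14"));   // prints [-3.14]
        System.out.println(tokenize("5 * -7"));  // prints [5, *, -7]
        System.out.println(tokenize("5 - 7"));   // prints [5, -, 7]
    }
}
```

Whether this merging belongs right after the split or in the code that consumes the token array is the design question debated above; the sketch only shows that the first option is feasible.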