-1

I want to write a small algorithm.

I'm facing the following issue: I have a String that can contain digits and the following symbols: -, (, ). I want to parse it, so I can get each symbol and number.

The method I want to write (getNextToken) should return the symbols and numbers successively. For example, getNextToken("(123-456)-12-1") should return:

  • on the first call: "("
  • on the second call: "123"
  • on the third call: "-"

and so on.

The problem I'm facing is that each numeric part can contain several digits.

I understand that it's not a big deal to write this kind of function, but it is not a "primitive" function. So, does Java have a utility class to solve this problem?

Barranka
gstackoverflow

3 Answers

5

java.util.StringTokenizer can return the delimiters as tokens too: pass true as the third constructor argument (returnDelims).

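// requires: import java.util.StringTokenizer;
String str = "(123-456)-12-1";
// third argument true: the delimiters -, ( and ) are returned as tokens
StringTokenizer tokenizer = new StringTokenizer(str, "-()", true);
while (tokenizer.hasMoreTokens()) {
    System.out.println(tokenizer.nextToken());
}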

This prints:

(
123
-
456
)
-
12
-
1

Is this what you wanted?

  • This is OK, but will yield some unexpected results. For example, what happens if you add some spaces to the original input string? – markspace Nov 20 '15 at 20:11
  • If you want to use split, it can be done with `String[] result = str.split("((?<=-)|(?=-))|((?<=\\))|(?=\\)))|((?<=\\()|(?=\\())");` which is way less readable to me but may have a longer life if they deprecate the tokenizer. – John Teixeira Nov 20 '15 at 20:12
  • Yeah, right. Which is more maintainable, the OP or that regex horror you just posted? Sweet baby jebus people. http://blog.codinghorror.com/regular-expressions-now-you-have-two-problems/ – markspace Nov 20 '15 at 20:13
  • @markspace True, if that is in the scope of the problem. I am assuming that each token would be checked for validity. If we used `String str = "(123-45 6)-12-1";` we would get `(` ... `45 6` ... `1`. If desired, the `45 6` could be stripped of white space or, if the scope calls for a stricter action, invalid tokens would throw an exception. – John Teixeira Nov 20 '15 at 20:19
  • @JohnTeixeira Whitespace is so common in parsing that I assume it must be in the problem scope. I guess it might not be, but that would be pretty unusual. – markspace Nov 20 '15 at 20:21
  • @markspace I agree that it comes up often. Especially if you are parsing something that was not meant to be parsed. e.g. natural language or human-readable logs. I think it is less of an issue with things that are designed to be parsable like computer languages or CSV. But with those you gain the issue of escaped characters. In conclusion, there is always a gotcha in parsing :-) – John Teixeira Nov 20 '15 at 20:32
  • @JohnTeixeira There is also another way that does not use `String.split` and uses a regex with only 12 characters, see my answer. – mezzodrinker Nov 20 '15 at 20:37
  • @mezzodrinker I like your RegEx better than mine. I still am partial to the cleanness of the Tokenizer but if one feels unsure of its longevity I'd choose yours for readability. – John Teixeira Nov 20 '15 at 21:19
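The whitespace question raised in the comments can be handled on top of the StringTokenizer answer above. Here is a minimal sketch, assuming that whitespace around a token should be ignored and that anything other than a delimiter or a run of ASCII digits (such as `45 6`) should be rejected. The class name `StrictTokenizer` and the method `tokenize` are made up for this example and are not part of any library.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.StringTokenizer;

// Hypothetical helper, not part of the original answer.
final class StrictTokenizer {

    // Splits the input on -, ( and ), keeps those delimiters as tokens,
    // drops blank tokens, and rejects any other token that is not all digits.
    static List<String> tokenize(String input) {
        List<String> tokens = new ArrayList<>();
        StringTokenizer tokenizer = new StringTokenizer(input, "-()", true);
        while (tokenizer.hasMoreTokens()) {
            String token = tokenizer.nextToken().trim();
            if (token.isEmpty()) {
                continue; // only whitespace between two delimiters, e.g. ") -"
            }
            boolean isDelimiter = token.length() == 1 && "-()".contains(token);
            boolean isNumber = token.chars().allMatch(c -> c >= '0' && c <= '9');
            if (!isDelimiter && !isNumber) {
                throw new IllegalArgumentException("Invalid token: \"" + token + "\"");
            }
            tokens.add(token);
        }
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(tokenize(" (123 - 456)-12-1 "));
        // prints [(, 123, -, 456, ), -, 12, -, 1]
    }
}
```

With this sketch, `tokenize("(123-45 6)-12-1")` throws an IllegalArgumentException instead of returning `45 6` as a token. Whether to strip or reject is a design choice that depends on the input you expect.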
3

Another regular-expression solution, with the same output as John Teixeira's answer:

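// requires: import java.util.regex.Matcher; import java.util.regex.Pattern;
String input = "(123-456)-12-1";
// a token is either one of the symbols ( ) - or a run of digits
Pattern pattern = Pattern.compile("([()-]|\\d+)");
Matcher matcher = pattern.matcher(input);

while (matcher.find()) {
    System.out.println(matcher.group(1));
}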

It also avoids StringTokenizer, which its own Javadoc calls a legacy class whose use is discouraged in new code. You can find the exact details of this regular expression here.
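If you want the call-by-call behaviour described in the question, the Pattern/Matcher code above can be wrapped in a small stateful object. This is only a sketch: the class name `TokenStream` is hypothetical, and returning null at the end of the input is an assumption (the question does not say what should happen there).

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical wrapper; the class name and the null-at-end convention are illustrative.
final class TokenStream {
    // Same alternatives as the answer's pattern: one symbol, or a run of digits.
    private static final Pattern TOKEN = Pattern.compile("[()-]|\\d+");

    private final Matcher matcher;
    private boolean exhausted = false;

    TokenStream(String input) {
        this.matcher = TOKEN.matcher(input);
    }

    // Returns the next symbol or number, or null once the input is used up.
    // Characters that match neither alternative are skipped by find().
    String getNextToken() {
        if (!exhausted && matcher.find()) {
            return matcher.group();
        }
        exhausted = true;
        return null;
    }

    public static void main(String[] args) {
        TokenStream stream = new TokenStream("(123-456)-12-1");
        System.out.println(stream.getNextToken()); // (
        System.out.println(stream.getNextToken()); // 123
        System.out.println(stream.getNextToken()); // -
    }
}
```

The state lives in the Matcher, so each TokenStream walks its own input once. Note that this version silently skips unexpected characters such as spaces or letters; add a check if you need to reject them.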

Community
mezzodrinker
1

I'm not sure if this is what you are looking for, and it's not really readable. That's the problem with regular expressions :\

// requires: import java.util.Arrays;
String str = "(123-456)-12-1";
String splittedStr = Arrays.toString(str.split("((?<=-)|(?=-)|(?<=[(])|(?=[(])|(?<=[)])|(?=[)]))"));
System.out.println(splittedStr);
// Output: [(, 123, -, 456, ), -, 12, -, 1]

Edit: I found that the regular expression that I used can be simplified a lot. This new example uses the new shortened version:

String str = "(123-456)-12-1";
String splittedStr = Arrays.toString(str.split("((?<=-|[(]|[)])|(?=-|[(]|[)]))"));
System.out.println(splittedStr);
// Output: [(, 123, -, 456, ), -, 12, -, 1]
Tomasito665
  • I think this explains some issues with that regular expression you used: http://blog.codinghorror.com/regular-expressions-now-you-have-two-problems/ – markspace Nov 20 '15 at 20:12
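
For what it's worth, the lookaround split from the last answer can be shortened a bit further by putting all three symbols into one character class. A sketch, assuming Java 8 or later (on older versions a zero-width match at the start of the string yields a leading empty element); the class and method names are made up for the example.

```java
import java.util.Arrays;

// Hypothetical demo class, not taken from any of the answers.
final class SplitDemo {

    // Splits before and after each of -, ( and ) so the symbols stay in the result.
    static String[] splitKeepingDelimiters(String input) {
        return input.split("(?<=[-()])|(?=[-()])");
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(splitKeepingDelimiters("(123-456)-12-1")));
        // prints [(, 123, -, 456, ), -, 12, -, 1]
    }
}
```

Like the other split-based versions, this does no validation: any character that is not one of the three symbols simply ends up inside a neighbouring token.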