Being a Java newbie, I am struggling with String.split. I am trying to tokenize the following string

"(3,3,{S,W,P},{P,W,P},{P,P,P}),(1,2,{S,E}),(2,1,{{S},{E}})"

with the regex pattern "\\{|\\(|\\}|\\)|\\s|," using String.split.

Unfortunately, it also returns empty strings wherever a match occurs, which I want to suppress, similar to what StringSplitOptions.RemoveEmptyEntries does in C#.

By contrast, StringTokenizer works quite well, but since I understand it to be deprecated I am trying to avoid it. To make my question clear, I am looking for behavior from String.split equivalent to what I would get from the following tokenizer:

new StringTokenizer(input2, "{},() \t")

Please suggest how I should proceed.

Abhijit
  • What is the reason for downvote? – Abhijit May 05 '12 at 23:41
  • I don't see any reason to downvote this. Maybe someone's just having a bad day. (+1) – Alan Moore May 06 '12 at 00:58
  • @AlanMoore: I am not sure but in the last 2 days I have got 6 downvotes, and the anonymous down-voter is picking up answers and questions with high upvote (> +5) and down voting them without any explanation. I cannot see how I can get a respite from this. I am really getting frustrated seeing this behavior without any respite from SO. – Abhijit May 06 '12 at 06:52

3 Answers


First, you can eliminate most of those backslashes by using a character class instead of alternation. Then, as Christopher said, you can add a + to mimic StringTokenizer's behavior of matching one or more delimiter characters:

"[{},()\\s]+"

Unfortunately, there's no way to prevent that first, empty token when the string starts with a delimiter. Trailing empty tokens are automatically dropped, but you have to filter out the leading one yourself.
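To illustrate the filtering step, here is a minimal sketch, assuming Java 8 or later for the streams API. The class name `SplitDemo` and the helper `tokenize` are made up for this example; the pattern is the character class shown above.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

class SplitDemo {
    // Hypothetical helper: splits on runs of delimiter characters, then
    // drops the empty token that split() leaves at index 0 when the
    // input starts with a delimiter.
    static List<String> tokenize(String input) {
        return Arrays.stream(input.split("[{},()\\s]+"))
                .filter(s -> !s.isEmpty())
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        String input = "(3,3,{S,W,P},{P,W,P},{P,P,P}),(1,2,{S,E}),(2,1,{{S},{E}})";
        // prints [3, 3, S, W, P, P, W, P, P, P, P, 1, 2, S, E, 2, 1, S, E]
        System.out.println(tokenize(input));
    }
}
```

The filter is applied to every token rather than only the first; because the `+` already collapses runs of delimiters, the leading token is the only one that can be empty here.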

Of course, you're free to use StringTokenizer if you want, or a third-party tool like Guava's Splitter.

Alan Moore
  • I think your answer makes a lot of sense. I understand I am free to use it, but the strongly worded caveat discouraging the use of `StringTokenizer` was the reason I was looking for alternatives. I am also not very open to using third-party tools, as that becomes a portability issue. It seems I can still continue to use `StringTokenizer` for its verbosity, efficiency, and completeness. I believe the Java developers should give a reason for discouraging people from using `StringTokenizer`. – Abhijit May 06 '12 at 06:49
  • It's not a matter of discouraging StringTokenizer so much as encouraging `split()`, which was always intended to serve as a replacement for StringTokenizer. – Alan Moore May 06 '12 at 08:20

Try with this regular expression:

(\\{|\\(|\\}|\\)|\\s|,)+

And of course: StringTokenizer is NOT deprecated https://stackoverflow.com/a/6983926/278842

Christopher Oezbek
  • Works, except it returns an empty string at the 0th index. Also, can you please explain it? – Abhijit May 05 '12 at 23:01
  • Split works even if there is no text between the delimiters, returning an empty string. The regex I gave collapses several delimiters into one. An empty string can still remain at the beginning of the result (trailing empty strings are dropped by split). – Christopher Oezbek May 05 '12 at 23:07
  • The documentation is confusing for StringTokenizer as it also mentions "StringTokenizer is a legacy class that is retained for compatibility reasons although its use is discouraged in new code. It is recommended that anyone seeking this functionality use the split method of String or the java.util.regex package instead". – Abhijit May 05 '12 at 23:07
  • I would also want it to suppress the empty string at either end of the array. Any suggestion on that? – Abhijit May 05 '12 at 23:08
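
On the follow-up about empty strings at either end: one option, not proposed in the thread and offered only as a sketch, is the java.util.regex route mentioned in the StringTokenizer documentation quoted above. Instead of splitting on delimiters, it matches the tokens themselves, so no empty strings are produced at all. The class name `TokenMatcherDemo` and the helper `tokens` are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class TokenMatcherDemo {
    // Negated form of the delimiter class: one or more characters that
    // are none of { } , ( ) or whitespace.
    private static final Pattern TOKEN = Pattern.compile("[^{},()\\s]+");

    // Hypothetical helper: collects every non-delimiter run in order.
    static List<String> tokens(String input) {
        List<String> result = new ArrayList<>();
        Matcher m = TOKEN.matcher(input);
        while (m.find()) {
            result.add(m.group());
        }
        return result;
    }

    public static void main(String[] args) {
        // prints [2, 1, S, E]
        System.out.println(tokens("(2,1,{{S},{E}})"));
    }
}
```

Because each match has at least one character, leading and trailing delimiters need no special handling.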

Try the commons-lang package and look for the StrTokenizer class. It will handle string splitting for you based on a delimiter, and it has an option for what to do with empty values (return them as null, or ignore them).

Matt