I have a string with multiple spaces, but when I use the tokenizer it breaks the string apart at every one of those spaces. I need the tokens to contain those spaces. How can I use StringTokenizer to return the values along with the delimiters I am splitting on?
You should be fine if you're not using space-delimited data. If you are, good luck! Btw, it'd help if you gave us an example. – Edwin Feb 14 '12 at 21:33
Please give an example of the string you're trying to tokenize and how you want the result to look. – matt freake Feb 14 '12 at 21:37
4 Answers
You'll note that the docs for StringTokenizer discourage its use in new code and recommend String.split(regex) instead, which is what you want:
String foo = "this is some data in a string";
String[] bar = foo.split("\\s+");
Edit to add: Or, if you have greater needs than a simple split, use the Pattern and Matcher classes for more complex regular expression matching and extraction.
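A minimal sketch of that Pattern/Matcher approach, assuming what you want is every run of whitespace returned as its own token alongside the words. The class name `KeepSpaces` and the helper `tokenize` are hypothetical, chosen for illustration:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class KeepSpaces {
    // Matches either a run of whitespace or a run of non-whitespace,
    // so every character of the input lands in exactly one token.
    private static final Pattern TOKEN = Pattern.compile("\\s+|\\S+");

    static List<String> tokenize(String input) {
        List<String> tokens = new ArrayList<>();
        Matcher m = TOKEN.matcher(input);
        while (m.find()) {
            tokens.add(m.group());
        }
        return tokens;
    }

    public static void main(String[] args) {
        for (String token : tokenize("this is      some  data")) {
            // Brackets make the whitespace-only tokens visible.
            System.out.println("[" + token + "]");
        }
    }
}
```

Because `\\s+|\\S+` covers every character, joining the tokens back together gives you the original string, spaces included.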
Edit again: If you want to preserve the spaces, knowing a bit about regular expressions really helps:
String[] bar = foo.split("\\b");
This splits on word boundaries, so each run of spaces between words is preserved as its own String:
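public class SplitOnBoundaries
{
    public static void main( String[] args )
    {
        // Runs of 1, 6, 2, 6, 3 and 1 spaces between the words
        String foo = "this is      some  data      in   a string";
        // Zero-width split at every word boundary: words and space runs alternate
        String[] bar = foo.split("\\b");
        for (String s : bar)
        {
            System.out.print(s);
            // String.matches() must match the whole string, so no ^...$ anchors are needed
            if (s.matches("\\s+"))
            {
                System.out.println("\t<< " + s.length() + " spaces");
            }
            else
            {
                System.out.println();
            }
        }
    }
}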
Output:
this
<< 1 spaces
is
<< 6 spaces
some
<< 2 spaces
data
<< 6 spaces
in
<< 3 spaces
a
<< 1 spaces
string

@TravisJ - the OP's question does not provide enough detail to provide a precise solution for his problem; I have no idea if he wants N strings with some of them being all the space between the words, or if he has "empty" columns represented by some amount of the space between words, etc. Also, see section marked "edited to add". – Brian Roach Feb 14 '12 at 21:59
If you cannot post an answer then perhaps you should abstain. I will provide a proper regex solution in an edited section. – Travis J Feb 14 '12 at 22:03
@TravisJ - Oh no, thank you; you encouraged me to provide the OP with an answer that was actually efficient and correct if that was his actual need. – Brian Roach Feb 16 '12 at 07:51
@Brian Roach - You may want to use "efficient", and moreover "correct", with more caution here. Using `\b` to split the string on word boundaries can have unintended effects when non-word characters are present, such as periods, dollar signs, accented letters, apostrophes, etc. Putting all these pieces back together with extra logic would be very inefficient. – Travis J Feb 16 '12 at 22:22
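A small sketch illustrating that caveat, assuming Java 8 or later (where a zero-width match at the start of the input no longer produces a leading empty string). The class name `BoundarySplitCaveat` is hypothetical. With `split("\\b")`, `"don't stop"` comes back as `don`, `'`, `t`, ` `, `stop`, and `"$5.00"` as `$`, `5`, `.`, `00`:

```java
import java.util.Arrays;

public class BoundarySplitCaveat {
    // Splits on every \b (word boundary), as in the answer above.
    static String[] splitOnBoundaries(String input) {
        return input.split("\\b");
    }

    public static void main(String[] args) {
        // The apostrophe, dollar sign and period are non-word characters,
        // so each one sits between two boundaries and becomes its own token.
        for (String sample : new String[] {"don't stop", "$5.00"}) {
            System.out.println(Arrays.toString(splitOnBoundaries(sample)));
        }
    }
}
```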
Sounds like you may need to use regular expressions (http://docs.oracle.com/javase/1.4.2/docs/api/java/util/regex/package-summary.html) instead of StringTokenizer.
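One regex-based sketch, assuming the goal is to keep each run of spaces attached to the word before it. It splits only at positions preceded by whitespace and followed by non-whitespace; the lookbehind and lookahead are zero-width, so no characters are consumed. Class and method names are hypothetical:

```java
public class TrailingSpaceTokens {
    // Split between a whitespace character and the next non-whitespace one,
    // so each token is a word followed by all the spaces after it.
    static String[] tokens(String input) {
        return input.split("(?<=\\s)(?=\\S)");
    }

    public static void main(String[] args) {
        for (String t : tokens("this is      some  data")) {
            // Brackets show the trailing spaces kept in each token.
            System.out.println("[" + t + "]");
        }
    }
}
```

Since nothing is consumed by the split, concatenating the tokens reproduces the input exactly.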

Use String.split("\\s+") instead of StringTokenizer.
Note that this only extracts the runs of non-whitespace characters separated by at least one whitespace character. If you want leading/trailing whitespace characters included with the non-whitespace characters, that will be a completely different solution!
This requirement isn't clear from your original question, and there is an edit pending that tries to clarify it.
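A short sketch of how `split("\\s+")` treats leading and trailing whitespace (hypothetical class `SplitWhitespace`): an input that starts with whitespace yields an empty first element, while trailing empty strings are discarded. Trimming first is a common workaround:

```java
import java.util.Arrays;

public class SplitWhitespace {
    // Trims before splitting so a leading run of whitespace does not
    // produce an empty first element; blank input yields no words.
    static String[] words(String input) {
        String trimmed = input.trim();
        return trimmed.isEmpty() ? new String[0] : trimmed.split("\\s+");
    }

    public static void main(String[] args) {
        String padded = "  some data  ";
        System.out.println(Arrays.toString(padded.split("\\s+"))); // [, some, data]
        System.out.println(Arrays.toString(words(padded)));        // [some, data]
    }
}
```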
StringTokenizer is the wrong tool for the job in almost every non-contrived case.
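For completeness, StringTokenizer itself can return the delimiters: the three-argument constructor `StringTokenizer(String str, String delim, boolean returnDelims)` returns each delimiter character as a separate one-character token when `returnDelims` is `true`. A minimal sketch (class name hypothetical); note that a run of spaces comes back as several single-space tokens, so merging them is left to the caller:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.StringTokenizer;

public class TokenizerWithDelims {
    static List<String> tokens(String input) {
        // returnDelims = true: every space is returned as its own token
        StringTokenizer st = new StringTokenizer(input, " ", true);
        List<String> out = new ArrayList<>();
        while (st.hasMoreTokens()) {
            out.add(st.nextToken());
        }
        return out;
    }

    public static void main(String[] args) {
        for (String t : tokens("some  data")) {
            System.out.println("[" + t + "]");
        }
    }
}
```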