Strange Behaviour of String Tokenizer

Question

I have a String with a delimiter (~):

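    // requires: import java.util.StringTokenizer;
    String str = "ABC~DEF~GHI~JKL~~MNO"; // Input String
    StringTokenizer stk = new StringTokenizer(str, "~"); // assumed: not shown in the original post
    Object[] obj = new Object[6];                        // assumed size, inferred from the output below
    int i = 0;
    while (stk.hasMoreTokens()) {
        obj[i] = stk.nextToken();
        i++;
    }
    for (Object ob : obj) {
        System.out.print(ob + "~>");
    }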

I am using StringTokenizer to break the String into tokens, but whenever two delimiters occur consecutively with nothing (not even a space) between them, StringTokenizer skips the empty token and returns the next one.

Actual Output

ABC~>DEF~>GHI~>JKL~>MNO~>null~>

Desired Output

ABC~>DEF~>GHI~>JKL~>null~>MNO~> // the empty token between consecutive delimiters should not be skipped

Why is this happening?

Note:

I know I can get the desired output using the String#split(String regex) method, but I want to know the root cause of this strange behaviour.

The same question has been asked here (String Tokenizer issue), but no reason was given there, only alternative solutions.

You should go through the code of `StringTokenizer`; then you will understand it. Look at the `skipDelimiters()` method and carefully observe the conditions and when `position` is incremented in the `while` loop. – TheLostMind, Mar 04 '15 at 06:48
@Innovation, my question is exactly that: why does it behave this way? :) – Neeraj Jain, Mar 04 '15 at 06:49
possible duplicate of [Why is StringTokenizer deprecated?](http://stackoverflow.com/questions/6983856/why-is-stringtokenizer-deprecated) – Innovation, Mar 04 '15 at 07:07
@Innovation, before marking this as a duplicate, kindly read that post; it says nothing about behaviour like this. – Neeraj Jain, Mar 04 '15 at 07:24

CoronA · Accepted Answer · 2015-03-04T07:23:19.970

I assume you used new StringTokenizer(str, "~").

StringTokenizer uses this definition of a token: a token is a maximal non-empty sequence of characters between delimiters.

Since the string between ~~ is empty, it cannot be a token (by this definition).

I used the following code to verify that:

import java.util.ArrayList;
import java.util.List;
import java.util.StringTokenizer;

public class TokenizerTest { // class name is illustrative
    public static void main(String[] args) {
        List<Object> obj = new ArrayList<>();
        String str = "ABC~DEF~GHI~JKL~~MNO"; // Input String
        StringTokenizer stk = new StringTokenizer(str, "~");
        while (stk.hasMoreTokens()) {
            obj.add(stk.nextToken());
        }
        for (Object ob : obj) {
            System.out.print(ob + "~>");
        }
    }
}

Actual output (consistent with this definition of a token):

ABC~>DEF~>GHI~>JKL~>MNO~>

If the question is why a token is defined this way, look at this example:

String str = "ABC  DEF  GHI"; // two spaces between each word

Splitting on spaces, StringTokenizer finds 3 tokens. If tokens were not required to be non-empty, it would return 5 tokens (2 of them ""). For a simple parser, the current behaviour is preferable.
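To make the two definitions concrete, here is a small sketch that runs the example above both ways: once with StringTokenizer (tokens must be non-empty) and once with String#split using a negative limit (empty strings are kept). The class and method names are illustrative only.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.StringTokenizer;

public class TokenDefinitionDemo {
    // StringTokenizer's definition: a token is a maximal NON-EMPTY run of
    // characters that are not delimiters, so empty runs never appear.
    static List<String> tokenize(String s, String delims) {
        List<String> tokens = new ArrayList<>();
        StringTokenizer stk = new StringTokenizer(s, delims);
        while (stk.hasMoreTokens()) {
            tokens.add(stk.nextToken());
        }
        return tokens;
    }

    // Alternative definition where tokens may be empty: String#split with a
    // negative limit keeps every empty string between adjacent delimiters.
    static List<String> splitKeepingEmpty(String s, String regex) {
        return Arrays.asList(s.split(regex, -1));
    }

    public static void main(String[] args) {
        String str = "ABC  DEF  GHI"; // two spaces between each word
        System.out.println(tokenize(str, " "));          // [ABC, DEF, GHI]
        System.out.println(splitKeepingEmpty(str, " ")); // [ABC, , DEF, , GHI]
    }
}
```

For a whitespace-driven parser the first result is usually what you want, which is the trade-off described above.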

score 1 · Answer 2 · answered Mar 04 '15 at 07:19


You can't make StringTokenizer work the way you want it to (it never returns blanks), but you can use String#split() instead:

for (String token : str.split("~")) {
    // there will be a blank token where you expect it
}

Besides, this code is a whole lot simpler too.
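One caveat when switching to split(): the one-argument form discards trailing empty strings, so an input ending in "~~" loses its empty tokens at the end. Passing a negative limit keeps them. A minimal sketch (class name illustrative):

```java
import java.util.Arrays;

public class SplitDemo {
    public static void main(String[] args) {
        String str = "ABC~DEF~GHI~JKL~~MNO";
        // The empty string between "~~" is kept in the middle of the input.
        System.out.println(Arrays.toString(str.split("~")));
        // prints [ABC, DEF, GHI, JKL, , MNO]

        String trailing = "ABC~~";
        // Trailing empty strings are removed by split(regex)...
        System.out.println(Arrays.toString(trailing.split("~")));     // [ABC]
        // ...but kept when a negative limit is passed.
        System.out.println(Arrays.toString(trailing.split("~", -1))); // [ABC, , ]
    }
}
```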

answered Mar 04 '15 at 07:19

Bohemian


Yes, I already said in my post that I can use `split()`, but I want to know the reason, which Stefan explains well. Still, I think StringTokenizer does return a `blank`: if I put a single space between the two `~~`, like this `~ ~`, then I get the desired output. – Neeraj Jain Mar 04 '15 at 07:42
@neeraj that's not a "blank" - it's a space. A blank is a zero-length string. Note that `String#isEmpty()` is true only for zero-length strings, i.e. "blanks". – Bohemian Mar 04 '15 at 09:41
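To illustrate the distinction in the last comment: with "~ ~" the tokenizer returns a one-character token containing a space, which is a genuine non-empty token, while the zero-length string between "~~" is skipped. A small sketch (class and method names are illustrative):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.StringTokenizer;

public class BlankVersusSpace {
    // Collects every token StringTokenizer returns for "~" as the delimiter.
    static List<String> tokens(String s) {
        List<String> out = new ArrayList<>();
        StringTokenizer stk = new StringTokenizer(s, "~");
        while (stk.hasMoreTokens()) {
            out.add(stk.nextToken());
        }
        return out;
    }

    public static void main(String[] args) {
        for (String t : tokens("JKL~ ~MNO")) {
            // The middle token is " " (length 1), so isEmpty() is false.
            System.out.println("[" + t + "] isEmpty=" + t.isEmpty());
        }
        // "JKL~~MNO" yields only two tokens: the empty string is never returned.
        System.out.println(tokens("JKL~~MNO"));
    }
}
```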

score 0 · Answer 3 · answered Mar 04 '15 at 07:25

The nextToken() method calls the skipDelimiters(int startPos) method to find the index of the next token.

/**
 * Skips delimiters starting from the specified position. If retDelims
 * is false, returns the index of the first non-delimiter character at or
 * after startPos. If retDelims is true, startPos is returned.
 */
private int skipDelimiters(int startPos)

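Since there is nothing between the two ~ characters, skipDelimiters() skips both of them and the next token starts at MNO, so this behaviour is correct by design.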
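A simplified, hypothetical re-creation of that skip step for the retDelims == false case shows why consecutive delimiters vanish. This is not the JDK source (the real method is private and handles more cases); it only sketches the loop described in the Javadoc excerpt.

```java
public class SkipDelimitersSketch {
    // Hypothetical simplification: starting at startPos, advance past every
    // character that is a delimiter and return the first non-delimiter index.
    static int skipDelimiters(String str, String delimiters, int startPos) {
        int position = startPos;
        while (position < str.length()
                && delimiters.indexOf(str.charAt(position)) >= 0) {
            position++;
        }
        return position;
    }

    public static void main(String[] args) {
        String str = "ABC~DEF~GHI~JKL~~MNO";
        // Index 15 is the first '~' of "~~"; both are skipped in one call,
        // so no empty token is ever produced between them.
        int next = skipDelimiters(str, "~", 15);
        System.out.println(next + " -> " + str.charAt(next)); // 17 -> M
    }
}
```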

The documentation also clearly says:

StringTokenizer is a legacy class that is retained for compatibility reasons although its use is discouraged in new code. It is recommended that anyone seeking this functionality use the split method of String or the java.util.regex package instead.

mehdinf · Answer 4 · 2015-03-04T10:38:08.060

StringTokenizer has a private flag, set through the returnDelims constructor parameter, which is false by default. The Javadoc says:

If the returnDelims flag is true, then the delimiter characters are also returned as tokens. Each delimiter is returned as a string of length one. If the flag is false, the delimiter characters are skipped and only serve as separators between tokens.

StringTokenizer has another constructor, StringTokenizer(String str, String delim, boolean returnDelims), that sets this flag. For your purpose, pass true as the returnDelims argument, like this:

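    // requires: import java.util.StringTokenizer;
    String str = "ABC~DEF~GHI~JKL~~MNO"; // Input String
    final String delim = "~";
    // returnDelims = true: every "~" is also returned as a one-character token
    StringTokenizer stk = new StringTokenizer(str, delim, true);
    Object[] obj = new Object[10];
    int i = 0;
    String lastToken = "";
    while (stk.hasMoreTokens()) {
        String nextToken = stk.nextToken();
        if (!delim.equals(nextToken)) {
            obj[i] = nextToken; // a real token: store it
            i++;
        } else if (delim.equals(lastToken)) {
            i++; // two delimiters in a row: leave an empty (null) slot
        }
        lastToken = nextToken;
    }
    for (int j = 0; j < i; j++) { // print only the slots actually used
        System.out.print(obj[j] + "~>");
    }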
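A variation on the same returnDelims idea, written as a reusable helper (the name tokensKeepingEmpty is hypothetical): every delimiter closes the current field, so an empty field is recorded as "" instead of null, and empty fields at the very start or end are kept too, which the loop above does not handle. For the example string it produces the same fields as str.split("~", -1).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.StringTokenizer;

public class ReturnDelimsSketch {
    // Hypothetical helper built on returnDelims = true: delimiter tokens are
    // single characters from delim; anything else is a non-empty field.
    static List<String> tokensKeepingEmpty(String str, String delim) {
        List<String> fields = new ArrayList<>();
        StringTokenizer stk = new StringTokenizer(str, delim, true);
        String current = "";
        while (stk.hasMoreTokens()) {
            String tok = stk.nextToken();
            if (tok.length() == 1 && delim.indexOf(tok.charAt(0)) >= 0) {
                fields.add(current); // delimiter ends the current field (possibly empty)
                current = "";
            } else {
                current = tok;       // a non-empty field
            }
        }
        fields.add(current);         // the field after the last delimiter
        return fields;
    }

    public static void main(String[] args) {
        System.out.println(tokensKeepingEmpty("ABC~DEF~GHI~JKL~~MNO", "~"));
        // prints [ABC, DEF, GHI, JKL, , MNO]
    }
}
```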
