What is the easiest way to get tokens from input String?

Question

Given a string

abc=1&b=2&fa=_

I need to split it to get an array of tokens:

["abc", "=", "1", "&", "b", "=", "2", "&", "fa", "=" , "_"]

My code:

import java.util.ArrayList;
import java.util.List;

public String[] getTokens(String input) {
    List<String> list = new ArrayList<>();
    String[] splitted = input.split("&");

    for (int k = 0, splittedLength = splitted.length; k < splittedLength; k++) {
        String part = splitted[k];
        String[] kv = part.split("=");
        for (int i = 0, kvLength = kv.length; i < kvLength; i++) {
            String elem = kv[i];
            list.add(elem);
            if (i < kvLength - 1) {
                list.add("=");
            }
        }
        if (k < splittedLength - 1){
            list.add("&");
        }
    }

    return list.toArray(new String[list.size()]);
}

I also need to handle the case where a key has no value (a=). In that case the key should get a default value of an empty string (a="").

How can I do that?

See http://stackoverflow.com/questions/11733500/getting-url-parameter-in-java-and-extract-a-specific-text-from-that-url to get for instance a Map from a URL query string. — Joop Eggen, Feb 27 '16 at 22:22

score 0 · Answer 1 · answered Feb 27 '16 at 22:22

You are on the right track: first split on "&", then split each part on "=". If the second split has no item at index 1, there was no value, and you can set it to an empty string. Step through both cases in a debugger and you will find the right solution.
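As a rough sketch of that idea (not necessarily what the answer's author had in mind), the variant below passes a limit to String.split so that a trailing empty value is kept instead of being dropped. The class name SplitTokens is made up for illustration, and the sketch assumes keys and values never contain "&" or an unescaped "=" of their own.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper class, named only for this example.
class SplitTokens {

    static String[] getTokens(String input) {
        List<String> tokens = new ArrayList<>();
        // Limit -1 keeps trailing empty parts, e.g. for an input ending in "&".
        String[] pairs = input.split("&", -1);
        for (int k = 0; k < pairs.length; k++) {
            // Limit 2 splits only at the first "=" and keeps an empty value,
            // so "a=" becomes ["a", ""] rather than just ["a"].
            String[] kv = pairs[k].split("=", 2);
            tokens.add(kv[0]);
            if (kv.length == 2) {
                tokens.add("=");
                tokens.add(kv[1]); // "" when the key has no value
            }
            if (k < pairs.length - 1) {
                tokens.add("&");
            }
        }
        return tokens.toArray(new String[0]);
    }

    public static void main(String[] args) {
        // prints [abc, =, 1, &, b, =, 2, &, fa, =, _]
        System.out.println(Arrays.toString(getTokens("abc=1&b=2&fa=_")));
    }
}
```

The plain split("=") used in the question discards trailing empty strings, which is why "a=" loses both its "=" and its empty value there.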

score 0 · Answer 2 · answered Feb 27 '16 at 22:33

This is a query string you are trying to parse, and usually the easiest way is not to write the parser yourself but to use a library that already handles this kind of task. May I suggest URLEncodedUtils.parse from Apache HttpClient: https://hc.apache.org/httpcomponents-client-ga/httpclient/apidocs/org/apache/http/client/utils/URLEncodedUtils.html#parse(java.lang.String,%20java.nio.charset.Charset)

Stefan Haustein · Answer 3 · 2016-02-27T23:07:52.490

Why not just use java.io.StreamTokenizer:

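import java.io.IOException;
import java.io.StreamTokenizer;
import java.io.StringReader;
import java.util.ArrayList;

public static String[] getTokens(String input) {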
  try {
    ArrayList<String> result = new ArrayList<>();
    StreamTokenizer tokenizer = new StreamTokenizer(new StringReader(input));
    while (tokenizer.nextToken() != StreamTokenizer.TT_EOF) {
      switch (tokenizer.ttype) {
        case StreamTokenizer.TT_WORD:
          result.add(tokenizer.sval);
          break;
        case StreamTokenizer.TT_NUMBER:
          result.add(String.valueOf(tokenizer.nval));
          break;
        default:
          result.add(String.valueOf((char) tokenizer.ttype));
      }
    }
    return result.toArray(new String[result.size()]);
  } catch (IOException e) {
    throw new RuntimeException(e);
  }
}

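Output of Arrays.toString() on the returned value for your example (numeric tokens come back as doubles, hence 1.0 and 2.0, because StreamTokenizer parses numbers by default):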

[abc, =, 1.0, &, b, =, 2.0, &, fa, =, _]

Concerning the second question (default values after =): to keep it simple, I'd post-process the token array (result) in a second loop. If a = token is immediately followed by & or is the last token, insert an empty string after it.
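The post-processing pass described above could look roughly like this. It is only a sketch: the class and method names (EmptyValueFiller, fillEmptyValues) are invented for illustration, and it assumes "=" and "&" only ever appear as separator tokens.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper class, named only for this example.
class EmptyValueFiller {

    // Returns a copy of tokens with "" inserted after every "=" that has no value.
    static String[] fillEmptyValues(String[] tokens) {
        List<String> out = new ArrayList<>();
        for (int i = 0; i < tokens.length; i++) {
            out.add(tokens[i]);
            boolean isEquals = tokens[i].equals("=");
            boolean atEnd = i == tokens.length - 1;
            // A value is missing when "=" is the last token or is followed by "&".
            if (isEquals && (atEnd || tokens[i + 1].equals("&"))) {
                out.add("");
            }
        }
        return out.toArray(new String[0]);
    }

    public static void main(String[] args) {
        String[] tokens = {"a", "=", "&", "b", "=", "2"};
        System.out.println(Arrays.toString(fillEmptyValues(tokens)));
    }
}
```

Building a new list avoids shifting elements while iterating, which keeps the loop simple.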

I think your solution does not work. What if a key is `k1`? Then the tokenizer will split it into two elements (`k` and `1`), which is wrong. — user3633595, Feb 27 '16 at 23:54
Well that case was not part of your question... Why do you assume StreamTokenizer will split between k and 1? From the javadoc: "A word token consists of a word constituent followed by zero or more word constituents or number constituents." — Stefan Haustein, Feb 28 '16 at 00:04
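To check the `k1` case from the comments, and to get 1 back as "1" rather than "1.0", one option is to reset the tokenizer's syntax table and declare digits as word characters. This is a sketch of a variation on the answer's code, not the answer's own method. The class name WordOnlyTokenizer is made up, and after resetSyntax() whitespace is no longer skipped, so the input is assumed to contain none.

```java
import java.io.IOException;
import java.io.StreamTokenizer;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper class, named only for this example.
class WordOnlyTokenizer {

    static String[] getTokens(String input) {
        StreamTokenizer tokenizer = new StreamTokenizer(new StringReader(input));
        // Make every character "ordinary", which also turns off number parsing.
        tokenizer.resetSyntax();
        // Letters and digits form words, so "k1" and "1" are returned as strings.
        tokenizer.wordChars('a', 'z');
        tokenizer.wordChars('A', 'Z');
        tokenizer.wordChars('0', '9');

        List<String> result = new ArrayList<>();
        try {
            while (tokenizer.nextToken() != StreamTokenizer.TT_EOF) {
                if (tokenizer.ttype == StreamTokenizer.TT_WORD) {
                    result.add(tokenizer.sval);
                } else {
                    // Ordinary characters such as '=', '&' and '_' are single-char tokens.
                    result.add(String.valueOf((char) tokenizer.ttype));
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return result.toArray(new String[0]);
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(getTokens("k1=5&abc=1")));
    }
}
```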
