
I have the String below in the format key1=value1, key2=value2, and I need to load it into a map (Map<String, String>) as key=value pairs. So I need to split on the comma and then, for example, load cossn as a key with 0 as its value.

String payload = "cossn=0, abc=hello/=world, Agent=Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/50.0.2661.102 Safari/537.36";

HashMap<String, String> holder = new HashMap<>();
String[] keyVals = payload.split(", ");
for(String keyVal:keyVals) {
  String[] parts = keyVal.split("=",2);
  holder.put(parts[0], parts[1]);
}   

I am getting java.lang.ArrayIndexOutOfBoundsException at the line holder.put(parts[0], parts[1]); and it happens because of the substring Agent=Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/50.0.2661.102 Safari/537.36, whose value contains an extra comma in KHTML, like Gecko. Splitting on ", " produces a piece starting with like Gecko) that has no = in it, so parts has only one element and parts[1] does not exist.

How can I fix this? In general, these should be my keys and values after loading the string into a map:

Key         Value
cossn       0
abc         hello/=world
Agent       Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/50.0.2661.102 Safari/537.36
user1950349
  • Will there always be 4 commas ? – UDKOX May 24 '16 at 00:26
  • No this is just a sample string. In general it is a very long string and content will change most of the time. – user1950349 May 24 '16 at 00:28
  • Can you define the format of the input or is this fixed? – Vampire May 24 '16 at 00:33
  • This is fixed; I don't have any control over it at all. – user1950349 May 24 '16 at 00:33
  • You're going to have to do some fancy parsing to only get commas immediately preceding equals signs. What control do you have over the input string/payload? Seems you need to wrap that better than a comma-delimited string like you have. I'd suggest using JSON which is designed for things like this. – Daniel Widdis May 24 '16 at 00:33
  • I don't have any control over this payload at all. – user1950349 May 24 '16 at 00:34
  • Where is the payload coming from? Who is providing it to you and by what means? – Daniel Widdis May 24 '16 at 00:34
  • Will the keys only contain alphanumeric characters? – jfdoming May 24 '16 at 00:35
  • Yes keys will only have alphanumeric characters. – user1950349 May 24 '16 at 00:37
  • @DanielWiddis Let's not worry about where the payload is coming from. Sometimes there are things that you cannot change. – user1950349 May 24 '16 at 00:37
  • If you are stuck with the payload format, you will never get this properly done. You will always have to use some heuristic to decide where to split and where not to, since you cannot quote the keys and values or escape the commas that are not delimiters. How would you, e.g., split `a=a, b=b, c,c=d`? Is `b` = `b, c` or is `c,c = d`? However you decide, it will be a heuristic that will probably fail with some input. – Vampire May 24 '16 at 00:39
  • In your case above, `b = b, c` is the right one, not the other. – user1950349 May 24 '16 at 00:40
  • Yeah, but that is an arbitrary heuristic you chose. Now you have to turn the heuristic you would apply manually into code. Given that the keys are only alphanumeric, I posted a suggestion as an answer. – Vampire May 24 '16 at 00:44
  • @frenchDolphin you should read what I write before you make non-matching comments – Vampire May 24 '16 at 00:52
  • @user1950349: This looks exactly the same as your question "[Parse a string with key=value pair in a map?](http://stackoverflow.com/questions/37401889/parse-a-string-with-key-value-pair-in-a-map)". Please don't post the same question multiple times. – Daniel Pryden May 24 '16 at 01:02
  • It is different from the previous one. The question is almost the same, but in that one I was having issues with `=`, which got fixed. Then I ran into this `,` issue, so that's why I opened a new one. – user1950349 May 24 '16 at 01:05
  • See also [Parse a URI String into Name-Value Collection](https://stackoverflow.com/questions/13592236/parse-a-uri-string-into-name-value-collection) – Vadzim Jun 14 '19 at 13:32

2 Answers


Since you said your keys contain only alphanumeric characters, the following would probably be a good heuristic for splitting:

payload.split("\\s*,\\s*(?=[a-zA-Z0-9_]+\\s*=|$)");

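This splits on commas, optionally surrounded by whitespace, that are followed either by the end of the string or by a key (letters, digits, or underscores), optional whitespace, and an equals sign. A comma inside a value, such as the one in KHTML, like Gecko, is not followed by key=, so it is left alone.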
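As a rough sketch of how that split could feed the map from the question: the class name PayloadParser and the parse method are made up for illustration, and the choice of LinkedHashMap (to keep insertion order) and the length check on parts are my assumptions, not something the question requires.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper class; the name and structure are illustrative only.
class PayloadParser {

    // Split only on commas that are followed by "key=" (or by the end of input).
    private static final String PAIR_DELIMITER = "\\s*,\\s*(?=[a-zA-Z0-9_]+\\s*=|$)";

    static Map<String, String> parse(String payload) {
        // LinkedHashMap keeps the pairs in the order they appear (an assumption).
        Map<String, String> holder = new LinkedHashMap<>();
        for (String keyVal : payload.split(PAIR_DELIMITER)) {
            // Limit 2: only the first '=' separates key from value.
            String[] parts = keyVal.split("=", 2);
            if (parts.length == 2) { // skip fragments without '=' instead of throwing
                holder.put(parts[0].trim(), parts[1]);
            }
        }
        return holder;
    }

    public static void main(String[] args) {
        String payload = "cossn=0, abc=hello/=world, Agent=Mozilla/5.0 (Windows NT 6.1; WOW64) "
                + "AppleWebKit/537.36 (KHTML, like Gecko) Chrome/50.0.2661.102 Safari/537.36";
        parse(payload).forEach((key, value) -> System.out.println(key + " -> " + value));
    }
}
```

This is still a heuristic: a value that itself contains a comma followed by something that looks like key= would be split at that comma.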

Vampire

Given that you have no control over the payload, you need to do something to make the "illegal commas" not match your ", " regex.

Vampire provided a great regex. Since I've already gone down the road of manual parsing, I'll provide a non-regex solution below.

An alternative is to find the split points yourself by iterating character by character and saving substrings. Keep track of the last comma-space you saw until you reach the next equals sign; only then do you know that this comma-space separates two pairs and is not part of a value.

Here's some code that demonstrates what I'm trying to explain.

import java.util.Arrays;

public class ParseTest {

    static String payload = "cossn=0, abc=hello/=world, Agent=Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/50.0.2661.102 Safari/537.36";

    public static void main(String[] args) {
        int lastCommaSpace = -2;
        int beginIndex = 0;

        // Iterate over string
        // We are looking for comma-space pairs so we stop one short of end of
        // string
        for (int i = 0; i < payload.length() - 1; i++) {
            if (payload.charAt(i) == ',' && payload.charAt(i + 1) == ' ') {
                // This is the point we want to split at
                lastCommaSpace = i;
            }
            if (payload.charAt(i) == '=' && lastCommaSpace != beginIndex - 2) {
                // We've found the next equals, split at the last comma we saw
                String pairToSplit = payload.substring(beginIndex, lastCommaSpace);
                System.out.println("Split and add this pair:" + Arrays.toString(pairToSplit.split("=", 2)));
                beginIndex = lastCommaSpace + 2;
            }
        }
        // We got to the end, split the last one
        String pairToSplit = payload.substring(beginIndex, payload.length());
        System.out.println("Split and add this pair:" + Arrays.toString(pairToSplit.split("=", 2)));
    }

}
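The demo above only prints each pair. A minimal sketch of the same scan that fills a map instead might look like the following; the names ManualPayloadParser, parse, and putPair are hypothetical, and LinkedHashMap is an assumption made to keep insertion order.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical variant of the scan above that collects pairs into a map.
class ManualPayloadParser {

    static Map<String, String> parse(String payload) {
        Map<String, String> holder = new LinkedHashMap<>();
        int lastCommaSpace = -2; // index of the most recent ", " seen
        int beginIndex = 0;      // start of the pair currently being read

        // Stop one short of the end because we look at charAt(i + 1).
        for (int i = 0; i < payload.length() - 1; i++) {
            char c = payload.charAt(i);
            if (c == ',' && payload.charAt(i + 1) == ' ') {
                lastCommaSpace = i; // candidate split point
            }
            // An '=' after a new ", " means that comma really ended the previous pair.
            if (c == '=' && lastCommaSpace != beginIndex - 2) {
                putPair(holder, payload.substring(beginIndex, lastCommaSpace));
                beginIndex = lastCommaSpace + 2;
            }
        }
        // Whatever remains is the last pair.
        putPair(holder, payload.substring(beginIndex));
        return holder;
    }

    private static void putPair(Map<String, String> holder, String pair) {
        String[] parts = pair.split("=", 2); // only the first '=' separates key and value
        if (parts.length == 2) {
            holder.put(parts[0], parts[1]);
        }
    }
}
```

Unlike the regex, this scan does not check that the text between the comma and the equals sign is alphanumeric, so a value containing ", " followed later by "=" would be split at that comma.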
Daniel Widdis