I receive the response below from a service. How can I parse it into a Map? I first thought of splitting on whitespace, but that doesn't work because a value may contain spaces; for example, look at the value of the SA key in the response below.

One option I thought of is to split on whitespace only when the previous character is a double quote, but I'm not sure how to write the regex for that.

TX="0000000000108000001830001" FI="" OS="8" CI="QU01SF1S2032" AW="SSS" SA="1525 Windward Concourse"

Aravind Yarram

3 Answers

Parse at the quotes. You can use a regular expression to find each key/value pair, assuming every value is in quotes. My only question is: what are the rules if a value contains embedded quotes? (Are they escaped with '\' or similar? Either way, the pattern below does not account for this.)

For example:

(\w+)="([^"]*)"

This also gives you groups #1 and #2, which hold the key and the value, respectively.

Run this in a loop, calling Java's Matcher.find() method until it has found all of the pairs.

Sample code:

String input = "TX=\"0000000000108000001830001\" FI=\"\" OS=\"8\" CI=\"QU01SF1S2032\" AW=\"SSS\" SA=\"1525 Windward Concourse\"";

Pattern p = Pattern.compile("\\s*(\\w+)=\"([^\"]*)\"\\s*");

Matcher m = p.matcher(input);
while(m.find()){
    System.out.println(m.group(1));
    System.out.println(m.group(2));
}

Output:

TX
0000000000108000001830001
FI

OS
8
CI
QU01SF1S2032
AW
SSS
SA
1525 Windward Concourse
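
Since the question asks for a Map rather than printed output, here is a minimal sketch of the same loop collecting the pairs instead of printing them. The class name PairParser and the choice of LinkedHashMap (to keep the keys in input order) are illustrative assumptions, and embedded or escaped quotes are still not handled.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class; the name is illustrative only.
class PairParser {

    // Same pattern as above: group 1 is the key, group 2 is the quoted value.
    private static final Pattern PAIR = Pattern.compile("(\\w+)=\"([^\"]*)\"");

    static Map<String, String> parse(String input) {
        // LinkedHashMap preserves the order in which the keys appear.
        Map<String, String> result = new LinkedHashMap<>();
        Matcher m = PAIR.matcher(input);
        while (m.find()) {
            result.put(m.group(1), m.group(2));
        }
        return result;
    }

    public static void main(String[] args) {
        String input = "TX=\"0000000000108000001830001\" FI=\"\" OS=\"8\" "
            + "SA=\"1525 Windward Concourse\"";
        // Prints {TX=0000000000108000001830001, FI=, OS=8, SA=1525 Windward Concourse}
        System.out.println(parse(input));
    }
}
```

If a key can repeat in the response, put() keeps only the last value; that may or may not be what you want.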
ziesemer
  • Geez, just use single quotes; it's tagged Groovy :) – Dave Newton Jan 15 '12 at 03:45
  • @DaveNewton - We'll leave that as an exercise for the OP. :-) – ziesemer Jan 15 '12 at 03:48
  • @ziesemer - +1. But I am getting the value after '=' printed with double quotes, as "0000000000108000001830001" – Aravind Yarram Jan 15 '12 at 04:31
  • @Pangea - because that's what it is in the input. What do you expect? "108000001830001"? If so, you'll need to parse it to a number - but given the above sample input and requirements, I'm not sure how you'd determine which values should be handled as numbers, and which should be left handled as Strings. – ziesemer Jan 15 '12 at 04:34
  • @ziesemer - I asked the question because the sample output in your answer doesn't contain double quotes. It seems like I need to use the replaceAll() method to remove the double quotes – Aravind Yarram Jan 15 '12 at 04:36
  • @ziesemer - It no longer contains the double quotes after updating to the latest code – Aravind Yarram Jan 15 '12 at 04:48

Judging by the text, it looks like it could be XML. Is that so, or is that text the raw response of the service? If it is XML, you can parse it easily with Groovy's XmlSlurper:

def input = '<root TX="0000000000108000001830001" FI="" OS="8" CI="QU01SF1S2032" AW="SSS" SA="1525 Windward Concourse"></root>'
def xml = new XmlSlurper().parseText(input)

def map = xml.attributes()

The map variable would be [CI:QU01SF1S2032, AW:SSS, TX:0000000000108000001830001, OS:8, FI:, SA:1525 Windward Concourse]
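
For anyone on plain Java rather than Groovy, the same idea can be sketched with the JDK's built-in DOM parser. This is only an illustration under the same assumption that the response really is a list of XML attributes: the class name AttributeParser and the wrapping <root> element are made up here, and the parse will fail if a value contains a raw & or < character. The DOM does not guarantee attribute order, so the map may not be in input order.

```java
import java.io.StringReader;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.NamedNodeMap;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

// Hypothetical helper class; the name is illustrative only.
class AttributeParser {

    static Map<String, String> parse(String pairs) {
        // Wrap the raw key="value" pairs in a made-up element so they become attributes.
        String xml = "<root " + pairs + "/>";
        try {
            NamedNodeMap attrs = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)))
                .getDocumentElement()
                .getAttributes();
            Map<String, String> result = new LinkedHashMap<>();
            for (int i = 0; i < attrs.getLength(); i++) {
                Node attr = attrs.item(i);
                result.put(attr.getNodeName(), attr.getNodeValue());
            }
            return result;
        } catch (Exception e) {
            // Input that is not well-formed XML ends up here.
            throw new IllegalArgumentException("Not parseable as XML attributes", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(parse("OS=\"8\" AW=\"SSS\" SA=\"1525 Windward Concourse\""));
    }
}
```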

If it's not XML, you can follow ziesemer's answer and use a regex. A groovier version of his answer that generates a Map would be:

def input = 'TX="0000000000108000001830001" FI="" OS="8" CI="QU01SF1S2032" AW="SSS" SA="1525 Windward Concourse"'
def match = input =~ /(\w+)="([^"]*)"/

def map = [:]
match.each {
    map[it[1]] = it[2]
}

The result of map would be the same as before.

epidemian
  • You can also do: `def map = ( match as List ).collectEntries { [ (it[1]):it[2] ] }` – tim_yates Jan 16 '12 at 12:24
  • @tim_yates Nice! I tried calling `collectEntries` on the `match` object, but it doesn't have that method, only the standard iteration methods. I didn't think of converting it into a `List` first. BTW, an `inject` can also do the trick =D – epidemian Jan 16 '12 at 13:42

StreamTokenizer is fast, although I haven't used the quoteChar() feature. Examples may be found here, here and here.

Console:

TX=0000000000108000001830001
FI=
OS=8
CI=QU01SF1S2032
AW=SSS
SA=1525 Windward Concourse
Count: 6
0.623 ms

Code:

import java.io.BufferedReader;
import java.io.IOException;
import java.io.StreamTokenizer;
import java.io.StringReader;

/** @see https://stackoverflow.com/questions/8867325 */
public class TokenizerTest {

    private static final String s = ""
        + "TX=\"0000000000108000001830001\" FI=\"\" OS=\"8\" "
        + "CI=\"QU01SF1S2032\" AW=\"SSS\" SA=\"1525 Windward Concourse\"";
    private static final char equal = '=';
    private static final char quote = '"';
    private static StreamTokenizer tokens = new StreamTokenizer(
        new BufferedReader(new StringReader(s)));

    public static void main(String[] args) {
        long start = System.nanoTime();
        tokenize();
        long stop = System.nanoTime();
        System.out.println((stop - start) / 1000000d + " ms");
    }

    private static void tokenize() {
        tokens.ordinaryChar(equal);
        tokens.quoteChar(quote);
        try {
            int count = 0;
            int token = tokens.nextToken();
            while (token != StreamTokenizer.TT_EOF) {
                if (token == StreamTokenizer.TT_WORD) {
                    System.out.print(tokens.sval);
                    count++;
                }
                if (token == equal) {
                    System.out.print(equal);
                }
                if (token == quote) {
                    System.out.println(tokens.sval);
                }
                token = tokens.nextToken();
            }
            System.out.println("Count: " + count);
        } catch (IOException e) {
            e.printStackTrace();
        }
    }
}
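
The version above only prints the tokens. To get the Map the question asks for, the same tokenizer loop can remember the last word as the key and store it when the quoted value arrives. This is a rough sketch rather than a tuned implementation: the class name TokenizerMap is made up, and it relies on StreamTokenizer's defaults, so a key containing an underscore or starting with a digit would not come through as a single word token.

```java
import java.io.IOException;
import java.io.StreamTokenizer;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper class; the name is illustrative only.
class TokenizerMap {

    static Map<String, String> parse(String input) {
        StreamTokenizer tokens = new StreamTokenizer(new StringReader(input));
        tokens.ordinaryChar('=');   // '=' is returned as its own token
        tokens.quoteChar('"');      // text between double quotes becomes one token
        Map<String, String> result = new LinkedHashMap<>();
        String key = null;
        try {
            int token;
            while ((token = tokens.nextToken()) != StreamTokenizer.TT_EOF) {
                if (token == StreamTokenizer.TT_WORD) {
                    key = tokens.sval;              // remember the pending key
                } else if (token == '"' && key != null) {
                    result.put(key, tokens.sval);   // quoted body, without the quotes
                    key = null;
                }
            }
        } catch (IOException e) {
            // Not expected when reading from an in-memory string.
            throw new UncheckedIOException(e);
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(parse("OS=\"8\" FI=\"\" SA=\"1525 Windward Concourse\""));
    }
}
```

Note that StreamTokenizer interprets backslash escapes inside quoted strings, which may or may not match how the service escapes its values.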
trashgod
  • Good to know about StreamTokenizer – Aravind Yarram Jan 15 '12 at 04:17
  • I just _had_ to try out the `quoteChar()`; more above. – trashgod Jan 15 '12 at 04:27
  • I think this solution is overly complicated. Unless there is a big performance constraint, I would recommend going with a simpler solution, like using a regex (and if performance _is_ a constraint, it should be profiled to see if this is really faster than a regex, which I doubt). – epidemian Jan 15 '12 at 05:19
  • @epidemian: Yes, that's why I referenced a convenient [benchmark](http://stackoverflow.com/a/2082174/230513). – trashgod Jan 15 '12 at 07:08