
I have a file as input which contains a JSON array:

[ {
  ...,
  ...
  },
  {
  ...,
  ...
  },
  {
  ...,
  ...
  }
]

I want to read it without breaking the Spring Batch principles (in the same way as FlatFileItemReader or StaxEventItemReader).

I didn't find any way to do it with the readers already implemented in Spring Batch.

What's the best way to implement this reader?

Thanks in Advance

Nabil

2 Answers


Assuming you want to model the StaxEventItemReader in that you want to read each item of the JSON array as an item in Spring Batch, here's what I'd recommend:

  • RecordSeparatorPolicy - You'll need to implement your own RecordSeparatorPolicy that indicates whether you've finished reading in the full item or not. You can also use RecordSeparatorPolicy#postProcess to clean up the beginning and ending [] you'll need to deal with, as well as the comma delimiters.
  • LineTokenizer - You'll then want to create your own LineTokenizer that parses JSON. I was just working on one today for a project, so you can use that code as a start (consider it untested):

    import java.io.IOException;
    import java.util.ArrayList;
    import java.util.HashMap;
    import java.util.List;
    
    import com.fasterxml.jackson.databind.ObjectMapper;
    import org.springframework.batch.item.file.transform.DefaultFieldSet;
    import org.springframework.batch.item.file.transform.FieldSet;
    import org.springframework.batch.item.file.transform.LineTokenizer;
    
    public class JsonLineTokenizer implements LineTokenizer {
    
        @Override
        public FieldSet tokenize(String line) {
            List<String> tokens = new ArrayList<>();
    
            try {
                // Parse the JSON object into a map and pull out the fields we care about
                HashMap<String, Object> result =
                        new ObjectMapper().readValue(line, HashMap.class);
    
                tokens.add((String) result.get("field1"));
                tokens.add((String) result.get("field2"));
            } catch (IOException e) {
                throw new RuntimeException("Unable to parse json: " + line, e);
            }
    
            return new DefaultFieldSet(tokens.toArray(new String[0]),
                    new String[] {"field1", "field2"});
        }
    }
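Wiring such a tokenizer into a FlatFileItemReader might look like the sketch below. The package `com.example`, the bean id, the item class `MyItem`, and the `JsonRecordSeparatorPolicy` class are placeholders for your own types, not part of Spring Batch itself:

```xml
<bean id="jsonItemReader"
      class="org.springframework.batch.item.file.FlatFileItemReader">
    <property name="resource" value="file:input.json"/>
    <!-- custom policy that buffers lines until a full JSON object has been read -->
    <property name="recordSeparatorPolicy">
        <bean class="com.example.JsonRecordSeparatorPolicy"/>
    </property>
    <property name="lineMapper">
        <bean class="org.springframework.batch.item.file.mapping.DefaultLineMapper">
            <!-- the custom tokenizer shown above -->
            <property name="lineTokenizer">
                <bean class="com.example.JsonLineTokenizer"/>
            </property>
            <property name="fieldSetMapper">
                <bean class="org.springframework.batch.item.file.mapping.BeanWrapperFieldSetMapper">
                    <property name="targetType" value="com.example.MyItem"/>
                </bean>
            </property>
        </bean>
    </property>
</bean>
```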
    
Michael Minella
  • Thanks for your response. The RecordSeparatorPolicy will be difficult to implement because I don't have a simple JSON item format, but a complex one that can itself contain a JSON array. Any idea? – Nabil Aug 30 '14 at 00:06
  • The only complexity that I can see is that the `RecordSeparatorPolicy` will need to be able to ignore the [] that wrap the entire document and the commas in between each item. Am I missing something? – Michael Minella Sep 01 '14 at 16:56

This is the record separator policy I wrote, starting from your suggestions and from the default implementation. I use an internal plain-string representation for the read record, but I found it very simple to parse the JSON with the Codehaus Jettison JSONObject.

import org.springframework.batch.item.file.separator.RecordSeparatorPolicy;
import org.springframework.batch.item.file.separator.SimpleRecordSeparatorPolicy;
import org.springframework.util.StringUtils;

public class JsonRecordSeparatorPolicy extends SimpleRecordSeparatorPolicy {

    /**
     * True if the line can be parsed to a JSON object.
     *
     * @see RecordSeparatorPolicy#isEndOfRecord(String)
     */
    @Override
    public boolean isEndOfRecord(String line) {
        return StringUtils.countOccurrencesOf(line, "{") == StringUtils.countOccurrencesOf(line, "}")
                && (line.trim().endsWith("}") || line.trim().endsWith(",") || line.trim().endsWith("]"));
    }

    @Override
    public String postProcess(String record) {
        if (record.startsWith("[")) record = record.substring(1);
        if (record.endsWith("]")) record = record.substring(0, record.length() - 1);
        if (record.endsWith(",")) record = record.substring(0, record.length() - 1);
        return super.postProcess(record);
    }
}
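To illustrate how the brace-counting check accumulates lines into one record, here is a standalone sketch with no Spring dependencies. The class and method names are mine, and the buffering loop only mimics what FlatFileItemReader does internally; the `isEndOfRecord` and `postProcess` logic mirrors the policy above:

```java
import java.util.ArrayList;
import java.util.List;

public class BraceCountingDemo {

    // True when the accumulated text has balanced braces and ends at
    // an item boundary (closing brace, comma, or the final ']').
    static boolean isEndOfRecord(String record) {
        String trimmed = record.trim();
        return count(record, '{') == count(record, '}')
                && (trimmed.endsWith("}") || trimmed.endsWith(",") || trimmed.endsWith("]"));
    }

    // Strip the array brackets and trailing comma, as the policy's postProcess does.
    static String postProcess(String record) {
        String r = record.trim();
        if (r.startsWith("[")) r = r.substring(1);
        if (r.endsWith("]")) r = r.substring(0, r.length() - 1);
        if (r.endsWith(",")) r = r.substring(0, r.length() - 1);
        return r.trim();
    }

    static int count(String s, char c) {
        int n = 0;
        for (int i = 0; i < s.length(); i++) if (s.charAt(i) == c) n++;
        return n;
    }

    public static void main(String[] args) {
        // A JSON item spread over several physical lines, as in the question.
        String[] lines = {"[ {", "  \"field1\": \"a\",", "  \"field2\": \"b\"", "  },"};
        StringBuilder record = new StringBuilder();
        List<String> records = new ArrayList<>();
        for (String line : lines) {
            record.append(line).append('\n');
            if (isEndOfRecord(record.toString())) {
                records.add(postProcess(record.toString()));
                record.setLength(0);
            }
        }
        System.out.println(records.size());                 // 1
        String item = records.get(0);
        System.out.println(item.startsWith("{") && item.endsWith("}")); // true
    }
}
```

The four physical lines collapse into a single record only once the braces balance, which is exactly why a multi-line item never reaches the tokenizer half-read.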

Maxvader