
I have a JSON file with a complex structure.

{"Objects":{"items":{"item":[
{
"field1": "value1",
"field2": "value2",
"field3":[
     {
       "label1":"1",
       "label2":"2"
     },
     {
       "label1":"3",
       "label2":"4"
     }]
}
,
{
//same structure as above object
}
]}}}

The file size is a little more than 1 GB. I need to read each object, check the value of a particular label, and if it matches a list I have, write that object to another file; otherwise skip it.

I know a normal JSON parser like JSONSimple won't work, as it holds all the data in memory. I am trying to use Jackson, but I'm finding it hard to iterate over all the objects because the streaming API gives me one token at a time. What is an efficient way to combine Jackson's streaming and tree model for this JSON format?

Or, alternatively, how could I use a script to extract the data and work with it?

1 Answer


You could probably advance the JsonParser by calling nextToken() until you reach the START_ARRAY token, call nextToken() again to move to the start of the first item object, and then feed the parser and a POJO class representing "item" into ObjectMapper.readValue() (https://github.com/FasterXML/jackson-databind/blob/master/src/main/java/com/fasterxml/jackson/databind/ObjectMapper.java); repeat until no more objects are found. The POJO can be hand-written or generated with something like https://github.com/astav/JsonToJava/wiki/JsonToJava
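
A minimal sketch of that approach, assuming placeholder file names (input.json, output.json) and a placeholder filter that checks label1 of the first field3 entry against a hard-coded set; adjust the POJO and the filter to your real data:

import com.fasterxml.jackson.core.JsonEncoding;
import com.fasterxml.jackson.core.JsonFactory;
import com.fasterxml.jackson.core.JsonGenerator;
import com.fasterxml.jackson.core.JsonParser;
import com.fasterxml.jackson.core.JsonToken;
import com.fasterxml.jackson.databind.DeserializationFeature;
import com.fasterxml.jackson.databind.ObjectMapper;

import java.io.File;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class ItemFilter {

    // POJO matching one "item" object; public fields keep the example short.
    public static class Label {
        public String label1;
        public String label2;
    }

    public static class Item {
        public String field1;
        public String field2;
        public List<Label> field3;
    }

    public static void main(String[] args) throws Exception {
        // Placeholder values: the real label list and file names go here.
        Set<String> wanted = new HashSet<>(Arrays.asList("1", "3"));

        ObjectMapper mapper = new ObjectMapper();
        // In case the real items have more fields than the POJO models.
        mapper.disable(DeserializationFeature.FAIL_ON_UNKNOWN_PROPERTIES);
        JsonFactory factory = mapper.getFactory();

        try (JsonParser parser = factory.createParser(new File("input.json"));
             JsonGenerator generator =
                     factory.createGenerator(new File("output.json"), JsonEncoding.UTF8)) {

            // Skip the wrapper objects (Objects -> items -> item) until we reach
            // the array; in this structure it is the first START_ARRAY token.
            JsonToken token;
            while ((token = parser.nextToken()) != null && token != JsonToken.START_ARRAY) {
                // keep advancing
            }

            generator.writeStartArray();

            // Each iteration binds exactly one "item" object, so only one item
            // is held in memory at a time.
            while (parser.nextToken() == JsonToken.START_OBJECT) {
                Item item = mapper.readValue(parser, Item.class);
                if (item.field3 != null && !item.field3.isEmpty()
                        && wanted.contains(item.field3.get(0).label1)) {
                    generator.flush();
                    mapper.writeValue(generator, item);
                }
            }

            generator.writeEndArray();
        }
    }
}

Note that the output here is a plain JSON array of the matching items rather than a copy of the original Objects/items wrapper; rebuilding the wrapper would just mean writing the extra start/end object and field names around the array with the same generator.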

Or skip Jackson entirely: write a small tokenizer yourself that extracts the individual "item" JSON elements and feeds them into JSONSimple. You'll be reinventing the wheel a bit, but you'll avoid pulling in a lot of dependencies.
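
A rough sketch of that idea, using the same placeholder file names and filter as above. It relies on the fact that in this particular structure every "item" object opens at brace depth 4 (outer object, Objects, items, then the item itself), which is an assumption you'd need to adjust for a different nesting:

import org.json.simple.JSONArray;
import org.json.simple.JSONObject;
import org.json.simple.parser.JSONParser;

import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.FileReader;
import java.io.FileWriter;
import java.io.Writer;
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

public class ItemTokenizer {

    // In {"Objects":{"items":{"item":[{...},{...}]}}} each item object opens at brace depth 4.
    private static final int ITEM_DEPTH = 4;
    // Placeholder filter list.
    private static final Set<Object> WANTED = new HashSet<>(Arrays.asList("1", "3"));

    public static void main(String[] args) throws Exception {
        JSONParser jsonSimple = new JSONParser();
        StringBuilder current = new StringBuilder();
        int depth = 0;                        // brace depth, ignoring braces inside strings
        boolean inString = false, escaped = false;

        try (BufferedReader in = new BufferedReader(new FileReader("input.json"));
             Writer out = new BufferedWriter(new FileWriter("output.json"))) {
            int c;
            while ((c = in.read()) != -1) {
                char ch = (char) c;
                if (inString) {
                    if (depth >= ITEM_DEPTH) current.append(ch);
                    if (escaped) escaped = false;
                    else if (ch == '\\') escaped = true;
                    else if (ch == '"') inString = false;
                } else if (ch == '{') {
                    depth++;
                    if (depth >= ITEM_DEPTH) current.append(ch);
                } else if (ch == '}') {
                    if (depth >= ITEM_DEPTH) current.append(ch);
                    if (depth == ITEM_DEPTH) {
                        // One complete "item" is buffered: parse it with json-simple and filter.
                        JSONObject item = (JSONObject) jsonSimple.parse(current.toString());
                        if (matches(item)) {
                            out.write(item.toJSONString());
                            out.write(System.lineSeparator());
                        }
                        current.setLength(0);
                    }
                    depth--;
                } else {
                    if (ch == '"') inString = true;
                    if (depth >= ITEM_DEPTH) current.append(ch);
                }
            }
        }
    }

    // Placeholder filter: keep items whose first field3 entry has label1 in WANTED.
    private static boolean matches(JSONObject item) {
        JSONArray field3 = (JSONArray) item.get("field3");
        if (field3 == null || field3.isEmpty()) return false;
        JSONObject first = (JSONObject) field3.get(0);
        return WANTED.contains(first.get("label1"));
    }
}

This variant writes one matching object per line instead of rebuilding the wrapper, and only ever buffers a single item's text, so memory use stays small regardless of the file size.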

vpa1977