
I was wondering if there is a "correct" way to parse a JSON file using Jackson when the file contains one huge property, without loading the entire stream into memory. I need to keep memory usage low since it's an Android app. I'm not asking how to parse a large JSON file in general (as in "Android: Parsing large JSON file"), but rather how to handle a file where a single property is really large and the others don't matter.

For instance, let's say I have the following:

{
    "filename": "afilename.jpg",
    "data": "**Huge data here, about 20Mb base64 string**",
    "mime": "mimeType",
    "otherProperties": "..."
}

The data property could be extracted to a new file if needed (via an OutputStream or other means), but I haven't managed to achieve this using Jackson. I'm open to using other libraries; I just thought Jackson would be ideal thanks to its streaming API.

Thanks

Kalem

2 Answers


Finally I managed to recover my huge data like this, where in is an InputStream over the JSON file I want to parse the data from and out is the file where I am going to write my data to:

public boolean extrationContenuDocument(FileInputStream in, FileOutputStream out, FileInfo info) 
throws JsonParseException, IOException {

    SerializedString keyDocContent = new SerializedString("data");
    boolean isDone = false;

    JsonParser jp = this.jsonFactory.createJsonParser(in);

    // Let's move our inputstream cursor until the 'data' property is found
    while (!jp.nextFieldName(keyDocContent)) {
        Log.v("Traitement JSON", "Searching for 'data' property ...");
    }

    // Found it? OK, now move the inputstream cursor to the beginning of its
    // content
    JsonToken current = jp.nextToken();

    // If the current token is not a String value, it means the 'data'
    // property wasn't found or its content is not a string => stop
    if (current == JsonToken.VALUE_STRING) {
        Log.v("Traitement JSON", "Property 'data' found");

        // Here it gets a little tricky: if the file is small enough, all the
        // content of the 'data' property could be read directly instead of
        // using this
        if (info.getSize() > TAILLE_MIN_PETIT_FICHER) {
            Log.v("Traitement JSON", "the content of 'data' is too big to be read directly -> using buffered reading");

            // JsonParser reads through a buffer, so some of the data may
            // already have been read into it; I need to fetch it
            ByteArrayOutputStream debutDocStream = new ByteArrayOutputStream();
            int premierePartieRead = jp.releaseBuffered(debutDocStream);
            byte[] debutDoc = debutDocStream.toByteArray();

            // Write the head of the content of the 'data' property; this is
            // actually what was read from the inputstream by the JsonParser
            // when we did jp.nextToken()
            Log.v("Traitement JSON", "Write the head");
            out.write(debutDoc);

            // Now we need to write the rest until we find the tail of the
            // content of the 'data' property
            Log.v("Traitement JSON", "Write the middle");

            // So i prepare a buffer to continue reading the inputstream
            byte[] buffer = new byte[TAILLE_BUFFER_GROS_FICHER];

            // The character that marks the end of the string value is the closing double quote
            byte endChar = (byte) '"';

            // Fetch me some bytes from the inputstream
            int bytesRead = in.read(buffer);
            int bytesBeforeEndChar = 0;

            int deuxiemePartieRead = 0;
            boolean isDocContentFin = false;

            // Keep writing the content of the 'data' property until we reach
            // its end
            while ((bytesRead > 0) && !isDocContentFin) {
                bytesBeforeEndChar = 0;

                // Since I'm using a buffer, the closing quote could be anywhere
                // within the bytes actually read, so look for it (only scan the
                // first bytesRead bytes; the rest of the buffer may hold stale data)
                for (int i = 0; i < bytesRead; i++) {
                    if (buffer[i] != endChar) {
                        bytesBeforeEndChar++;
                    } else {
                        isDocContentFin = true;
                        break;
                    }
                }

                if (bytesRead > bytesBeforeEndChar) {
                    Log.v("Traitement JSON", "Write the tail");
                    out.write(buffer, 0, bytesBeforeEndChar);
                    deuxiemePartieRead += bytesBeforeEndChar;
                } else {
                    out.write(buffer, 0, bytesRead);
                    deuxiemePartieRead += bytesRead;
                }

                bytesRead = in.read(buffer);
            }

            Log.v("Traitement JSON", "Bytes read: " + (premierePartieRead + deuxiemePartieRead) + " (" + premierePartieRead + " head,"
                    + deuxiemePartieRead + " tail)");
            isDone = true;
        } else {
            Log.v("Traitement JSON", "File is small enough to be read directly");
            String contenuFichier = jp.getText();
            out.write(contenuFichier.getBytes());
            isDone = true;
        }
    } else {
        throw new JsonParseException("The property " + keyDocContent.getValue() + " couldn't be found in the Json Stream.", null);
    }
    jp.close();

    return isDone;
}

It's not pretty, but it works like a charm! @StaxMan, let me know what you think.

Edit:

This is now an implemented feature; see https://github.com/FasterXML/jackson-core/issues/14 and JsonParser.readBinaryValue().
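
For anyone landing here later, here is a minimal sketch of how that Jackson 2.1+ API could be used (my own rough sketch, not code from the issue; the class and method names are mine). Note that readBinaryValue() base64-decodes the value as it streams, so the output file ends up with the raw bytes rather than the base64 text:

import com.fasterxml.jackson.core.JsonFactory;
import com.fasterxml.jackson.core.JsonParser;
import com.fasterxml.jackson.core.JsonToken;

import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;

public class DataExtractor {

    // Streams the decoded content of the "data" property into 'out' without
    // ever holding the whole value in memory (requires Jackson 2.1+).
    public static void extractData(InputStream in, OutputStream out) throws IOException {
        JsonFactory factory = new JsonFactory();
        JsonParser jp = factory.createParser(in);
        try {
            JsonToken token;
            while ((token = jp.nextToken()) != null) {
                if (token == JsonToken.FIELD_NAME && "data".equals(jp.getCurrentName())) {
                    // Move to the VALUE_STRING token, then stream-decode it
                    jp.nextToken();
                    jp.readBinaryValue(out);
                    return;
                }
            }
        } finally {
            jp.close();
        }
    }
}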

Kalem

EDIT: This is not a good answer for this question -- it would work if sub-trees were objects to bind, but NOT when the issue is a single big Base64-encoded String.


If I understand the question correctly, yes, you can read a file incrementally and still use data binding, provided your input consists of a sequence of JSON Objects or arrays.

If so, you can use JsonParser to advance the stream to the first object (its START_OBJECT token), and then use the data-binding methods of either JsonParser (JsonParser.readValueAs()) or ObjectMapper (ObjectMapper.readValue(JsonParser, type)).

Something like:

ObjectMapper mapper = new ObjectMapper();
JsonParser jp = mapper.getJsonFactory().createJsonParser(new File("file.json"));
while (jp.nextToken() != null) {
   MyPojo pojo = jp.readValueAs(MyPojo.class);
   // do something
}

(note: depending on the exact structure of the JSON, you may need to skip some elements -- when calling readValueAs(), the parser must be positioned at the START_OBJECT token that starts the JSON Object to bind; see the sketch below).
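
For instance, a rough sketch of that skipping step, continuing the example above (the assumption that the objects sit inside a wrapping array is mine):

// Skip tokens (such as START_ARRAY or field names) until the parser is
// positioned on the START_OBJECT of the next value to bind
JsonToken t;
while ((t = jp.nextToken()) != null && t != JsonToken.START_OBJECT) {
    // keep advancing
}
if (t == JsonToken.START_OBJECT) {
    MyPojo pojo = jp.readValueAs(MyPojo.class);
    // do something with pojo
}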

Or, even simpler, you may be able to use the readValues method of ObjectReader:

ObjectReader r = mapper.reader(MyPojo.class);
MappingIterator<MyPojo> it = r.readValues(new File("file.json"));
while (it.hasNextValue()) {
   MyPojo pojo = it.nextValue();
   // do something with it
}

In both cases, the Jackson data binder only reads as many JSON tokens as necessary to produce a single Object (MyPojo or whatever type you have). JsonParser itself only needs enough memory to hold information about a single JSON token.

StaxMan
  • Thanks for the fast answer @StaxMan. But won't `it.nextValue()` load all the content of my data property into memory? If the data property is a 20Mb String, won't I get an OutOfMemoryException (Android platform)? That's actually what I'm trying to avoid. – Kalem Jul 17 '12 at 08:19
  • Hmmh. I must have misread your question actually... The "real" answer is that: (a) currently (Jackson 2.0), you can't avoid this, and (b) Jackson 2.1 will implement this feature -- https://github.com/FasterXML/jackson-core/issues/14 -- which will cover your case (I need that feature myself for server-to-server syncing of BLOBs). – StaxMan Jul 17 '12 at 17:52
  • It's nice to know that I'm not the only one looking for this feature. For a moment I thought I was misusing Jackson and that this was already possible (since I didn't find any post about this). I managed to do what I needed using Jackson and IO streams; it's not pretty but it works for my case scenario. I'll post my answer next. – Kalem Jul 18 '12 at 13:30
  • Yeah, no, it's a reasonable thing to ask for, although the JSON data model does not make it obvious (XML parsers often have better streaming accessors for textual content). I am hoping to get 2.1 out relatively soon, in case you need this in the future. – StaxMan Jul 18 '12 at 21:22
  • Thanks again. Yeah, it's pretty certain I'm going to need this later on, since the webservice I'm requesting is not going to change any time soon (they should have done a multipart HTTP response with binary data instead = easier for me :p). Anyway, keep up the good work! – Kalem Jul 19 '12 at 08:47