I have a JSON file containing millions of records. I'd like to use a Jackson iterator to read the records one at a time and perform an action for each one. Here's the code so far.

MappingIterator<MyClass> iterator = new ObjectMapper()
        .readerFor(MyClass.class)
        .readValues(file);

while (iterator.hasNext()) {
    MyClass object = iterator.next();
    ...
}

The problem is that a few of the records are invalid due to missing quotes or illegal characters. This causes Jackson to throw an exception and quit. How can I tell Jackson to skip these records and continue to parse the remaining valid records?

mrog
  • Don't you want to add a try-catch to handle `RuntimeJsonMappingException`, and just skip the record if its cause is a `JsonMappingException`? – Ulad Sep 10 '20 at 22:05
  • I tried that. I get an exception from iterator.next(). And then I get the same exception again from iterator.hasNext(). That makes it impossible to continue parsing the file. – mrog Sep 10 '20 at 22:58

1 Answer

Try `@JsonIgnoreProperties(ignoreUnknown = true)`, or you may need a `JsonFilter` or a custom deserializer, as in this example:

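import java.io.IOException;
import com.fasterxml.jackson.annotation.JsonInclude;
import com.fasterxml.jackson.core.JsonParser;
import com.fasterxml.jackson.core.ObjectCodec;
import com.fasterxml.jackson.databind.DeserializationContext;
import com.fasterxml.jackson.databind.JsonDeserializer;
import com.fasterxml.jackson.databind.JsonNode;
import com.fasterxml.jackson.databind.annotation.JsonDeserialize;

@JsonInclude(JsonInclude.Include.NON_NULL)
@JsonDeserialize(using = UserDeserializer.class)
public class User {
    private Long id;
    private String name;
    private User() {}
    public User(Long id, String name) {
        this.id = id;
        this.name = name;
    }
    // Getters and setters omitted for brevity.
    @Override
    public String toString() {
        return id + " " + name; // matches the "1 valid" style of the output below
    }
}

public class UserDeserializer extends JsonDeserializer<User> {
    @Override
    public User deserialize(JsonParser jsonParser, DeserializationContext ctxt) throws IOException {
        try {
            ObjectCodec oc = jsonParser.getCodec();
            JsonNode node = oc.readTree(jsonParser);
            final Long id = node.get("id").asLong();
            final String name = node.get("name").asText();
            return new User(id, name);
        } catch (Exception e) {
            // Malformed or incomplete record: return null so the caller can filter it out.
            return null;
        }
    }
}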


public static void main(String[] args) throws IOException {
    String input = "[{\"id\": 1, \"name\": \"valid\"}," +
            " {\"id\": 2, \"name\": invalid}," +
            " {\"id\": 3, \"name\": \"valid\"}]";

    ObjectMapper objectMapper = new ObjectMapper();
    List<User> users = objectMapper.readValue(input,
            objectMapper.getTypeFactory().constructCollectionType(List.class, User.class));
    users.forEach(System.out::println);
}

Output

1 valid
null
null
3 valid

So you can simply filter the null entries out of the resulting collection.
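As a minimal sketch of that filtering step, using only the standard library: the class and method names (`NullFilter`, `withoutNulls`) are made up for illustration, and plain strings stand in for the `User` objects.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Objects;

// Hypothetical helper, not part of Jackson.
class NullFilter {

    // Returns a copy of the list without the null placeholders
    // that the deserializer returned for failed records.
    static <T> List<T> withoutNulls(List<T> items) {
        List<T> result = new ArrayList<>(items);
        result.removeIf(Objects::isNull);
        return result;
    }

    public static void main(String[] args) {
        // Strings stand in for the deserialized User objects.
        List<String> users = Arrays.asList("1 valid", null, null, "3 valid");
        System.out.println(withoutNulls(users)); // prints [1 valid, 3 valid]
    }
}
```

The copy is there because the input list may be unmodifiable; if the list is known to be mutable, calling `removeIf` on it directly would do the same job.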

Rony Nguyen
  • There aren't any unknown properties in the data. The only issue is poorly formatted JSON. – mrog Sep 10 '20 at 19:50
  • I see, so you have to write your own deserializer and register it with the annotation `@JsonDeserialize(using = YourCustomDeserializer.class)`. With a try/catch block inside it, you can skip every row that raises an exception because of incorrect JSON formatting. These examples show how to customize deserialization: https://www.baeldung.com/jackson-deserialization and https://stackoverflow.com/questions/35359430/how-to-make-jackson-ignore-properties-if-the-getters-throw-exceptions – Rony Nguyen Sep 10 '20 at 20:03
  • I tried the custom deserializer, and it's also not helping. After I catch an exception in the deserializer, Jackson then tries to parse the next record starting right after the first illegal character in the file, which isn't the start of a new record. This causes iterator.hasNext() to throw an exception. – mrog Sep 11 '20 at 18:46
  • I just added my example; it may help you. – Rony Nguyen Sep 11 '20 at 19:51
  • Try replacing `{\"id\": 2, \"name\": invalid}` with `{\"id\": 2, \"name: \"invalid\"}` (missing a closing quote) and you'll see why this approach won't work. – mrog Sep 11 '20 at 23:26
  • I see. Hmm, is there any way to determine the boundaries of each record, such as matching "{" and "}" pairs? You may have to fall back to string processing =)) – Rony Nguyen Sep 11 '20 at 23:38
  • I might need to resort to something like bracket matching. I've seen a few types of malformed JSON so far, including a missing closing quote, illegal characters in attribute names, and at least one instance of a duplicate closing bracket. – mrog Sep 12 '20 at 05:25
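
The bracket-matching idea from the last two comments could be sketched roughly as follows. This is an illustration under stated assumptions, not a tested solution for the real file: `RecordSplitter` and `split` are hypothetical names, and the sketch assumes that braces never appear inside string values and that each broken record still has balanced braces. It deliberately ignores quotes, so a missing closing quote cannot throw off the record boundaries.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

// Hypothetical helper, not part of Jackson.
class RecordSplitter {

    // Streams characters and hands each top-level {...} chunk to the handler.
    // Assumption: braces do not occur inside string values.
    static void split(Reader reader, Consumer<String> handler) {
        StringBuilder current = new StringBuilder();
        int depth = 0;
        try {
            int c;
            while ((c = reader.read()) != -1) {
                char ch = (char) c;
                if (ch == '{') {
                    depth++;
                }
                if (depth > 0) {
                    current.append(ch); // inside a record: keep the character
                }
                if (ch == '}' && depth > 0) {
                    depth--;
                    if (depth == 0) { // the record just closed
                        handler.accept(current.toString());
                        current.setLength(0);
                    }
                }
                // Anything seen at depth 0 (commas, whitespace, the outer
                // [ ], a stray extra '}') is dropped.
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // Second record has a missing quote and a duplicate closing brace.
        String input = "[{\"id\": 1, \"name\": \"valid\"},"
                + " {\"id\": 2, \"name: \"invalid\"}},"
                + " {\"id\": 3, \"name\": \"valid\"}]";
        List<String> chunks = new ArrayList<>();
        split(new StringReader(input), chunks::add);
        chunks.forEach(System.out::println); // three chunks, the second still malformed
    }
}
```

Each chunk could then be passed on its own to `ObjectMapper.readValue(chunk, MyClass.class)` inside a try/catch, so a bad record fails in isolation and the parser state for the next record is unaffected. For a file with millions of records, wrapping the file reader in a `BufferedReader` would avoid the cost of unbuffered single-character reads.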