How to parse large files using flatpack

Question

I need to parse files that may be quite large, possibly hundreds of megabytes and millions of lines. I have been trying to do this using FlatPack. I would think the way to do this would be to use the buffered parsers and the new stream methods. But although dataset.next() returns true for the correct number of records, the Optional returned by dataset.getRecord() never contains a value.

I have looked at this example/test, but it only counts the number of records and does not actually do anything with the content.

score 0 · Answer 1 · answered Dec 14 '15 at 18:48

You can use the class BuffReaderParseFactory instead of DefaultParserFactory.

It will read one record from the input file only when you call "next()".
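To illustrate what "one record per next() call" means, here is a minimal sketch in plain Java. It is not FlatPack's API: LazyRecordReader and its methods are hypothetical names made up for this example, it assumes a header row, and it does not handle quoted fields. It only shows the general pattern of a buffered parser, where the current record's fields are available right after next() and earlier records are not kept in memory.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.util.HashMap;
import java.util.Map;
import java.util.regex.Pattern;

// Hypothetical stand-in for a buffered parser; this is NOT FlatPack's API.
class LazyRecordReader implements AutoCloseable {
    private final BufferedReader in;
    private final String delimiterRegex;
    private final Map<String, Integer> columnIndex = new HashMap<>();
    private String[] current; // fields of the record most recently read by next()

    LazyRecordReader(Reader source, char delimiter) throws IOException {
        this.in = new BufferedReader(source);
        this.delimiterRegex = Pattern.quote(String.valueOf(delimiter));
        // Assumption: the first line is a header naming the columns.
        String header = in.readLine();
        if (header == null) {
            throw new IOException("missing header row");
        }
        String[] names = header.split(delimiterRegex, -1);
        for (int i = 0; i < names.length; i++) {
            columnIndex.put(names[i].trim(), i);
        }
    }

    // Reads exactly one more line from the source; the previous record is dropped.
    boolean next() throws IOException {
        String line = in.readLine();
        if (line == null) {
            current = null;
            return false;
        }
        current = line.split(delimiterRegex, -1);
        return true;
    }

    // Returns a field of the current record, looked up by column name.
    String getString(String column) {
        if (current == null) {
            throw new IllegalStateException("call next() first");
        }
        Integer i = columnIndex.get(column);
        if (i == null || i >= current.length) {
            throw new IllegalArgumentException("no such column: " + column);
        }
        return current[i];
    }

    @Override
    public void close() throws IOException {
        in.close();
    }
}

class LazyRecordReaderDemo {
    public static void main(String[] args) throws IOException {
        // In practice the Reader would wrap the large file, e.g. a FileReader.
        String data = "name,age\nann,30\nbob,41\n";
        try (LazyRecordReader records = new LazyRecordReader(new StringReader(data), ',')) {
            while (records.next()) {
                System.out.println(records.getString("name") + " is " + records.getString("age"));
            }
        }
    }
}
```

Because only the current line is held, memory use stays flat regardless of file size; the trade-off is that you cannot go back to an earlier record or sort the data without reading the file again.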

diogopontual

I believe the example I linked to uses the Buffered classes but it does not do anything meaningful with the content. I need to have access to the field content of the record but when I try to access it I am getting errors. – user1723105 Jan 18 '16 at 16:50

score 0 · Answer 2 · answered Apr 07 '16 at 06:57

The documentation for both DefaultParserFactory and BuffReaderParseFactory is not exactly helpful. Both factories are said to return a PZParser (from newDelimitedParser), but only one of them returns an actual value from a record. Based on the examples I've seen, I think BuffReaderParseFactory is just for checking performance (hence it should be faster), whereas DefaultParserFactory holds all the records.
