
I need to parse files that may be quite large: possibly hundreds of megabytes and millions of lines. I have been trying to do this with FlatPack. I would expect the right approach to be the buffered parsers and the new stream methods. However, although dataset.next() returns true for the correct number of records, the Optional returned by dataset.getRecord() never contains a value.
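For reference, here is a minimal JDK-only sketch of the contract the question expects: once next() returns true, getRecord() should hold the current record. This is not FlatPack's code. The class name RecordCursor is hypothetical, fields are split on a single delimiter character, and text qualifiers (quoted fields) are not handled.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.util.Optional;
import java.util.regex.Pattern;

// Hypothetical cursor-style reader (JDK only, NOT FlatPack's implementation).
// Contract: after next() returns true, getRecord() holds the current line's fields.
class RecordCursor implements AutoCloseable {
    private final BufferedReader in;
    private final Pattern delimiter;
    private String[] current; // fields of the line most recently read by next()

    RecordCursor(Reader source, char delimiter) {
        this.in = new BufferedReader(source);
        // Quote the delimiter so characters such as '|' are not treated as regex syntax.
        this.delimiter = Pattern.compile(Pattern.quote(String.valueOf(delimiter)));
    }

    /** Reads exactly one more line; returns false at end of input. */
    boolean next() {
        try {
            String line = in.readLine();
            // A limit of -1 keeps trailing empty fields such as "a,b,".
            current = (line == null) ? null : delimiter.split(line, -1);
            return current != null;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Present after a successful next(); empty before the first call and at end of input. */
    Optional<String[]> getRecord() {
        return Optional.ofNullable(current);
    }

    @Override
    public void close() {
        try {
            in.close();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Note that this sketch treats a header line like any other record; a real parser would typically use it to map column names to positions.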

I have looked at this example/test, but it only counts the number of records and does not do anything with their content.

user1723105

2 Answers


You can use the BuffReaderParseFactory class instead of DefaultParserFactory.

It reads a record from the input file only when you call next(), rather than reading the whole file up front.
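The same pull-based idea can be sketched with plain JDK streams. This is not BuffReaderParseFactory itself, and LazyParseDemo and recordsParsed are hypothetical names. BufferedReader.lines() is lazy, so when only the first two records are requested, only two lines are split into fields. The counter tracks lines turned into records; the BufferedReader still fills its internal buffer in chunks.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical demo (JDK only): records are produced only when the pipeline pulls them.
class LazyParseDemo {
    /** Parses at most {@code limit} records into {@code out}; returns how many lines were split. */
    static int recordsParsed(String data, int limit, List<String[]> out) {
        AtomicInteger parsed = new AtomicInteger();
        try (BufferedReader in = new BufferedReader(new StringReader(data))) {
            in.lines()
              .map(line -> {
                  parsed.incrementAndGet(); // counts each line actually turned into a record
                  return line.split(",", -1);
              })
              .limit(limit) // short-circuits: no further lines are pulled once the limit is reached
              .forEach(out::add);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return parsed.get();
    }
}
```

With the input "a,1\nb,2\nc,3\nd,4" and a limit of 2, recordsParsed returns 2 and out holds only the a and b records.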

  • I believe the example I linked to already uses the buffered classes, but it does not do anything meaningful with the content. I need access to the field content of each record, but when I try to access it I get errors. – user1723105 Jan 18 '16 at 16:50

The documentation for DefaultParserFactory and BuffReaderParseFactory is not especially helpful. Both factories are documented as returning a PZParser (from newDelimitedParser), but only one of them yields actual values from a record. Based on the examples I've seen, I think BuffReaderParseFactory is intended mainly for performance (hence it should be faster), whereas DefaultParserFactory holds all the records in memory.
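Independent of FlatPack, the trade-off between holding every record and holding one at a time can be sketched with two JDK-only methods over the same comma-separated input. EagerVsStreaming, loadAll and sumColumn are hypothetical names, and quoted fields are not handled. The eager version keeps every record, so memory grows with the file but any record can be revisited by index; the streaming version holds only the current line while still reading its fields.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical comparison of two parsing strategies (JDK only, not FlatPack).
class EagerVsStreaming {
    /** Eager: every record stays in memory, so records can be revisited by index. */
    static List<String[]> loadAll(Reader source) {
        List<String[]> records = new ArrayList<>();
        try (BufferedReader in = new BufferedReader(source)) {
            String line;
            while ((line = in.readLine()) != null) {
                records.add(line.split(",", -1));
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return records;
    }

    /** Streaming: only the current line is held; its field is used, then discarded. */
    static long sumColumn(Reader source, int column) {
        long total = 0;
        try (BufferedReader in = new BufferedReader(source)) {
            String line;
            while ((line = in.readLine()) != null) {
                total += Long.parseLong(line.split(",", -1)[column].trim());
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return total;
    }
}
```

For "a,10\nb,20\nc,30\n", loadAll returns three records and sumColumn(reader, 1) returns 60.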

Kara