
Hi, I'm trying to run a pipeline to process a very large file (about 4 million records). Every time it reaches around 270,000 records, it fails, stops processing any more records, and returns this error:

'/FileLocation/FiLeNAME..DAT' at position '93167616': com.streamsets.pipeline.lib.dirspooler.BadSpoolFileException: com.streamsets.pipeline.api.ext.io.OverrunException: Reader exceeded the read limit '131072'.

If anyone else has experienced a similar issue, please help. Thank you.

I have checked the lines where the pipeline stops, but there seems to be nothing obvious there. I tried another file and it fails with the same error.

Mahesh Waghmare
MichelleNZ

2 Answers


Looks like you're hitting the maximum record size. This limit is in place to guard against badly formatted data causing 'out of memory' errors.

Check your data format configuration and increase Max Record Length, Maximum Object Length, Max Line Length, etc., depending on the data format you are using.

See the Directory Origin documentation for more detail. Note in particular that you may have to edit sdc.properties if the records you are parsing are bigger than the system-wide limit of 1048576 bytes.
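
To make this concrete, here is a minimal sdc.properties sketch. The property name (parser.limit) is from memory rather than from this thread, so confirm it against the documentation for your Data Collector version; sdc.properties changes also require a Data Collector restart to take effect.

    # Raise the system-wide maximum record size from the default
    # 1048576 bytes to 10 MB (the value is in bytes).
    parser.limit=10485760

You would then also raise the origin's Max Record Length (or Max Line Length, etc.) to a matching value, since the stage setting alone cannot take you past the system-wide limit.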

metadaddy

I recently received this error message as well. When I come up against such size limits in StreamSets, I'll often set the limit to something ridiculous:

[screenshot: the data format's record length limit set to a very large value]

Then I set the limit to the value reported in the subsequent error message:

[screenshot: the limit updated to the record size reported in the error message]
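
An alternative to this trial-and-error loop is to measure the longest record up front and size the limit from that. A minimal sketch in Python, assuming the .DAT file is newline-delimited text (the find_long_records helper is just an illustration; the 131072 value is taken from the error message above):

    import sys

    LIMIT = 131072  # the read limit quoted in the error message

    def find_long_records(path, limit=LIMIT):
        # Report every record longer than `limit` bytes so the pipeline's
        # maximum record length can be set with headroom above the largest one.
        with open(path, "rb") as f:  # read bytes so lengths match the reader's byte limit
            for lineno, line in enumerate(f, start=1):
                length = len(line.rstrip(b"\r\n"))
                if length > limit:
                    print(f"record {lineno}: {length} bytes")

    if __name__ == "__main__":
        find_long_records(sys.argv[1])

Running it against the file (for example, python find_long_records.py '/FileLocation/FiLeNAME..DAT') prints the offending record numbers and sizes before you commit to a full pipeline run.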

I find it really unfortunate that StreamSets then fails to process the rest of a file when an extra-long record is encountered. This seems counterintuitive to me for a tool designed to process vast amounts of data.

QA Collective