When reading back files from HDFS I'm seeing these errors a lot:

{"id":"646626691524096003","user_friends_count":{"int":83},"user_location":{"string":"他の星から副都心線経由"},"user_description":{"string":"Exception in thread "main" org.apache.avro.AvroRuntimeException: java.io.IOException: Block size invalid or too large for this implementation: -40
    at org.apache.avro.file.DataFileStream.hasNextBlock(DataFileStream.java:275)
    at org.apache.avro.file.DataFileStream.hasNext(DataFileStream.java:197)
    at org.apache.avro.tool.DataFileReadTool.run(DataFileReadTool.java:77)
    at org.apache.avro.tool.Main.run(Main.java:84)
    at org.apache.avro.tool.Main.main(Main.java:73)
Caused by: java.io.IOException: Block size invalid or too large for this implementation: -40
    at org.apache.avro.file.DataFileStream.hasNextBlock(DataFileStream.java:266)
    ... 4 more

This happens when we try to read them back with a variety of tools, e.g.:

$ java -jar ~/avro-tools-1.7.7.jar tojson FlumeData.1443002797525
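
For what it's worth, subcommands that only read the container header, such as getschema and getmeta, should still succeed on the same file (assuming only the data blocks are damaged), since they never iterate the blocks:

$ java -jar ~/avro-tools-1.7.7.jar getschema FlumeData.1443002797525
$ java -jar ~/avro-tools-1.7.7.jar getmeta FlumeData.1443002797525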

The machine writing them to HDFS is a laptop on a flimsy connection, so it quite likely disconnects regularly, but we wouldn't really expect that to produce corrupt files. In this case the file seems to hit the invalid block size about 11% of the way through (a vim estimate).
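
To see programmatically how far a reader gets before the bad block, a minimal sketch along these lines should work (not part of the original tooling; it just walks the file with Avro's standard DataFileReader, the same reader machinery the stack trace above goes through, and the class name is made up):

import java.io.File;
import org.apache.avro.AvroRuntimeException;
import org.apache.avro.file.DataFileReader;
import org.apache.avro.generic.GenericDatumReader;
import org.apache.avro.generic.GenericRecord;

public class CountGoodRecords {
    public static void main(String[] args) throws Exception {
        long good = 0;
        // args[0] is the container file, e.g. FlumeData.1443002797525
        try (DataFileReader<GenericRecord> reader = new DataFileReader<>(
                new File(args[0]), new GenericDatumReader<GenericRecord>())) {
            while (reader.hasNext()) {   // hasNext() is what throws on the bad block
                reader.next();
                good++;
            }
        } catch (AvroRuntimeException e) {
            // Same "Block size invalid or too large" failure as tojson above
            System.err.println("Stopped after " + good + " records: " + e.getMessage());
        }
        System.out.println("Records readable before the failure: " + good);
    }
}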

FWIW I think the particular user description it was about to read out was for Twitter user @MyTime0627.


1 Answer


You can check this post. I also ran into this problem. The JSON SerDe and the Avro SerDe cannot process an event at the same time.

Cloudera 5.4.2: Avro block size is invalid or too large when using Flume and Twitter streaming
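
For background (an assumption about the usual cause, not something stated in the linked post): the Cloudera example source com.cloudera.flume.source.TwitterSource emits raw JSON tweet bodies for the JSON SerDe, while Flume's built-in org.apache.flume.source.twitter.TwitterSource emits Avro, so the source, the HDFS sink and the downstream reader all have to agree on one format. A rough illustration of a single-format pipeline, with the agent/channel/sink names invented and credentials elided:

# Hypothetical agent layout; only the source type and the sink fileType matter here
TwitterAgent.sources  = Twitter
TwitterAgent.channels = MemChannel
TwitterAgent.sinks    = HDFS

# Flume's built-in source: event bodies are Avro
TwitterAgent.sources.Twitter.type = org.apache.flume.source.twitter.TwitterSource
TwitterAgent.sources.Twitter.channels = MemChannel
TwitterAgent.sources.Twitter.consumerKey = ...
TwitterAgent.sources.Twitter.consumerSecret = ...
TwitterAgent.sources.Twitter.accessToken = ...
TwitterAgent.sources.Twitter.accessTokenSecret = ...

TwitterAgent.channels.MemChannel.type = memory

# Write the bodies straight to HDFS; don't re-wrap them in another format
TwitterAgent.sinks.HDFS.type = hdfs
TwitterAgent.sinks.HDFS.channel = MemChannel
TwitterAgent.sinks.HDFS.hdfs.fileType = DataStream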
