Pig: parse bytearray as a string/json

Question

I have JSON data saved to S3 in SequenceFile format by secor, and I want to analyze it using Pig. Using elephant-bird I managed to load it from S3 as a bytearray, but I wasn't able to convert it to a chararray, which is apparently needed to parse the JSON:

%declare SEQFILE_LOADER 'com.twitter.elephantbird.pig.load.SequenceFileLoader';
%declare LONG_CONVERTER 'com.twitter.elephantbird.pig.util.LongWritableConverter';
%declare BYTES_CONVERTER 'com.twitter.elephantbird.pig.util.BytesWritableConverter';
%declare TEXT_CONVERTER 'com.twitter.elephantbird.pig.util.TextConverter';

grunt> A = LOAD 's3n://...logs/raw_logs/...events/dt=2015-12-08/1_0_00000000000085594299'
       USING $SEQFILE_LOADER ('-c $LONG_CONVERTER', '-c $BYTES_CONVERTER')
       AS (key: long, value: bytearray);
grunt> B = LIMIT A 1;
grunt> DUMP B;

(85653965,{"key": "val1", other json data, ...})

grunt> DESCRIBE B;

B: {key: long,value: bytearray}

grunt> C = FOREACH B GENERATE (key, (chararray)value);
grunt> DUMP C;

2015-12-08 19:32:09,133 [main] ERROR org.apache.pig.tools.grunt.Grunt -
   ERROR 1075: Received a bytearray from the UDF or Union from two different Loaders.
   Cannot determine how to convert the bytearray to string.

Using TextConverter instead of BytesWritableConverter just leaves me with empty values, like:

(85653965,)

Pig was evidently able to render the byte array as a string in order to dump it, so the conversion shouldn't be impossible. How do I do that?

What is your Pig version? – Patrick the Cat, Dec 09 '15 at 00:24
