I'm attempting to write a simple program to extract some data from a bunch of AVRO files. The schema for each file may be different so I would like to read the files generically (i.e. without having to pregenerate and then compile in the schema for each) using the C++ interface.
I have been attempting to follow the generic.cc
example but it assumes a separate schema where I would like to read the schema from each AVRO file.
Here is my code:
#include <fstream>
#include <iostream>
#include "Compiler.hh"
#include "DataFile.hh"
#include "Decoder.hh"
#include "Generic.hh"
#include "Stream.hh"
const std::string BOLD("\033[1m");
const std::string ENDC("\033[0m");
const std::string RED("\033[31m");
const std::string YELLOW("\033[33m");
int main(int argc, char**argv)
{
std::cout << "AVRO Test\n" << std::endl;
if (argc < 2)
{
std::cerr << BOLD << RED << "ERROR: " << ENDC << "please provide an "
<< "input file\n" << std::endl;
return -1;
}
avro::DataFileReaderBase dataFile(argv[1]);
auto dataSchema = dataFile.dataSchema();
// Write out data schema in JSON for grins
std::ofstream output("data_schema.json");
dataSchema.toJson(output);
output.close();
avro::DecoderPtr decoder = avro::binaryDecoder();
auto inStream = avro::fileInputStream(argv[1]);
decoder->init(*inStream);
avro::GenericDatum datum(dataSchema);
avro::decode(*decoder, datum);
std::cout << "Type: " << datum.type() << std::endl;
return 0;
}
Everytime I run the code, no matter what file I use, I get this:
$ ./avrotest twitter.avro
AVRO Testterminate called after throwing an instance of 'avro::Exception'
what(): Cannot have negative length: -40 Aborted
In addition to my own data files, I have tried using the data files located here: https://github.com/miguno/avro-cli-examples, with the same result.
I tried using the avrocat
utility on all of the same files and it works fine. What am I doing wrong?
(NOTE: outputting the data schema for each file in JSON works correctly as expected)