I'm implementing a word-count-style program on the Google Books Ngrams dataset. My input is a binary file: https://aws.amazon.com/datasets/google-books-ngrams/ I was told to use SequenceFileInputFormat in order to read it.
I'm using Hadoop 2.6.5.
Configuration conf = new Configuration();
Job job = Job.getInstance(conf, "PartA");
job.setJarByClass(MyDriver.class);
job.setMapperClass(MyMapperA.class);
job.setReducerClass(MyReducerA.class);
job.setCombinerClass(MyReducerA.class);
job.setOutputKeyClass(Text.class);
job.setOutputValueClass(IntWritable.class);
job.setInputFormatClass(SequenceFileInputFormat.class); // The new line
FileInputFormat.addInputPath(job, new Path(args[0]));
FileOutputFormat.setOutputPath(job, new Path(args[1]));
System.exit(job.waitForCompletion(true) ? 0 : 1);
Sadly, I'm getting errors after adding this line:
job.setInputFormatClass(SequenceFileInputFormat.class);
The error received:
java.lang.Exception: java.lang.IllegalArgumentException: Unknown codec: com.hadoop.compression.lzo.LzoCodec
at org.apache.hadoop.mapred.LocalJobRunner$Job.runTasks(LocalJobRunner.java:462)
I've tried adding several Maven dependencies, but without success.
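For reference, this is the kind of dependency I experimented with. The LZO codec classes (com.hadoop.compression.lzo.LzoCodec) are not part of stock Hadoop, so I tried pulling in the hadoop-lzo artifact; the repository URL and version number below are guesses on my part, not something I've confirmed works:

```xml
<!-- pom.xml fragment: hadoop-lzo is not on Maven Central, so I added
     what I believe is the Twitter repository that hosts it -->
<repositories>
  <repository>
    <id>twitter</id>
    <url>https://maven.twttr.com</url>
  </repository>
</repositories>

<dependencies>
  <dependency>
    <groupId>com.hadoop.gplcompression</groupId>
    <artifactId>hadoop-lzo</artifactId>
    <version>0.4.20</version>
  </dependency>
</dependencies>
```

From what I've read, the codec may also need to be registered through the io.compression.codecs property (either in core-site.xml or via conf.set(...) before creating the Job), but I'm not sure whether that alone explains the "Unknown codec" exception or whether my dependency setup is the real problem.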