I've been searching online for a proper tutorial on how to use map and reduce, but almost every example is WordCount and doesn't really explain how to use each function. I've seen plenty about the theory, the keys, the map and so on, but there is no CODE that does anything other than WordCount.
I am using Ubuntu 20.10 on VirtualBox and Hadoop version 3.2.1 (if you need any more info, leave a comment).
My task is to process a file that contains data about athletes who took part in the Olympics.
You will see that it contains a variety of info, like name, sex, age, weight, height, etc.
I will show the header and one example record here (the file is comma-separated):
ID,Name,Sex,Age,Height,Weight,Team,NOC,Games,Year,Season,City,Sport,Event,Medal
1,A Dijiang,M,24,180,80,China,CHN,1992 Summer,1992,Summer,Barcelona,Basketball,Basketball Men's Basketball,NA
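To make explicit which 0-based columns my code relies on after splitting on commas, here is a quick snippet (just an illustration using the record above):

public class ColumnCheck {
    public static void main(String[] args) {
        String line = "1,A Dijiang,M,24,180,80,China,CHN,1992 Summer,1992,Summer,"
                    + "Barcelona,Basketball,Basketball Men's Basketball,NA";
        String[] cols = line.split(",");
        System.out.println(cols[1]);  // Name  -> A Dijiang
        System.out.println(cols[8]);  // Games -> 1992 Summer
        System.out.println(cols[14]); // Medal -> NA
    }
}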
Until now I have only had to deal with fields that stay the same across all of a participant's records, like the name or the ID. My problem is that one participant can appear more than once, at different points in time, so the key I build differs between those records and reduce can't recognise them as belonging to the same athlete. If I could change the key that the reduce function groups on to, for example, the participant's name, then I should get the correct result.
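To show what I mean, here is a rough sketch of the map side I'm imagining (not tested; it assumes the comma-separated layout above, where column 1 is Name and column 14 is Medal, and that "NA" means no medal):

import java.io.IOException;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

// Sketch: emit the athlete's NAME as the key, so every medal record of the
// same athlete lands in the same reduce call, whatever year it comes from.
public class NameKeyMapper extends Mapper<LongWritable, Text, Text, IntWritable> {
    private final static IntWritable one = new IntWritable(1);
    private final Text name = new Text();

    @Override
    public void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
        String[] cols = value.toString().split(",");
        // Skip the header and any record without a real medal.
        if (cols.length > 14 && !cols[1].equals("Name") && !cols[14].equals("NA")) {
            name.set(cols[1]);        // key = participant's name
            context.write(name, one); // value = one medal
        }
    }
}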
In my current code below, I search for players that won at least one medal.
My main is:
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class NewWordCount {

    public static void main(String[] args) throws Exception {
        // Expect exactly two arguments: the input path and the output path.
        if (args.length != 2) {
            System.err.println("Usage: NewWordCount <input path> <output path>");
            System.exit(2);
        }

        // Job 1.
        Configuration conf = new Configuration();
        Job job = Job.getInstance(conf, "count");
        job.setJarByClass(NewWordCount.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        job.setMapperClass(NewWordMapper.class);
        // The reducer only sums counts, so it is safe to reuse as a combiner.
        job.setCombinerClass(NewWordReducer.class);
        job.setReducerClass(NewWordReducer.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
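For context, I run the job roughly like this (the jar name and HDFS paths are just placeholders for my local ones):

hadoop jar NewWordCount.jar NewWordCount /user/me/athlete_events.csv /user/me/output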
My Mapper is:
import java.io.IOException;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

public class NewWordMapper extends Mapper<LongWritable, Text, Text, IntWritable> {

    // Must be initialised to 1, not the default 0, or every emitted count is zero.
    private final static IntWritable one = new IntWritable(1);
    private final Text word = new Text();

    @Override
    public void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
        // Skip the header line. (Note: a byte offset of 0 only identifies the
        // header when it happens to sit in the first input split.)
        if (key.get() == 0) {
            return;
        }

        String[] arrOfStr = value.toString().split(",");
        if (arrOfStr.length < 15) {
            return; // malformed line
        }

        // Column 14 holds the type of medal the player has won (or NA).
        String medal = arrOfStr[14];
        if (medal.equals("Gold") || medal.equals("Silver") || medal.equals("Bronze")) {
            String sum = arrOfStr[1] + ","   // name
                       + arrOfStr[2] + ","   // sex
                       + arrOfStr[3] + ","   // age
                       + arrOfStr[6] + ","   // team
                       + arrOfStr[12] + ","  // sport
                       + arrOfStr[8];        // games
            word.set(sum);
            context.write(word, one);
        }
    }
}
My Reducer is:
import java.io.IOException;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

public class NewWordReducer extends Reducer<Text, IntWritable, Text, IntWritable> {

    @Override
    public void reduce(Text key, Iterable<IntWritable> values, Context context)
            throws IOException, InterruptedException {
        // The values are the 1s emitted by the mapper, so summing them counts
        // how many medal records share this key. (My old version also split
        // val.toString() on commas to read a "name", but the values here are
        // counts, not CSV lines, so that did nothing useful.)
        int count = 0;
        for (IntWritable val : values) {
            count += val.get();
        }
        context.write(key, new IntWritable(count));
    }
}
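And this is the reduce side I would pair with the mapper sketch from earlier: since the key is just the name, each reduce call sees all the 1s for one athlete, so the sum is that athlete's medal total, and every name that reaches reduce at all has won at least one medal.

import java.io.IOException;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

// Sketch: with the name as the key, one reduce call = one athlete.
public class NameKeyReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
    @Override
    public void reduce(Text key, Iterable<IntWritable> values, Context context)
            throws IOException, InterruptedException {
        int medals = 0;
        for (IntWritable val : values) {
            medals += val.get(); // add up this athlete's medal records
        }
        context.write(key, new IntWritable(medals));
    }
}

Is a pair like this the right way to change the key, or is there a more standard approach?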