I have two files, say file1.txt and file2.txt, that contain records with the same schema. I am using a single Mapper class. How can I tell whether a tuple read in the mapper came from file1 or file2?

Abhishek
  • Possible duplicate of [How to get the input file name in the mapper in a Hadoop program?](http://stackoverflow.com/questions/19012482/how-to-get-the-input-file-name-in-the-mapper-in-a-hadoop-program) – Binary Nerd Jan 27 '17 at 08:08

2 Answers

One way to identify which input file a record comes from is to override the mapper's run method and the RecordReader class methods, but that is a bit complex. I would suggest the following instead.

You can write a separate mapper for each of the two files and have each mapper add a token to its output that identifies which mapper (and therefore which file) produced the record. In your driver class, use the MultipleInputs class to bind each input path to its mapper. See https://hadoop.apache.org/docs/r2.6.3/api/org/apache/hadoop/mapreduce/lib/input/MultipleInputs.html for more information.
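The approach above could look roughly like the following. This is only a sketch, assuming the Hadoop 2.x `org.apache.hadoop.mapreduce` API, plain-text input, and a map-only job; the class names `TaggedInputsDriver`, `File1Mapper`, `File2Mapper` and the tags `file1`/`file2` are made up for illustration.

```java
import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.lib.input.MultipleInputs;
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

// Hypothetical driver: one mapper per input file, each tagging its records.
public class TaggedInputsDriver {

    // Illustrative tags; any token that identifies the source would do.
    static final Text FILE1_TAG = new Text("file1");
    static final Text FILE2_TAG = new Text("file2");

    // Bound to the first input path; emits (file1, record).
    public static class File1Mapper extends Mapper<LongWritable, Text, Text, Text> {
        @Override
        protected void map(LongWritable key, Text value, Context context)
                throws IOException, InterruptedException {
            context.write(FILE1_TAG, value);
        }
    }

    // Bound to the second input path; emits (file2, record).
    public static class File2Mapper extends Mapper<LongWritable, Text, Text, Text> {
        @Override
        protected void map(LongWritable key, Text value, Context context)
                throws IOException, InterruptedException {
            context.write(FILE2_TAG, value);
        }
    }

    public static void main(String[] args) throws Exception {
        // Assumed arguments: args[0] = file1 path, args[1] = file2 path, args[2] = output dir
        Job job = Job.getInstance(new Configuration(), "tag-by-source");
        job.setJarByClass(TaggedInputsDriver.class);

        // Each input path gets its own input format and mapper class.
        MultipleInputs.addInputPath(job, new Path(args[0]), TextInputFormat.class, File1Mapper.class);
        MultipleInputs.addInputPath(job, new Path(args[1]), TextInputFormat.class, File2Mapper.class);

        job.setNumReduceTasks(0); // map-only: tagged records are written directly
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(Text.class);
        FileOutputFormat.setOutputPath(job, new Path(args[2]));

        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
```

With the default text output format, each output line should then start with the tag, followed by a tab and the original record.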

When you run your jar file, pass the paths of both input files you want to read, plus the path where the output should be stored.

For more details, see http://dailyhadoopsoup.blogspot.in/2014/01/mutiple-input-files-in-mapreduce-easy.html?m=1

I hope this answers your question.

siddhartha jain

You can try this:

Put the logic that obtains the file name in the map method if a single mapper may receive records from multiple files.

If each mapper receives only a single file, you could put it in the setup method instead, so the file name is fetched only once rather than for every record.

    // import org.apache.hadoop.mapreduce.lib.input.FileSplit;
    String filename;

    @Override
    public void map(LongWritable key, Text value, Context context) {
        // The input split identifies the file the current record was read from.
        FileSplit fileSplit = (FileSplit) context.getInputSplit();
        filename = fileSplit.getPath().getName();
    }

After this, you can also write logic to segregate the results (the lines read) based on the file name.
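As a rough illustration of that segregation step, here is a plain-Java sketch that groups lines by the name of the file they came from. It does not use Hadoop at all; `SourceSegregator` and its methods are hypothetical names, and in a real job the path would come from the input split and the grouping would more typically be done by emitting the file name as the output key.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper: groups lines by the base name of their source file.
final class SourceSegregator {
    private final Map<String, List<String>> linesBySource = new LinkedHashMap<>();

    // Strips any directory prefix so "/data/in/file1.txt" and "file1.txt" group together.
    static String baseName(String path) {
        int slash = path.lastIndexOf('/');
        return slash >= 0 ? path.substring(slash + 1) : path;
    }

    // Records one line under the file it was read from.
    void add(String sourcePath, String line) {
        linesBySource.computeIfAbsent(baseName(sourcePath), k -> new ArrayList<>()).add(line);
    }

    // Returns the lines seen for a file name, or an empty list if none were seen.
    List<String> linesFrom(String fileName) {
        return linesBySource.getOrDefault(fileName, new ArrayList<>());
    }
}
```

For example, after calling `add("/data/in/file1.txt", "a,1")`, a call to `linesFrom("file1.txt")` returns a list containing `"a,1"`.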

Deepan Ram