Getting the current HDFS location in a Hadoop map class

Asked Dec 02 '16 at 11:38


I have a scenario where a list of HDFS locations is processed in one MR job, and some datasets can be present in multiple locations. For example:

Dataset IDs: dataset1, dataset2, dataset3.
HDFLocation1[dataset1,dataset2] (meaning this location has data for dataset1 and dataset2)
HDFLocation2[dataset1,dataset3]

I have the map below, which gives the HDFS location that needs to be processed for a given dataset.

[dataset1:HDFLoca1] 
[dataset2:HDFLoca2]
[dataset3:HDFLoca2]

I am thinking of implementing the logic below in the map method:

1. Fetch the dataset ID (e.g. dataset1).
2. Get the current HDFS location.
3. Check against the provided map whether it is the desired location.
4. Skip or process the data based on step 3.
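
The skip-or-process check in the steps above can be sketched in plain Java, independent of Hadoop. This is only an illustration under assumptions: the class name `DatasetLocationFilter`, the method `shouldProcess`, and the `/data/...` paths are hypothetical, and the map values are assumed to be directories. In a mapper written against the `org.apache.hadoop.mapreduce` API, the current file is commonly obtained with `((FileSplit) context.getInputSplit()).getPath().toString()`; whether that cast succeeds depends on the input format in use, so treat it as something to verify on your cluster.

```java
import java.net.URI;
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

/** Hypothetical helper: decides whether a record should be processed or skipped. */
final class DatasetLocationFilter {
    // dataset ID -> the one HDFS directory whose copy of that dataset should be used
    private final Map<String, String> desiredLocation;

    DatasetLocationFilter(Map<String, String> desiredLocation) {
        this.desiredLocation = Collections.unmodifiableMap(new HashMap<>(desiredLocation));
    }

    /**
     * @param datasetId   dataset ID read from the current record (step 1)
     * @param currentFile path or URI of the file being read by this mapper (step 2)
     * @return true if this file lies under the desired directory for the dataset (steps 3 and 4)
     */
    boolean shouldProcess(String datasetId, String currentFile) {
        String wanted = desiredLocation.get(datasetId);
        if (wanted == null) {
            return false; // unknown dataset: skip
        }
        // Compare only the path part, so "hdfs://host:port/dir/file" matches "/dir".
        String filePath = URI.create(currentFile).getPath();
        String wantedDir = URI.create(wanted).getPath();
        if (!wantedDir.endsWith("/")) {
            wantedDir = wantedDir + "/"; // avoid "/data/loc1" matching "/data/loc10"
        }
        return filePath.startsWith(wantedDir);
    }
}
```

The filter would be built once (for example in the mapper's `setup` method, from values passed through the job configuration) and called for each record inside `map`, returning early when `shouldProcess` is false.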

I have seen "How to get the input file name in the mapper in a Hadoop program?", but that approach does not work with the Cloudera version I am using (Hadoop-core-2.5.1, CDH-5.3.1).

edited May 23 '17 at 12:30

Community

asked Dec 02 '16 at 11:38

Vikas Singh


it does not work => ? What is happening? – Ravindra babu Dec 02 '16 at 16:40
Alternative way: could you instead add the dataset ID to each record that you are processing and then group by the dataset ID? Then process each group in the reducer as needed by your application. – Amit Dec 05 '16 at 18:37
