
My MapReduce job is created as follows:

Job job = new Job(conf, "helloWorld");

Any values set on the conf are available across the nodes. But I'm not sure whether the following will work:

In the mapper:

conf.set("hello", "world");

In the driver:

if( job.waitForCompletion(true) ){
     System.out.println(conf.get("hello"));
}

Will the modifications made to conf during the map/reduce phase be visible in the driver?

blackSmith
  • Interesting question! Did you try it and find that it did not work? What would happen if each mapper set a different value? Maybe you could use MultipleOutputs to store those values and retrieve them from the Driver. – vefthym Nov 26 '14 at 09:24
  • I tested it a few minutes ago and reached the same conclusion as the answer you provided. – blackSmith Nov 26 '14 at 12:18

1 Answer


When you submit a job, you also provide the Configuration, as you said:

Job job = new Job(conf, "helloWorld");

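The driver does not see changes made inside a task, though. Java passes the reference by value (see this nice answer for example), and the Job makes its own copy of the Configuration, which is then serialized and sent to the task JVMs. A conf.set() call in a mapper therefore only changes that task's local copy.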
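As a rough illustration of why the driver's conf stays unchanged, here is a plain-Java sketch with no Hadoop dependency. FakeJob and its methods are hypothetical stand-ins, and java.util.Properties plays the role of Configuration; the only thing being modelled is the assumption that the job works on its own copy of the settings.

```java
import java.util.Properties;

public class ConfCopyDemo {

    // Hypothetical stand-in for a job: it keeps its own copy of the settings.
    static class FakeJob {
        private final Properties taskSideConf = new Properties();

        FakeJob(Properties driverConf) {
            // Copy the entries instead of sharing the driver's object.
            taskSideConf.putAll(driverConf);
        }

        // Stand-in for a map task modifying its configuration.
        void runTask() {
            taskSideConf.setProperty("hello", "world");
        }

        // What a task would read back from its own copy.
        String taskView(String key) {
            return taskSideConf.getProperty(key);
        }
    }

    public static void main(String[] args) {
        Properties conf = new Properties();
        conf.setProperty("input", "data.txt");

        FakeJob job = new FakeJob(conf);
        job.runTask();

        System.out.println("task sees hello = " + job.taskView("hello"));      // world
        System.out.println("driver sees hello = " + conf.getProperty("hello")); // null
    }
}
```

The task-side copy picks up the new key, while the object held by the driver never does, which matches what the question's driver code would observe after waitForCompletion.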

What would happen if many mappers set different values to hello?

I believe that the functionality you are looking for is MultipleOutputs. Write the desired values to some new files, which you can read from the Driver using Hadoop's FileSystem once the job has finished.
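The write-then-read pattern can be sketched without a cluster. The example below is only a local stand-in: it uses java.nio.file on a temporary directory where a real job would write through MultipleOutputs and read through Hadoop's FileSystem, and the names SideFileDemo, writeTaskValue, readTaskValues and the "hello-" file prefix are made up for illustration. It also shows one possible answer to the "many mappers" question: each task writes its own file, so no value overwrites another.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Map;
import java.util.TreeMap;

public class SideFileDemo {

    // Stand-in for a task: writes its value to a file named after the task id.
    static void writeTaskValue(Path outDir, String taskId, String value) throws IOException {
        Files.write(outDir.resolve("hello-" + taskId), value.getBytes(StandardCharsets.UTF_8));
    }

    // Stand-in for the driver: once the job is done, collect every task's value.
    static Map<String, String> readTaskValues(Path outDir) throws IOException {
        Map<String, String> values = new TreeMap<>(); // sorted by task id
        try (DirectoryStream<Path> files = Files.newDirectoryStream(outDir, "hello-*")) {
            for (Path file : files) {
                String taskId = file.getFileName().toString().substring("hello-".length());
                values.put(taskId, new String(Files.readAllBytes(file), StandardCharsets.UTF_8));
            }
        }
        return values;
    }

    public static void main(String[] args) throws IOException {
        Path outDir = Files.createTempDirectory("side-output");
        writeTaskValue(outDir, "m-00000", "world");
        writeTaskValue(outDir, "m-00001", "mars");
        System.out.println(readTaskValues(outDir)); // {m-00000=world, m-00001=mars}
    }
}
```

Keeping one file per task leaves it to the driver to decide how conflicting values are merged, for example by taking the last one or by keeping all of them.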

vefthym
  • Thanks for the reply. Now I'm going to try `DistributedCache` to handle it if possible, since only a few KB of data are needed back in the driver. – blackSmith Nov 26 '14 at 12:21