
Hadoop: when JVM reuse is enabled, do several map or reduce tasks running in parallel on a single node share static data?

In other words: if I have a static String "xxx" in my Mapper class and JVM reuse is enabled, can two map tasks running in parallel on a single node share that one String "xxx", or are there still two separate static Strings "xxx", one in each map task?
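To make the question concrete, here is a minimal sketch of the kind of mapper I mean (the class name, field name, and value are just placeholders for this question):

```java
import java.io.IOException;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

public class MyMapper extends Mapper<LongWritable, Text, Text, IntWritable> {

    // The field in question: with JVM reuse on, is this one object
    // shared by parallel map tasks on a node, or one per task?
    private static final String SHARED = "xxx";

    @Override
    protected void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
        // Emit the static string so each task visibly touches it.
        context.write(new Text(SHARED), new IntWritable(1));
    }
}
```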

Why do I have the confusion? cause I see the below comments:


Jobs can enable task JVMs to be reused by specifying the job configuration mapred.job.reuse.jvm.num.tasks. If the value is 1 (the default), then JVMs are not reused (i.e. 1 task per JVM). If it is -1, there is no limit to the number of tasks a JVM can run (of the same job). One can also specify some value greater than 1 using the api. – answered Feb 2 '11 at 18:09 by Joe Stein

  • Thanks, one more question. Do those tasks also share a class loader, so all static resources are loaded only once? (Or does it work like Tomcat, in which case there would be almost no reason to share the JVM...) – yura Feb 4 '11 at 13:10
  • The JVM will be cleared after a task completes. This parameter only provides a better runtime for jobs that are not "long-running", since JVM instantiation is very expensive. You cannot share any resources across task instances. – Thomas Jungblut Feb 4 '11 at 21:04


The comments above are from: Is it possible to run several map tasks in one JVM?
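For reference, here is a minimal sketch of how that setting is turned on with the classic mapred API (the class name is a placeholder; setNumTasksToExecutePerJvm writes the mapred.job.reuse.jvm.num.tasks property):

```java
import org.apache.hadoop.mapred.JobConf;

public class JvmReuseExample {
    public static void main(String[] args) throws Exception {
        JobConf conf = new JobConf(JvmReuseExample.class);
        // Equivalent to setting mapred.job.reuse.jvm.num.tasks:
        // -1 = no limit on the number of (same-job) tasks per JVM,
        //  1 = the default, one task per JVM.
        conf.setNumTasksToExecutePerJvm(-1);
        // ... mapper/reducer classes and input/output paths go here ...
    }
}
```

The same thing can be set on the command line with -D mapred.job.reuse.jvm.num.tasks=-1 when the job uses the standard Tool/GenericOptionsParser plumbing.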

  • I think my comment is really clear. You can't share data; the JVM's internal heap is wiped. You can't put stuff there and hope that other tasks pick it up. – Thomas Jungblut Apr 19 '14 at 22:57
  • Thanks for your comments. I think you have explained clearly why serial tasks cannot share static data, but I still have a little confusion about parallel tasks. For example, with JVM reuse enabled, when 2 maps are running in parallel on a single node, can the 2 map tasks share one JVM at that time? If so, do you mean there are 2 heaps in the JVM? Or does something else stop the 2 maps from sharing static data? – Ejay Apr 20 '14 at 03:13
  • No, Hadoop does process isolation in order to help with fault tolerance. Isolating every map task is key to restartability without invalidating too much of the other computation. So no: tasks never share the same JVM when they run in parallel; they are different processes. – Thomas Jungblut Apr 20 '14 at 08:29
  • Much appreciated, now I get it. – Ejay Apr 20 '14 at 08:59
  • Hi @Thomas Jungblut, sorry for one more question. While the whole job is running (I mean after the job has been running for a while), is there any way to add data to a memory that can be shared by all nodes' tasks? Just like the distributed cache (but as far as I know, that can only be set at the beginning, and it consists of local files). – Ejay Apr 20 '14 at 13:14
  • Well, you can read files while you're running your task, but you can't push stuff from the outside into the task's memory. – Thomas Jungblut Apr 20 '14 at 13:22
  • Yep, I just thought of this: if we deploy HBase and Hadoop on the same cluster, we can use HBase as the "memory" shared by all nodes' tasks (a sketch of that idea follows below). – Ejay Apr 20 '14 at 14:31
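To illustrate that last idea: a minimal sketch of a mapper pulling shared state from HBase in setup(), assuming the modern HBase client API; the table, row, column family, and qualifier names are made up for this example:

```java
import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.client.Get;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.Table;
import org.apache.hadoop.hbase.util.Bytes;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

public class HBaseBackedMapper extends Mapper<LongWritable, Text, Text, IntWritable> {

    private String shared; // this task's copy of the cluster-wide state

    @Override
    protected void setup(Context context) throws IOException {
        // Fetch the current shared value from HBase; any task on any
        // node can read (or update) the same cell while the job runs.
        Configuration conf = HBaseConfiguration.create(context.getConfiguration());
        try (Connection connection = ConnectionFactory.createConnection(conf);
             Table table = connection.getTable(TableName.valueOf("job_state"))) {
            Result result = table.get(new Get(Bytes.toBytes("shared-row")));
            shared = Bytes.toString(
                    result.getValue(Bytes.toBytes("cf"), Bytes.toBytes("value")));
        }
    }

    @Override
    protected void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
        context.write(new Text(shared), new IntWritable(1));
    }
}
```

Note that each task still holds its own in-process copy; HBase is only the shared, mutable store that all tasks consult.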

0 Answers