
I have a Hive query that runs fine for a small dataset, but when I run it against 250 million records I get the errors below in the logs:

 FATAL org.apache.hadoop.mapred.Child: Error running child : java.lang.OutOfMemoryError:   unable to create new native thread
    at java.lang.Thread.start0(Native Method)
    at java.lang.Thread.start(Thread.java:640)
    at org.apache.hadoop.mapred.Task$TaskReporter.startCommunicationThread(Task.java:725)
    at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:362)
    at org.apache.hadoop.mapred.Child$4.run(Child.java:255)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:396)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1136)
    at org.apache.hadoop.mapred.Child.main(Child.java:249)



 2013-03-18 14:12:58,907 WARN org.apache.hadoop.mapred.Child: Error running child
 java.io.IOException: Cannot run program "ln": java.io.IOException: error=11, Resource temporarily unavailable
    at java.lang.ProcessBuilder.start(ProcessBuilder.java:460)
    at java.lang.Runtime.exec(Runtime.java:593)
    at java.lang.Runtime.exec(Runtime.java:431)
    at java.lang.Runtime.exec(Runtime.java:369)
    at org.apache.hadoop.fs.FileUtil.symLink(FileUtil.java:567)
    at org.apache.hadoop.mapred.TaskRunner.symlink(TaskRunner.java:787)
    at org.apache.hadoop.mapred.TaskRunner.setupWorkDir(TaskRunner.java:752)
    at org.apache.hadoop.mapred.Child.main(Child.java:225)
 Caused by: java.io.IOException: java.io.IOException: error=11, Resource temporarily unavailable
    at java.lang.UNIXProcess.<init>(UNIXProcess.java:148)
    at java.lang.ProcessImpl.start(ProcessImpl.java:65)
    at java.lang.ProcessBuilder.start(ProcessBuilder.java:453)
    ... 7 more
2013-03-18 14:12:58,911 INFO org.apache.hadoop.mapred.Task: Runnning cleanup for the task
2013-03-18 14:12:58,911 INFO org.apache.hadoop.mapred.Child: Error cleaning up
  java.lang.NullPointerException
    at org.apache.hadoop.mapred.Task.taskCleanup(Task.java:1048)
    at org.apache.hadoop.mapred.Child.main(Child.java:281)

I need help with this.

hjamali52
  • I've seen this before when you have no more disk space left on the task tracker node running the task (map or reduce). How big is your cluster, and what's the free space available on each cluster node (on the partition where mapred stores its temp files)? – Chris White Mar 19 '13 at 10:46

3 Answers


I've experienced this with MapReduce in general. In my experience it's not actually an out-of-memory error: the system is running out of file descriptors to start threads, which is why it says "unable to create new native thread".

The fix for us (on Linux) was to increase the ulimit, which was set to 1024, to 2048 via `ulimit -n 2048`. You need permission to do this: either sudo/root access, or a hard limit of 2048 or higher so you can set it as your own user. You can put the command in your `.profile` or `.bashrc` settings file.

You can check your current settings with ulimit -a. See this reference for more details: https://stackoverflow.com/a/34645/871012

I've also seen many others talk about changing the /etc/security/limits.conf file, but I haven't had to do that yet. Here is a link talking about it: https://stackoverflow.com/a/8285278/871012
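
For reference, a rough sketch of what those checks and changes can look like. The `hadoop` user name and the values here are placeholders, not prescriptions; use the account that actually runs your TaskTracker and task JVMs, and values that fit your workload:

    # Check the current limits for the account that runs the task JVMs
    # ("open files" is the nofile limit the error is hitting).
    ulimit -a

    # Raise the soft limit for the current shell; put the same line in
    # ~/.profile or ~/.bashrc so it survives new logins.
    ulimit -n 2048

    # Raising the hard limit requires root and is usually done in
    # /etc/security/limits.conf with entries along these lines:
    #
    #   hadoop  soft  nofile  4096
    #   hadoop  hard  nofile  4096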

quux00

If your job is failing because of OutOfMemoryError on the nodes, you can tweak the maximum number of map and reduce tasks per node and the JVM options for each task. `mapred.child.java.opts` (the default is `-Xmx200m`) usually has to be increased based on your data nodes' specific hardware.
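
For instance, a minimal sketch of raising the child JVM heap for a single Hive run from the shell. The `-Xmx1024m` value and the `my_query.hql` file name are placeholders; note that the per-node slot limits (`mapred.tasktracker.map.tasks.maximum` and `mapred.tasktracker.reduce.tasks.maximum`) are TaskTracker settings in mapred-site.xml and only take effect after a TaskTracker restart:

    # Override the task JVM heap for this job only; tune -Xmx to the memory
    # actually available per task slot on your data nodes.
    hive \
      --hiveconf mapred.child.java.opts=-Xmx1024m \
      -f my_query.hql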

Gargi

Thank you all. You are correct: it is because of the file descriptors, as my program was generating a lot of files in the target table due to the multilevel partition structure.

I increased the ulimit and also the xceivers property (dfs.datanode.max.xcievers). It did help, but in our situation those limits were still being exceeded.

Then we decided to distribute the data by partition, so that we get only one file per partition.

That worked for us. We have since scaled the system to 50+ billion records and it still works.
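
For reference, a rough sketch of that kind of per-partition insert, with hypothetical table and column names (`src_events`, `events`, `col1`, `col2`, `dt`). `DISTRIBUTE BY` on the partition column sends all rows for one partition value to a single reducer, which is what yields one file per partition:

    # Rewrite the data so each partition directory ends up with one file.
    hive -e "
      SET hive.exec.dynamic.partition=true;
      SET hive.exec.dynamic.partition.mode=nonstrict;
      INSERT OVERWRITE TABLE events PARTITION (dt)
      SELECT col1, col2, dt
      FROM src_events
      DISTRIBUTE BY dt;
    "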

hjamali52
  • Hi, I am getting this error in `hive.log` as a sub-exception of many exceptions. Do we need to restart the machine running the HiveServer and/or metastore after setting `ulimit`? – Mahesha999 Jul 30 '15 at 14:51
  • It's more like a comment than a real answer. – raindev Jun 06 '16 at 08:22