I am running a Python script which needs a file (genome.fa) as a dependency (reference) to execute. When I run this command:

    hadoop jar /usr/local/hadoop/share/hadoop/tools/lib/hadoop-streaming-2.5.1.jar \
        -file ./methratio.py -file '../Test_BSMAP/genome.fa' \
        -mapper './methratio.py -r -g ' \
        -input /TextLab/sravisha_test/SamFiles/test_sam \
        -output ./outfile

I am getting this Error:

    15/01/30 10:48:38 INFO mapreduce.Job:  map 0% reduce 0%
    15/01/30 10:52:01 INFO mapreduce.Job: Task Id : attempt_1422600586708_0001_m_000009_0, Status : FAILED
    Container [pid=22533,containerID=container_1422600586708_0001_01_000017] is running beyond physical memory limits. Current usage: 1.1 GB of 1 GB physical memory used; 2.4 GB of 2.1 GB virtual memory used. Killing container.

I am using Cloudera Manager (Free Edition). These are my configs:

    yarn.app.mapreduce.am.resource.cpu-vcores = 1
    ApplicationMaster Java Maximum Heap Size = 825955249 B
    mapreduce.map.memory.mb = 1 GB
    mapreduce.reduce.memory.mb = 1 GB
    mapreduce.map.java.opts = -Djava.net.preferIPv4Stack=true
    mapreduce.map.java.opts.max.heap = 825955249 B
    yarn.app.mapreduce.am.resource.mb = 1 GB
    Java Heap Size of JobHistory Server in Bytes = 397 MB

Can someone tell me why I am getting this error?

2 Answers

I think your Python script is consuming a lot of memory while reading your large input file (clue: genome.fa).

Here is my reasoning (refs: http://courses.coreservlets.com/Course-Materials/pdf/hadoop/04-MapRed-6-JobExecutionOnYarn.pdf, "Container is running beyond memory limits", http://hortonworks.com/blog/how-to-plan-and-configure-yarn-in-hdp-2-0/):

Container’s Memory Usage = JVM Heap Size + JVM Perm Gen + Native Libraries + Memory used by spawned processes

The last variable 'Memory used by spawned processes' (the Python code) might be the culprit.
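
As a rough back-of-the-envelope illustration with your settings (the Python figures are assumptions, not measurements):

    JVM heap (mapreduce.map.java.opts.max.heap)   ~ 825 MB
    JVM PermGen + native overhead (assumed)       ~ 100-200 MB
    methratio.py loading genome.fa (assumed)      ~ several hundred MB
    --------------------------------------------------------------------
    Total                                         > 1024 MB (mapreduce.map.memory.mb)

which would be consistent with the "1.1 GB of 1 GB physical memory used" in your error.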

Try increasing the memory size of these two parameters: mapreduce.map.java.opts and mapreduce.reduce.java.opts.
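
For example, one way (a sketch, not tuned for your cluster) is to pass them as generic -D options on the streaming command line. The 4096 MB containers and ~3.2 GB heaps below are placeholder values; the container limits (mapreduce.map.memory.mb / mapreduce.reduce.memory.mb) need to grow along with the heap, because the spawned Python process counts against the same container:

    hadoop jar /usr/local/hadoop/share/hadoop/tools/lib/hadoop-streaming-2.5.1.jar \
        -D mapreduce.map.memory.mb=4096 \
        -D mapreduce.map.java.opts="-Djava.net.preferIPv4Stack=true -Xmx3276m" \
        -D mapreduce.reduce.memory.mb=4096 \
        -D mapreduce.reduce.java.opts=-Xmx3276m \
        -file ./methratio.py -file '../Test_BSMAP/genome.fa' \
        -mapper './methratio.py -r -g ' \
        -input /TextLab/sravisha_test/SamFiles/test_sam \
        -output ./outfile

Note that the generic -D options have to come before the streaming-specific options (-file, -mapper, ...).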

- rr9031

Try increasing the number of maps spawned at execution time. You can increase the number of mappers by decreasing the split size (mapred.max.split.size). It will add some overhead but will mitigate the problem.
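
For example (a sketch: mapred.max.split.size is the older property name; on Hadoop 2.x the equivalent is mapreduce.input.fileinputformat.split.maxsize, and the 32 MB value below is only an illustration to tune):

    hadoop jar /usr/local/hadoop/share/hadoop/tools/lib/hadoop-streaming-2.5.1.jar \
        -D mapreduce.input.fileinputformat.split.maxsize=33554432 \
        -file ./methratio.py -file '../Test_BSMAP/genome.fa' \
        -mapper './methratio.py -r -g ' \
        -input /TextLab/sravisha_test/SamFiles/test_sam \
        -output ./outfile

Smaller splits mean more, smaller map tasks, at the cost of extra task-scheduling overhead.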

- kanishka vatsa