Diagnostic Messages for this Task:

Container [pid=3347,containerID=container_1490354262227_0013_01_000104] is running beyond physical memory limits. Current usage: 1.0 GB of 1 GB physical memory used; 1.5 GB of 5 GB virtual memory used. Killing container.
Dump of the process-tree for container_1490354262227_0013_01_000104 :
|- PID PPID PGRPID SESSID CMD_NAME USER_MODE_TIME(MILLIS) SYSTEM_TIME(MILLIS) VMEM_USAGE(BYTES) RSSMEM_USAGE(PAGES) FULL_CMD_LINE
|- 3360 3347 3347 3347 (java) 7596 396 1537003520 262629 /usr/java/latest/bin/java -Djava.net.preferIPv4Stack=true -Dhadoop.metrics.log.level=WARN -Xmx864m -Djava.io.tmpdir=/mnt3/var/lib/hadoop/tmp/nm-local-dir/usercache/hadoop/appcache/application_1490354262227_0013/container_1490354262227_0013_01_000104/tmp -Dlog4j.configuration=container-log4j.properties -Dyarn.app.container.log.dir=/mnt/var/log/hadoop/userlogs/application_1490354262227_0013/container_1490354262227_0013_01_000104 -Dyarn.app.container.log.filesize=0 -Dhadoop.root.logger=INFO,CLA org.apache.hadoop.mapred.YarnChild 10.35.178.86 49938 attempt_1490354262227_0013_m_000004_3 104
|- 3347 2563 3347 3347 (bash) 0 1 115806208 698 /bin/bash -c /usr/java/latest/bin/java -Djava.net.preferIPv4Stack=true -Dhadoop.metrics.log.level=WARN -Xmx864m -Djava.io.tmpdir=/mnt3/var/lib/hadoop/tmp/nm-local-dir/usercache/hadoop/appcache/application_1490354262227_0013/container_1490354262227_0013_01_000104/tmp -Dlog4j.configuration=container-log4j.properties -Dyarn.app.container.log.dir=/mnt/var/log/hadoop/userlogs/application_1490354262227_0013/container_1490354262227_0013_01_000104 -Dyarn.app.container.log.filesize=0 -Dhadoop.root.logger=INFO,CLA org.apache.hadoop.mapred.YarnChild 10.35.178.86 49938 attempt_1490354262227_0013_m_000004_3 104 1>/mnt/var/log/hadoop/userlogs/application_1490354262227_0013/container_1490354262227_0013_01_000104/stdout 2>/mnt/var/log/hadoop/userlogs/application_1490354262227_0013/container_1490354262227_0013_01_000104/stderr
- Try to optimize your query first. – leftjoin Mar 25 '17 at 20:25
- @leftjoin How to optimize? Can you be a little more specific? – shubh Mar 28 '17 at 09:23
- It may be possible to optimize the query so that it consumes less memory. Please provide the query as well as the configuration parameters. – leftjoin Mar 28 '17 at 09:28
- @leftjoin Please find the query in the given link https://pastebin.com/wuNEFgnJ – shubh Mar 28 '17 at 12:00
- Is it failing on the reducer or the mapper? – leftjoin Mar 28 '17 at 12:18
- @leftjoin It is failing on the mapper. – shubh Mar 28 '17 at 13:30
- Then see how to adjust memory settings for the mapper in my answer. – leftjoin Mar 28 '17 at 13:34
- @leftjoin How much memory should I use if I am processing around 500 GB of data? – shubh Mar 28 '17 at 14:01
- Difficult to calculate; it depends on file sizes, the data itself, etc. Try increasing it until it works. – leftjoin Mar 28 '17 at 14:08
- Also try to tune mapper parallelism: https://cwiki.apache.org/confluence/display/TEZ/How+initial+task+parallelism+works – leftjoin Mar 28 '17 at 14:12
- Sorry, that link is for Tez. See here: http://stackoverflow.com/a/42842117/2700344 – leftjoin Mar 28 '17 at 14:13
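The last comment links to an answer about tuning mapper parallelism. For Hive on MapReduce this usually comes down to the input split size; a minimal sketch, assuming the default CombineHiveInputFormat (the values are illustrative, not taken from this thread):

set hive.input.format=org.apache.hadoop.hive.ql.io.CombineHiveInputFormat;  -- Hive's default input format
set mapreduce.input.fileinputformat.split.maxsize=134217728;  -- 128 MB; a smaller max split size means more, smaller mappers
set mapreduce.input.fileinputformat.split.minsize=67108864;   -- 64 MB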
2 Answers
Container [pid=3347,containerID=container_1490354262227_0013_01_000104] is running beyond physical memory limits. Current usage: 1.0 GB of 1 GB physical memory used; 1.5 GB of 5 GB virtual memory used.
It looks like your process needs more memory than the defined limit allows. You need to increase the container size:

SET hive.tez.container.size=4096;  -- value is in MB
SET hive.auto.convert.join.noconditionaltask.size=1436549120;  -- value is in bytes (~1370 MB, roughly 1/3 of the container size)
Read more about this here.
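When running on Tez, the JVM heap also has to stay below the container size, otherwise YARN kills the container exactly as in the diagnostic above. A minimal sketch, assuming Hive on Tez (the 80% ratio and the values are illustrative, not part of the original answer):

SET hive.tez.container.size=4096;  -- YARN container size in MB
SET hive.tez.java.opts=-Xmx3276m;  -- JVM heap for Tez tasks, roughly 80% of the container size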

Ambrish
If it is failing on the reducer:
- Add distribute by partition key to the query. It will distribute data between reducers, so each reducer creates fewer partitions and consumes less memory.
insert overwrite table items_s3_table PARTITION(w_id) select pk, cId, fcsku, cType, disposition, cReferenceId, snapshotId, quantity, w_id
from items_dynamodb_table distribute by w_id;
- Try to decrease bytes per reducer. Decreasing this parameter will increase parallelism (the number of reducers) and may reduce memory consumption per reducer.
hive.exec.reducers.bytes.per.reducer=67108864;
- Adjust memory settings if nothing helps.
For mappers:
mapreduce.map.memory.mb=4096;
mapreduce.map.java.opts=-Xmx3000m;
For reducers:
mapreduce.reduce.memory.mb=4096;
mapreduce.reduce.java.opts=-Xmx3000m;
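For context, the failing container in the question was a 1 GB container running a JVM with -Xmx864m, so the heap alone nearly filled the physical limit and the remaining JVM overhead pushed it over. The parameters above can be applied at the session level right before the query; a sketch reusing the same illustrative 4096/3000 values (the -Xmx heap must stay comfortably below the corresponding *.memory.mb container limit):

set mapreduce.map.memory.mb=4096;
set mapreduce.map.java.opts=-Xmx3000m;
set mapreduce.reduce.memory.mb=4096;
set mapreduce.reduce.java.opts=-Xmx3000m;
insert overwrite table items_s3_table PARTITION(w_id)
select pk, cId, fcsku, cType, disposition, cReferenceId, snapshotId, quantity, w_id
from items_dynamodb_table distribute by w_id;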

leftjoin