I'm running a Hadoop job, and in my yarn-site.xml file, I have the following configuration:
<property>
  <name>yarn.scheduler.minimum-allocation-mb</name>
  <value>2048</value>
</property>
<property>
  <name>yarn.scheduler.maximum-allocation-mb</name>
  <value>4096</value>
</property>
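As I understand it, the scheduler rounds each container request up to a multiple of the minimum allocation, so the default map request of 1024 MB (mapreduce.map.memory.mb) becomes the 2 GB container shown in the error below. For completeness, the per-task requests would live in mapred-site.xml, roughly like this (the values here are just for illustration):

<property>
  <name>mapreduce.map.memory.mb</name>
  <value>2048</value>
</property>
<property>
  <name>mapreduce.reduce.memory.mb</name>
  <value>2048</value>
</property>
<property>
  <!-- Heap for the map task JVM; commonly set to ~80% of the container size -->
  <name>mapreduce.map.java.opts</name>
  <value>-Xmx1638m</value>
</property>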
However, I still occasionally get the following error:
Container [pid=63375,containerID=container_1388158490598_0001_01_000003] is running beyond physical memory limits. Current usage: 2.0 GB of 2 GB physical memory used; 2.8 GB of 4.2 GB virtual memory used. Killing container.
I've found that increasing yarn.scheduler.minimum-allocation-mb raises the physical memory allocated to each container. However, I don't always want 4 GB allocated to my container, and I thought that by explicitly specifying a maximum size I'd be able to get around this problem. I realize that Hadoop can't figure out how much memory a container will need before the mapper runs, so how should I go about allocating more memory to the container only when it needs that extra memory?
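For reference, I know I could pin a fixed per-job size with -D overrides at submission time, along these lines (jar, class, and paths are placeholders, and this assumes the driver goes through ToolRunner/GenericOptionsParser):

hadoop jar my-job.jar MyDriver \
  -Dmapreduce.map.memory.mb=3072 \
  -Dmapreduce.map.java.opts=-Xmx2457m \
  /input /output

But that still forces me to pick one number up front for every task, which is exactly what I'm trying to avoid.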