Apache Pig: OutOfMemory exception with simple GROUP BY in local mode

Question

I'm getting an OutOfMemory exception from Pig when trying to execute a very simple GROUP BY on a tiny (3KB), randomly-generated, example data set.

The pig script:

$ cat example.pig
raw =
LOAD 'example-data'
    USING PigStorage()
    AS (thing1_id:int,
        thing2_id:int,
        name:chararray,
        timestamp:long);

grouped =
GROUP raw BY thing1_id;

DUMP grouped;

The data:

$ cat example-data
281906  13636091    hide    1334350350
174952  20148444    save    1334427826
1082780 16033108    hide    1334500374
2932953 14682185    save    1334501648
1908385 28928536    hide    1334367665
[snip]

$ wc example-data
 100  400 3239 example-data

Here we go:

$ pig -x local example.pig

[snip]

java.lang.OutOfMemoryError: Java heap space
    at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.<init>(MapTask.java:949)
    at org.apache.hadoop.mapred.MapTask$NewOutputCollector.<init>(MapTask.java:674)
    at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:756)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:370)
    at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:212)

[snip]

And some extra info:

$ apt-cache show hadoop | grep Version
Version: 1.0.2

$ pig --version
Apache Pig version 0.9.2 (r1232772) 
compiled Jan 17 2012, 23:49:20

$ echo $PIG_HEAPSIZE
4096

At this point, I feel like I must be doing something drastically wrong because I can't see any reason why 3 kB of text would ever cause the heap to fill up.

Possible duplicate of this question: http://stackoverflow.com/questions/16499432/pig-local-mode-group-or-join-java-lang-outofmemoryerror-java-heap-space — sversch, Dec 04 '14 at 00:06

score 1 · Answer 1 · shiva kumar s · 2012-04-16T05:09:08.697

Check this: http://sumedha.blogspot.in/2012/01/solving-apache-pig-javalangoutofmemorye.html

Neil, you are right. Let me explain. The bin/pig script contains the following:

JAVA_HEAP_MAX=-Xmx1000m

# check envvars which might override default args
if [ "$PIG_HEAPSIZE" != "" ]; then
    JAVA_HEAP_MAX="-Xmx""$PIG_HEAPSIZE""m"
fi

The script sets the maximum Java heap size with the -Xmx switch, and PIG_HEAPSIZE is supposed to override the 1000 MB default. I don't know why that override is not taking effect for you, which is why I suggested setting the Java heap size directly using the parameters described in the link. I haven't had time to check why this problem occurs. If anyone has an idea, please post it here.
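To try the override logic outside of Pig, here is a minimal sketch that wraps the excerpt above in a function. The name `pig_heap_flag` and its argument are hypothetical and are not part of bin/pig; the function only mirrors the quoted lines and does not diagnose why the override had no effect.

```shell
#!/bin/sh
# Hypothetical helper (not part of bin/pig): returns the -Xmx flag that the
# quoted excerpt would produce for a given PIG_HEAPSIZE value (in MB).
pig_heap_flag() {
    heapsize="$1"
    # Default taken from the bin/pig excerpt.
    heap_flag="-Xmx1000m"
    # A non-empty value overrides the default.
    if [ "$heapsize" != "" ]; then
        heap_flag="-Xmx${heapsize}m"
    fi
    printf '%s\n' "$heap_flag"
}

pig_heap_flag ""      # prints -Xmx1000m
pig_heap_flag 4096    # prints -Xmx4096m
```

Keep in mind that a shell variable which is set but not exported is visible to `echo` in the current shell but not to a child process such as bin/pig, so it may be worth confirming that PIG_HEAPSIZE is exported.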

edited Apr 16 '12 at 05:09

answered Apr 15 '12 at 21:49

shiva kumar s

Thanks, but I set `PIG_HEAPSIZE` to 4096 which sets the maximum Java heap to 4096 MB. Also, I'd seriously hope I don't need more than 1 GB of heap for 3kB of data. – Neil Williams Apr 15 '12 at 21:52
Neil, the explanation is a bit long for a comment, so I updated my answer; please check it. – shiva kumar s Apr 16 '12 at 05:09

score 0 · Answer 2 · answered Apr 15 '12 at 23:33

Your Pig job is failing around the following code in MapTask.java:

931   final float recper = job.getFloat("io.sort.record.percent",(float)0.05);
932   final int sortmb = job.getInt("io.sort.mb", 100);
...
945   // buffers and accounting
946   int maxMemUsage = sortmb << 20;
947   int recordCapacity = (int)(maxMemUsage * recper);
948   recordCapacity -= recordCapacity % RECSIZE;
949   kvbuffer = new byte[maxMemUsage - recordCapacity];

Line 949, the kvbuffer allocation, is the frame at the top of your stack trace. So I suggest checking the configured values of io.sort.mb and io.sort.record.percent and whether, following the logic above, maxMemUsage - recordCapacity comes close to, or exceeds, your configured JVM heap size (4096 MB).
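As a rough check, the arithmetic from the excerpt can be reproduced in the shell. This is only a sketch under stated assumptions: `kvbuffer_bytes` is a hypothetical helper, the record percentage is passed as a whole number because POSIX shell arithmetic is integer-only, and RECSIZE is assumed to be 16 since the excerpt does not show its value. Shell integers are also wider than a Java int, so the sketch does not model int overflow for very large io.sort.mb values.

```shell
#!/bin/sh
# Hypothetical helper: size in bytes of the kvbuffer allocation for a given
# io.sort.mb (MB) and io.sort.record.percent (as a whole-number percentage).
kvbuffer_bytes() {
    sortmb="$1"
    recper_pct="$2"
    recsize=16   # assumption: RECSIZE is not shown in the excerpt
    max_mem_usage=$(( sortmb << 20 ))
    record_capacity=$(( max_mem_usage * recper_pct / 100 ))
    record_capacity=$(( record_capacity - record_capacity % recsize ))
    echo $(( max_mem_usage - record_capacity ))
}

# Defaults from the excerpt: io.sort.mb=100, io.sort.record.percent=0.05
kvbuffer_bytes 100 5    # prints 99614720
```

With the defaults this comes to about 95 MB, well under both the 1000 MB default heap and a 4096 MB heap, so a failure at this allocation suggests that the JVM is running with a much smaller heap or with different settings than expected.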

Both were at the default values, so it should have come nowhere near the limit. I ended up reinstalling from source (as opposed to .debs) and the problem went away. *shrug* – Neil Williams Apr 16 '12 at 00:21

score 0 · Answer 3 · answered Apr 16 '12 at 00:20

I toyed with it for a while and ended up switching from the Debian packages for Hadoop/Pig to the raw tarballs, and the problem went away. Not sure what to make of that :)

Neil Williams
