6

I just implemented Pentaho at my company and set its memory to 12 GB. When we try to load 16 million rows from one table to another, it runs out of memory.

I thought Pentaho would free the memory when it commits to the database, but that doesn't seem to happen. The exception is thrown at around row 2.5 million, which means that to load 16 million rows I'd need a machine with roughly 73 GB of RAM? (Rough math, of course.)

Is there any parameter or configuration to make the magic happen? This memory issue is limiting our loading capacity (the 16 million rows are only one of the tables). I can't believe Pentaho just keeps filling memory until it bursts without ever clearing the cache.

My file D:\Pentaho\server\biserver-ee\tomcat\bin\service.bat has the following line:

"%EXECUTABLE%" //US//%SERVICE_NAME% ++JvmOptions "-Djava.io.tmpdir=%CATALINA_BASE%\temp;
-Djava.util.logging.manager=org.apache.juli.ClassLoaderLogManager;
-Djava.util.logging.config.file=%CATALINA_BASE%\conf\logging.properties;
-XX:MaxPermSize=256m" --JvmMs 2048 --JvmMx 12288

Does it have anything to do with the line below?

-XX:MaxPermSize=256m

Could someone explain to me what exactly it is?

Thanks in advance!

PS: This is my first contact with Pentaho, so I apologize for any unnecessary questions or assumptions.

Lucas Rezende
  • Some important information is missing: 1. Which JVM are you using? 2. What is the exact error you get? – Uwe Plonus Sep 16 '14 at 14:14
  • 1
    When asking Java questions involving exceptions always include the stack trace in the question. Are the --JvmMS and --JvmMX flags specific to Pentaho? I ask because that's not the flags used for setting memory on a JVM. – SteveD Sep 16 '14 at 14:16
  • Flags for the (Oracle) JVM are -XmxN, where N is the size with an optional postfix k (kilo) or m (mega); without a postfix it defaults to bytes. Could that be `--JvmMx 12288m`? – PeterMmm Sep 16 '14 at 14:20
  • @UwePlonus The Java error it throws is `java.lang.OutOfMemoryException` and the JVM I am using is 7 (build 1.7.0_55-b13). – Lucas Rezende Sep 16 '14 at 14:25
  • @PeterMmm That is part of the question... I can change, but not sure if I should. :) – Lucas Rezende Sep 16 '14 at 14:25
  • In any case, the JVM won't take all RAM (12GB). When you specify 12G the JVM will fail on start. – PeterMmm Sep 16 '14 at 14:27
  • @SteveD I think so. The `--JvmMs 2048` means the server will launch allocating 2 GB of RAM and can go up to `--JvmMx 12288`. – Lucas Rezende Sep 16 '14 at 14:28
  • @PeterMmm This `--JvmMx 12288` is the maximum amount of RAM the Pentaho server can allocate; I'm not sure if it is the Java memory (I don't think it is). Perhaps the problem is the `-XX:MaxPermSize=256m`. – Lucas Rezende Sep 16 '14 at 14:30
  • Don't know Pentaho, but some manual pages are talking about config like `--JvmMx=4096m`. Waiting for Pentaho experts ... – PeterMmm Sep 16 '14 at 14:32
  • Try http://stackoverflow.com/questions/542979/using-heapdumponoutofmemoryerror-parameter-for-heap-dump-for-jboss and take a heapdump. – Jayan Sep 16 '14 at 14:38
  • @LucasRezende Please provide the complete stack trace, as an OOME can be caused by many different things and only the complete error stack trace would help us find the source. – Uwe Plonus Sep 16 '14 at 14:39
  • This is what it returns: `Failed to execute runnable (java.lang.OutOfMemoryError: GC overhead limit exceeded)` – Lucas Rezende Sep 16 '14 at 15:34
  • Did you see http://stackoverflow.com/questions/5839359/java-lang-outofmemoryerror-gc-overhead-limit-exceeded – Jayan Sep 16 '14 at 16:14
  • How are you loading this table in Pentaho? Is it a PDI transformation? – Pedro Vale Sep 17 '14 at 13:34
  • Yes. Only a Table Input and a Table Output, nothing else in between. – Lucas Rezende Sep 23 '14 at 17:31

2 Answers

3

Regarding the MaxPermSize switch: Oracle Java versions prior to 8 have a separate, fixed-size area of memory called permgen (the permanent generation), which mainly holds class metadata.

See this answer for more detail on it.

This can be a source of out-of-memory errors, though without knowing Pentaho and your usage it's hard to say whether that's the source of your problem.
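
If permgen does turn out to be what is filling up, its limit can be raised through the same JvmOptions string quoted in the question. A minimal sketch, assuming the service.bat line above (the 512m value is only an illustrative guess, not a recommendation):

"%EXECUTABLE%" //US//%SERVICE_NAME% ++JvmOptions "-Djava.io.tmpdir=%CATALINA_BASE%\temp;
-Djava.util.logging.manager=org.apache.juli.ClassLoaderLogManager;
-Djava.util.logging.config.file=%CATALINA_BASE%\conf\logging.properties;
-XX:MaxPermSize=512m" --JvmMs 2048 --JvmMx 12288

That said, a "GC overhead limit exceeded" error, like the one mentioned in the comments, usually points at the main heap rather than permgen.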

SteveD
2

Some ETL steps have to read (and therefore cache) all of the data before they start producing results (e.g. Memory Group By, or Stream Lookup for the lookup stream). But if you only read (Table Input) and write (Table Output), the data just streams through, and you don't need to fit the whole table into memory (that would be pretty useless, right?).

The `--JvmMs 2048 --JvmMx 12288` parameters look suspicious to me. Have you tried `-Xms2g -Xmx12g`?
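
For what it's worth, here is a rough sketch of how those flags might be slipped into the service.bat line from the question, replacing the --JvmMs/--JvmMx pair with explicit heap settings (just an illustration of the idea, not a tested configuration; check the Tomcat/procrun documentation for your version before relying on it):

"%EXECUTABLE%" //US//%SERVICE_NAME% ++JvmOptions "-Djava.io.tmpdir=%CATALINA_BASE%\temp;
-Djava.util.logging.manager=org.apache.juli.ClassLoaderLogManager;
-Djava.util.logging.config.file=%CATALINA_BASE%\conf\logging.properties;
-XX:MaxPermSize=256m;
-Xms2g;
-Xmx12g"

The idea is simply to hand the initial and maximum heap sizes directly to the JVM instead of relying on the service wrapper's own flags.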

lukfi