We're running Matillion (v1.54) on an AWS EC2 instance (CentOS), based on Tomcat 8.5. We have developed a few ETL jobs by now, and they take a long time to run (in some cases hours). We'd like to speed up the execution of our jobs, and I wonder how to identify the bottleneck.
What confuses me is that neither the m5.2xlarge EC2 instance (8 vCPU, 32G RAM) nor the database (Snowflake) gets very busy; both seem to be more or less idle most of the time (judging by CPU and RAM usage as shown by top).
Our environment is configured to use up to 16 parallel connections.
We also added the JVM options -Xms20g -Xmx30g to /etc/sysconfig/tomcat8 to make sure the JVM gets enough RAM allocated.
Our Matillion jobs do transformations and loads into a lot of tables, most of which can (and should) be done in parallel. Still, we see that most of the tasks are processed sequentially.
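To illustrate the difference I mean, here is a toy Python sketch. It is not Matillion code: the table names, the 50 ms wait and the function names are all made up, and the sleep only stands in for a load where the client mostly waits on the database. Run one after another, such tasks leave both machines idle and the waits add up; run through a pool sized like our 16 connections, the waits overlap.

```python
import time
from concurrent.futures import ThreadPoolExecutor

MAX_CONNECTIONS = 16  # same number as our environment's parallel-connection limit
TABLES = [f"table_{i}" for i in range(8)]  # hypothetical table names


def load_table(name: str) -> str:
    """Stand-in for one table load: the client just waits on the database."""
    time.sleep(0.05)  # simulated wait; no real work happens here
    return name


def run_sequential(tables):
    """Process the tables one after another and return (results, seconds)."""
    start = time.perf_counter()
    done = [load_table(t) for t in tables]
    return done, time.perf_counter() - start


def run_parallel(tables, workers=MAX_CONNECTIONS):
    """Process the tables through a thread pool and return (results, seconds)."""
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=workers) as pool:
        done = list(pool.map(load_table, tables))  # map keeps input order
    return done, time.perf_counter() - start


if __name__ == "__main__":
    _, seq_seconds = run_sequential(TABLES)
    _, par_seconds = run_parallel(TABLES)
    print(f"sequential: {seq_seconds:.2f}s, parallel: {par_seconds:.2f}s")
```

What we observe looks like the sequential case, although we expected something closer to the parallel one.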
How can we improve this?