I'm running Apache Drill 1.0 (and later 1.4) locally on an Ubuntu machine with 16 GB of RAM. I am working with a very large tab-delimited file (52 million rows, 7 GB) and running this query:
Select distinct columns[0] from `table.tsv`
Performance does not improve at all the second time the same query is run (both runs took 53 seconds). Usually, the second run of a query takes less than half the time of the first. It seems like Drill is not using all the allocated memory.
My conf/drill-env.sh file looks like:
DRILL_MAX_DIRECT_MEMORY="14G"
DRILL_HEAP="14G"
export DRILL_JAVA_OPTS="-Xms$DRILL_HEAP -Xmx$DRILL_HEAP -XX:MaxDirectMemorySize=$DRILL_MAX_DIRECT_MEMORY -XX:MaxPermSize=14G -XX:ReservedCodeCacheSize=1G -Ddrill.exec.enable-epoll=true"
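As a sanity check on these settings, here is a minimal sketch of the arithmetic, comparing the heap and direct-memory limits to the machine's RAM. The variable names are made up for illustration, the 16 GB of RAM is treated as 16 GiB for simplicity, and the MaxPermSize and ReservedCodeCacheSize values are left out of the sum.

```shell
# Values copied from conf/drill-env.sh above, in GiB (illustrative names).
heap_gib=14
direct_gib=14
# Physical RAM of the machine as stated in the question, treated as GiB.
physical_gib=16

# Sum of the two JVM limits that Drill is allowed to grow to.
total_gib=$((heap_gib + direct_gib))
echo "heap + direct limit: ${total_gib} GiB"
echo "physical RAM: ${physical_gib} GiB"

if [ "$total_gib" -gt "$physical_gib" ]; then
  echo "limits exceed physical RAM by $((total_gib - physical_gib)) GiB"
fi
```

So the configured limits are ceilings that add up to more than the machine has. They do not by themselves tell me how much memory Drill will actually use.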
I also ran this within Drill:
alter system set `planner.memory.max_query_memory_per_node`=12884901888
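For reference, the byte value above is meant to be 12 GiB. A quick sketch of the conversion (the variable names are only for illustration):

```shell
# Value passed to planner.memory.max_query_memory_per_node above, in bytes.
max_query_bytes=12884901888
# Number of bytes in one GiB (1024^3).
bytes_per_gib=$((1024 * 1024 * 1024))

# Integer division converts the byte count to whole GiB.
max_query_gib=$((max_query_bytes / bytes_per_gib))
echo "max_query_memory_per_node: ${max_query_gib} GiB"
```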
However, when I check memory usage with smem, Drill is using only about 5 GB of RAM.
If I cut the table down to only 1 million rows, the first query completes in 3.6 seconds and the second run of the same query takes only 1.8 seconds.
What am I missing?