6

I noticed that the first time I run a query on RedShift, it takes 3-10 second. When I run same query again, even with different arguments in WHERE condition, it runs fast (0.2 sec). Query I was talking about runs on a table of ~1M rows, on 3 integer columns.

Is this huge difference in execution times caused by the fact that RedShift compiles the query first time its run, and then re-uses the compiled code?

If yes - how to always keep this cache of compiled queries warm?

One more question: Given queryA and queryB. Let's assume queryA was compiled and executed first. How similar should queryB be to queryA, such that execution of queryB will use the code compiled for queryA?

diemacht
  • 2,022
  • 7
  • 30
  • 44

1 Answers1

4

The answer of first question is yes. Amazon Redshift compiles code for the query and cache it. The compiled code is shared across sessions in a cluster, so the same query with even different parameters in the different session will run faster because of no overhead.

Also they recommend to use the result of the second execution of the query for the benchmark.

There is the answer for this question and details in the following link. http://docs.aws.amazon.com/redshift/latest/dg/c-compiled-code.html

Masashi M
  • 2,679
  • 21
  • 22
  • 1
    This is absolutely correct. I dug deeper into figuring out exactly what causes a re-compile of the code: https://medium.com/@pingram/redshift-code-compilation-977143576e89 – Phil Jul 27 '15 at 14:39
  • How can you tell if the second time, the result is not coming from a cached result? Can you really count on the execution time for the second attempt of the same query for performance testing? Another link http://docs.aws.amazon.com/redshift/latest/dg/c-query-performance.html – Faiz Jun 21 '16 at 05:40
  • you can test the impact of query compile in isolation by trying different queries on an empty table, to eliminate data caching as a variable. I found that query compile latency scales with the complexity of the query: in particular, number of tables and joins involved. See: http://wrschneider.github.io/2017/06/02/redshift-compile-latency.html – wrschneider Jun 02 '17 at 19:28