0

Short version of the Question

Why measuring query execution time with postgres \timeing on is dramatically different than using R to measure it ?

Long version

I'm comparing three different db schemas: one is 3-NF (NF in the pictures), one is denormalised (NN) and the other uses a custom access function (EF). All three return the same result. The denormalised version stores in fact the result of a complicate cross join, which is needed in NF. In the EF version some knowledge of the result structure is used to (maybe) be faster than the NF version. To simulate the performance degradation over time the databases in all three schemas are filled stepwise with data till march, june, september, december and query execution time is measured. Between each query I clear the cache with the steps described in See and clear Postgres caches/buffers?.

I worried using the R postgresSQL-API to execute the query because I feared I would more or less measure how long it takes R to put the result in a df. So I used R to generate and execute a bash script for each case. In this bash script I used grep to get the time postgreSQL needed to execute the query. Like shown here https://dba.stackexchange.com/questions/21509/pipe-psql-timing-to-file . The result is shown in the following graph.

PostgreSQL timeing

Except that I find it confusing that the normalised result is actually a bit faster than the denormalized version (if you have ideas why this may be let me know) this picture is quite clear.

However wrapping the execution of the bash script in R timing commandos, I get the following result:

non log version

The following is the same picture with an logarithmic scale:

log version

I wondered why the times differed so much. However, I was willing to take the postgreSQL times as the real ones. But then I used atop to track memory usage during this process and the result matches better with the R approach of time taking. The following picture shows the free mem pages and the color shows which db was tested during the time.

memory usage

Do you have any ideas why those two approaches differ and what would you treat as "real" execution time. Any help is greatly appreciated.

Community
  • 1
  • 1
bluefish
  • 395
  • 2
  • 13
  • And what do the explain plans tell you? – Jakub Kania Dec 18 '16 at 18:03
  • @JakubKania Thanks for your interest and try to help. Unfortunately, I don't know how looking at the explain plan can help me understand the observed difference between timing it with R or use the build in timing method of postgres. – bluefish Dec 19 '16 at 07:27
  • I suspect it may have a bit to do with when pg starts sending data to the client. However the differences are pretty big for that. How many times did you run each test? – Jakub Kania Dec 19 '16 at 08:01
  • @JakubKania I run each test four times. The results always showed the same pattern. However, the test where not completely identical since I changed small stuff around. (Like leaving grep out of my bash script to see if its grep causing this) – bluefish Dec 19 '16 at 08:26

0 Answers0