1

I'm trying to optimize the speed in my grails app. I have this:

Catalog a= Catalog.findByName('a');
Element b= Element.findByCatalogAndNumber(a,2);

This way i can find b.

But I'm thinking I could use something like this:

Element b= Element.createCriteria().get{
       catalog{
          eq("name",'a')
       }
       eq("number",2)
}

But I'm not sure if it reduces the queries to the database, or I'm just making a fool of myself and creating even bigger files and reducing the speed of my app by doing this.

any idea?

Beryllium
  • 12,808
  • 10
  • 56
  • 86
user1698253
  • 55
  • 1
  • 7
  • 2
    optimizing without measuring is a waste of time – cfrick Mar 17 '15 at 19:55
  • It would be cool if there were a way to know how many database queries were run for each version. Even cooler - the ability to see the actual SQL. Oh well, maybe someday ... – Burt Beckwith Mar 17 '15 at 20:20
  • @BurtBeckwith "Someday" is today ... either by turning on the "DB statement log" feature, otherwise you can [turn it on in Grails](http://stackoverflow.com/questions/2568507/how-to-log-sql-statements-in-grails) – Beryllium Mar 19 '15 at 22:56
  • Cool. You've inspired me to go back in time and write [this blog post](http://burtbeckwith.com/blog/?p=1604) and [this update](http://burtbeckwith.com/blog/?p=1654) – Burt Beckwith Mar 19 '15 at 23:22

1 Answers1

2

I have compared three versions of your query using

  • Grails 2.4.4, default settings for caches in a the Grails application
  • PostgreSQL 8.4, SQL statement logging has been turned on to count/see the SQL queries.

The first versions one using two calls on the Grails domain class:

def query1() {
  Catalog a = Catalog.findByName('a');
  log.info(a)

  Element b = Element.findByCatalogAndPos(a, 2);
  log.info(b)

  render(b.toString())
}

The 2nd one using criteria

def query2() {
  Element b = Element.createCriteria().get {
    catalog {
      eq("name", "a")
    }
    eq("pos", 2)
  }

  render(b.toString())
}

and the last one using a where query

def query3() {
  def query = Element.where {
    catalog.name == "a" && pos == 2
  }

  Element b = query.get()

  render(b.toString())
}

The first one results in two SQL queries, the other ones will only send one query to the database (using an inner join from Element to Catalog).

As for readability/expressiveness, choose the 3rd version: It expresses your intention in a single line, and it's the most compact version.

As for performance, choose the 2nd or the 3rd version. Under high load, many concurrent users/requests, the number of queries does matter. This might not be an issue for all applications.

Anway, I'd always choose the 3rd version for the expressiveness; and it will scale, if the query conditions gets more complex over the time.


Update

The SQL statements used by the 1st version:

select this_.id as id1_1_0_, this_.version as version2_1_0_, this_.date_created as date_cre3_1_0_, this_.last_updated as last_upd4_1_0_, this_.name as name5_1_0_, this_.remark as remark6_1_0_ 
  from catalog this_ 
  where this_.name=$1 limit $2
Parameter: $1 = 'a', $2 = '1'

select this_.id as id1_2_0_, this_.version as version2_2_0_, this_.catalog_id as catalog_3_2_0_, this_.date_created as date_cre4_2_0_, this_.last_updated as last_upd5_2_0_, this_.pos as pos6_2_0_, this_.remark as remark7_2_0_ 
  from element this_ 
  where this_.catalog_id=$1 and this_.pos=$2 limit $3
Parameter: $1 = '10', $2 = '2', $3 = '1'

The SQL statement for the 2nd and 3rd version:

select this_.id as id1_2_1_, this_.version as version2_2_1_, this_.catalog_id as catalog_3_2_1_, this_.date_created as date_cre4_2_1_, this_.last_updated as last_upd5_2_1_, this_.pos as pos6_2_1_, this_.remark as remark7_2_1_, catalog_al1_.id as id1_1_0_, catalog_al1_.version as version2_1_0_, catalog_al1_.date_created as date_cre3_1_0_, catalog_al1_.last_updated as last_upd4_1_0_, catalog_al1_.name as name5_1_0_, catalog_al1_.remark as remark6_1_0_ 
  from element this_ inner join catalog catalog_al1_ 
    on this_.catalog_id=catalog_al1_.id 
    where (catalog_al1_.name=$1) and this_.pos=$2
Parameter: $1 = 'a', $2 = '2'
Beryllium
  • 12,808
  • 10
  • 56
  • 86
  • Counting queries is only an initial step. You have to look at the total time, the complexity of the queries (e.g. you might do it in one query, but it requires a significant amount of db server activity), the amount of data transferred (e.g. are you selecting too much data that gets discarded), etc. – Burt Beckwith Mar 19 '15 at 23:19
  • @BurtBeckwith I have added the SQL statements. In this case the same columns are fetched, and all queries are simple. But two queries always mean two complete round-trips to the DB, and under high load, the network bandwidth can be the bottleneck - I have seen this problem on a production system: Huge numbers of very simple SQL statements killing performance. On the other hand, SQL database are surprisingly happy with really complex queries, involving many joins/conditions. So "If you can do it in a single SQL statement, by all means do it in a single SQL statement" (a quote from "Ask Tom"). – Beryllium Mar 20 '15 at 00:38
  • Assuming that one query is always better than two different queries that return the same result is rather naive. – Burt Beckwith Mar 20 '15 at 00:41
  • @BurtBeckwith Then we agree to disagree. Yes, there might be some cases when uncorrellated queries cannot be put into one, but during the last two years of SQL performance tuning with Oracle and PostgreSQL I have not seen that isolated queries which can be combined perform better in a stress test. – Beryllium Mar 20 '15 at 07:32