4

I'm looking for a method that is faster and uses less server processing. In my application, I can use both .where and .detect:

Where:

User.where(id: 1)
# User Load (0.5ms)

Detect:

User.all.detect{ |u| u.id == 1 }
# User Load (0.7ms). Sometimes increases more than .where

I understand that .detect returns the first item in the list for which the block returns TRUE but how does it compares with .where if I have thousands of Users?

Edited for clarity.

.where is used in this example because I may not query for the id alone. What if I have a table column called "name"?

Community
  • 1
  • 1
Sylar
  • 11,422
  • 25
  • 93
  • 166

3 Answers3

7

In this example

User.find(1)        # or
User.find_by(id: 1)

will be the fastest solutions. Because both queries tell the database to return exactly one record with a matching id. As soon as the database finds a matching record, it doesn't look further but returns that one record immediately.

Whereas

User.where(id: 1)

would return an array of objects matching the condition. That means: After a matching record was found the database would continue looking for other records to match the query and therefore always scan the whole database table. In this case – since id is very likely a column with unique values – it would return an array with only one instance.

In opposite to

User.all.detect { |u| u.id == 1 }

that would load all users from the database. This will result in loading thousands of users into memory, building ActiveRecord instances, iterating over that array and then throwing away all records that do not match the condition. This will be very slow compared to just loading matching records from the database.

Database management systems are optimized to run selection queries and you can improve their ability to do so by designing a useful schema and adding appropriate indexes. Every record loaded from the database will need to be translated into an instance of ActiveRecord and will consume memory - both operations are not for free. Therefore the rule of thumb should be: Whenever possible run queries directly in the database instead of in Ruby.

spickermann
  • 100,941
  • 9
  • 101
  • 131
  • I suppose that technically speaking User.where(id: 1) returns an ActiveRecord collection proxy? But spot on for find being the correct method. – David Aldridge Sep 07 '16 at 07:38
  • @DavidAldridge: You are right `where` returns a ActiveRecord Relation (not an array). In this example that doesn't make a huge difference, but in other examples it might be important to be more precise. – spickermann Sep 07 '16 at 07:43
  • So what if it's not just the `id` Im after? That's why I use `.where` in the example. – Sylar Sep 07 '16 at 08:00
  • @Sylar: The `find` version only work when you are using the primary key. The `find_by` should be used when you are only interested in *one* (the first matchin) record (`find_by` is a short cut for `where(...).first`). `where` will always be faster that `all.detect` for multiple records. Don't forget to add an index when you query tables on columns other than the `id` on a regular base. – spickermann Sep 07 '16 at 08:23
  • Great. So `find_by` for email would be faster than `where`? Email would be unique. – Sylar Sep 07 '16 at 08:26
  • 1
    Yes, `find_by(email: ...)` would be the fastest solution when emails are unique in your database. Please ensure to add an uniq index to your database column and do not only rely on Rails uniqueness validations. – spickermann Sep 07 '16 at 08:39
1

NB One should use ActiveRecord#find in this particular case, please refer to the answer by @spickermann instead.

User.where is executed on DB level, returning one record.

User.all.detect will return all the records to the application, and only then iterate through on ruby level.

That said, one must use where. The former is resistant to an amount of records, there might be billions and the execution time / memory consumption would be nearly the same (O(1).) The latter might even fail on billions of records.

Aleksei Matiushkin
  • 119,336
  • 10
  • 100
  • 160
0

Here's a general guide:

Use .find(id) whenever you are looking for a unique record. You can use something like .find_by_email(email) or .find_by_name(name) or similar (these finders methods are automatically generated) when searching non-ID fields, as long as there is only one record with that particular value.

Use .where(...).limit(1) if your query is too complex for a .find_by query or you need to use ordering but you are still certain that you only want one record to be returned.

Use .where(...) when retrieving multiple records.

Use .detect only if you cannot avoid it. Typical use cases for .detect are on non-ActiveRecord enumerables, or when you have a set of records but are unable to write the matching condition in SQL (e.g. if it involves a complex function). As .detect is the slowest, make sure that before calling .detect you have used SQL to narrow down the query as much as possible. Ditto for .any? and other enumerable methods. Just because they are available for ActiveRecord objects doesn't mean that they are a good idea to use ;)

Sprachprofi
  • 1,229
  • 12
  • 24
  • Hi and thanks for your support. This question was 3 years ago. Are you sure `find_by_` was implemented then? – Sylar Nov 21 '19 at 13:34