I am working on a side project that is quite an undertaking; my question regards the efficiency gained when using a BOOLEAN value to determine whether or not further data processing is required.

For example, suppose I have a table that lists all the creatures, and a second, related table that lists each creature's hibernation period and the calories it consumes each day during hibernation.

Is it efficient to have a "hibernates" BOOLEAN column inside the (Creatures) table?

If it is true, then go to the "hibernation_creature_info_relations" table, find the creature with that ID, and return that information.

This would mean that for every creature whose "hibernates" value is false, SQL would not have to search through the large "hibernation_creature_info_relations" table.

Or is an ID lookup in the "hibernation_creature_info_relations" table so fast that first checking whether "hibernates" is true or false would actually have the larger impact on performance?

I hope this was enough information to help you understand what I am asking; if not, please let me know so I can rephrase or include more details.

  • You won't gain much by having to duplicate the hibernate field in another table. Any speed gains from reducing the size of the data set will be offset by having to join back to the original table in the first place. – Marc B Oct 25 '11 at 18:52
  • Also, keep in mind that, unlike in imperative languages, you (usually) have very little control over execution order and optimization in SQL. That is the job of the optimizer, which can do a wide variety of things to 'try' to speed up the query, including re-ordering joins, using table scans, switching indexes, choosing a different ordering, and creating temp tables. There also isn't really any such thing as short-circuit logic, especially in the context of joins (the optimizer is free to mess with them as it sees fit). – Clockwork-Muse Oct 25 '11 at 20:43

2 Answers

No, that is not a good way to do things.

Use a normal field that can be null instead.

Example

table creatures
---------------
id     name      info_id

1      dino      null
2      dog       1
3      cat       2

table info
--------------
id     info_text

1      dogs bark
2      cats miauw

Now you can just do a join:

SELECT c.name, i.info_text
FROM creatures c
LEFT JOIN info i ON (c.info_id = i.id)

If you do it like this, SQL can use an index on the join column.
An index on a boolean field, by contrast, is rarely useful, and the optimizer will usually ignore it.
The cardinality of that field is too low, and using indexes on low-cardinality fields tends to slow things down instead of speeding things up.

See: MySQL: low cardinality/selectivity columns = how to index?
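
The tables and join above can be tried end to end. What follows is a minimal sketch, not the answer's own code: it assumes Python's built-in sqlite3 module as a stand-in for MySQL, and the column types, the index name idx_creatures_info_id, and the helper names build_demo_db and creatures_with_info are made up for illustration.

```python
import sqlite3


def build_demo_db():
    """Create an in-memory database mirroring the example tables."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE info (
            id        INTEGER PRIMARY KEY,
            info_text TEXT NOT NULL
        );
        CREATE TABLE creatures (
            id      INTEGER PRIMARY KEY,
            name    TEXT NOT NULL,
            info_id INTEGER REFERENCES info(id)  -- NULL means "no info row"
        );
        -- Hypothetical index name; indexes the nullable link column.
        CREATE INDEX idx_creatures_info_id ON creatures(info_id);

        INSERT INTO info VALUES (1, 'dogs bark'), (2, 'cats miauw');
        INSERT INTO creatures VALUES
            (1, 'dino', NULL), (2, 'dog', 1), (3, 'cat', 2);
    """)
    return conn


def creatures_with_info(conn):
    """LEFT JOIN keeps every creature; info_text is None when info_id is NULL."""
    return conn.execute("""
        SELECT c.name, i.info_text
        FROM creatures c
        LEFT JOIN info i ON c.info_id = i.id
        ORDER BY c.id
    """).fetchall()


if __name__ == "__main__":
    for name, info_text in creatures_with_info(build_demo_db()):
        print(name, info_text)
```

The creature without an info row still appears in the result, with an empty info_text, so no separate boolean check is needed before the join.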

Johan
  • @nathangonzalez maybe they do, who can know for sure? – code_burgar Oct 25 '11 at 19:00
  • +1 For onomatopoetic animal sounds not in my native language! – Michael Berkowski Oct 25 '11 at 19:06
  • @Ealianis, see the link in the article. Using an index takes time because MySQL has to read the index data from a separate file. Unless the index saves a lot of time elsewhere, it will slow things down. If more than 20% of all rows are selected, it will be faster to **not** use an index. – Johan Oct 25 '11 at 19:06
  • I removed the comment when I saw the link, thank you for your response though it helps a lot! –  Oct 25 '11 at 19:08
  • So if over 20% of animals hibernate it is unwise to use indexing? Or if 20% of the animals are selected and returned it is unwise? –  Oct 25 '11 at 19:11
  • The percentage varies from DB to DB, but if you use a `null` link for animals that do not hibernate, you don't need the boolean field; you can just do `SELECT * FROM animals WHERE link_to_hibernate_id IS NULL` and index the link_id as normal. If very many animals do not hibernate, MySQL **will refuse** to use an index for that query and will do a full table scan instead. You can check it out for yourself by doing `explain select ...` to see the query plan. MySQL will use an index on those animals that do hibernate, because all those links are different. – Johan Oct 25 '11 at 19:16
  • I read the article, and I think I understand why selecting <20% (as you said) of the rows would be efficient compared to a full table scan. For the application I am building, I can only foresee a very small number of rows being selected at a time. Thank you very much! –  Oct 25 '11 at 19:18
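
The nullable-link idea from the comments above can be sketched as follows. This is an illustration under stated assumptions, not a MySQL benchmark: it uses Python's built-in sqlite3 module, whose planner and plan output differ from MySQL's, and the sample rows, the index name idx_animals_hibernate, and the helper query_plan are hypothetical. SQLite's `EXPLAIN QUERY PLAN` plays the role that `explain select ...` plays in MySQL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE animals (
        id                   INTEGER PRIMARY KEY,
        name                 TEXT NOT NULL,
        link_to_hibernate_id INTEGER  -- NULL: the animal does not hibernate
    );
    -- Hypothetical index name; indexes the link column, not a boolean flag.
    CREATE INDEX idx_animals_hibernate ON animals(link_to_hibernate_id);

    -- Made-up sample rows for illustration only.
    INSERT INTO animals VALUES
        (1, 'bear', 10), (2, 'hedgehog', 11), (3, 'dog', NULL);
""")

# The NULL link replaces the boolean: IS NULL means "does not hibernate".
non_hibernators = [row[0] for row in conn.execute(
    "SELECT name FROM animals WHERE link_to_hibernate_id IS NULL ORDER BY id")]

# IS NOT NULL selects the animals that do hibernate.
hibernators = [row[0] for row in conn.execute(
    "SELECT name FROM animals WHERE link_to_hibernate_id IS NOT NULL ORDER BY id")]


def query_plan(sql):
    """Return the human-readable plan steps SQLite reports for a query."""
    # The last column of each EXPLAIN QUERY PLAN row holds the description.
    return [row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + sql)]


if __name__ == "__main__":
    print(non_hibernators)
    print(hibernators)
    # Inspect whether the planner reports an index search or a full scan.
    for step in query_plan(
            "SELECT name FROM animals WHERE link_to_hibernate_id = 10"):
        print(step)
```

Whether the planner actually picks the index depends on the database engine and on the data distribution, so the plan printed here should be read as SQLite's choice for this tiny table only.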

If you want to use the column "hibernates" only to prevent SQL from having to search through the other table, then you should follow @Johan's advice. Otherwise, you can create an index on the column "hibernates", which may improve the execution time. But keep in mind what @Johan is trying to tell you.

Yaqub Ahmad