85

Is there going to be much benefit in indexing a boolean field in a database table?

Given a common situation, like "soft-delete" records which are flagged as inactive, and hence most queries include WHERE deleted = 0, would it help to have that field indexed on its own, or should it be combined with the other commonly-searched fields in a different index?

vvvvv
  • 25,404
  • 19
  • 49
  • 81
nickf
  • 537,072
  • 198
  • 649
  • 721
  • Possible duplicate of [Is there any performance gain in indexing a boolean field?](http://stackoverflow.com/questions/10524651/is-there-any-performance-gain-in-indexing-a-boolean-field) – Amir Ali Akbari Apr 13 '16 at 12:14
  • 32
    @AmirAliAkbari: Oh! No! A circular reference! Hopefully S.O. won't explode! – Paul Feb 01 '17 at 14:49
  • 2
    WAIT! This one existed first! The other one was created after, which means this one is NOT its duplicate. S.O. is saved! – OCDev Nov 25 '22 at 14:33

6 Answers6

60

No.

You index fields that are searched upon and have high selectivity/cardinality. A boolean field's cardinality is obliterated in nearly any table. If anything it will make your writes slower (by an oh so tiny amount).

Maybe you would make it the first field in the clustered index if every query took into account soft deletes?

Mark Canlas
  • 9,385
  • 5
  • 41
  • 63
  • 8
    imagine a big book with thousands of pages. Pages contain a single letter, 'A' or 'B' and a random number. Would you have a benefit in finding a certain random number entry for which you know that it is on one of the 'A' pages when A and B pages are not mixed but the book starts with A pages only and then B? Yes you would.. so I guess you're wrong.. – tObi Sep 25 '16 at 12:29
  • 2
    Are you sure this is right? I could easily see such a field being of value if, for example, 99% of the time the value was 'no' and you were querying to only the 'yes' values. (E. G. only active records?) – RonLugge May 13 '19 at 21:58
  • 4
    I think the answer is too simplistic, given many other indexing strategies in modern databases. For example, a partial index `WHERE field = false`, or certain other non-btree indexes which are usually platform-specific, provide alternatives to a btree for searching booleans. It also depends on your search conditions and what portion of the table is true vs. false. – DB140141 Jun 26 '19 at 15:49
  • 2
    This answer makes a strange assumption and is wrong for some cases. A soft-delete system might be expected to have a very unbalanced number of deleted=1 vs deleted=0. If a lookup is performed across millions of records and all but a thousand have been deleted, and if you are interested in fetching those thousand items that have not been deleted yet, then a where deleted=0 would give a very huge performance increase. A full table scan through millions of records would be way slower than a quick index that finds just the 1000 records. – OCDev Nov 25 '22 at 16:16
18

What is about a deleted_at DATETIME column? There are two benefits.

  1. If you need an unique column like name, you can create and soft-delete a record with the same name multiple times (if you use an unique index on the columns deleted_at AND name)
  2. You can search for recently deleted records.

You query could look like this:

SELECT * FROM xyz WHERE deleted_at IS NULL
jhlllnd
  • 716
  • 6
  • 10
7

I think it would help, especially in covering indices.

How much/little is of course dependent on your data and queries.

You can have theories of all sorts about indices but final answers are given by the database engine in a database with real data. And often you are surprised by the answer (or maybe my theories are too bad ;)

Examine the query plan of your queries and determine if the queries can be improved, or if the indices can be improved. It's quite simple to alter indices and see what difference it makes

Brimstedt
  • 3,020
  • 22
  • 32
  • 3
    @OMGPonies The harm is in the additional write overhead, on a busy table with a lot of rows, this can actually reduce query performance. It's only a benefit when there is high cardinality and the queries are built to take advantage. – oucil Dec 10 '13 at 20:14
2

I think it would help if you were using a view (where deleted = 0) and you are regularly querying from this view.

Adriaan Stander
  • 162,879
  • 31
  • 289
  • 284
2

i think if your boolean field is such that you would be referring to them in many cases, it would make sense to have a separate table, example DeletedPages, or SpecialPages, which will have many boolean type fields, like is_deleted, is_hidden, is_really_deleted, requires_higher_user etc, and then you would take joins to get them.

Typically the size of this table would be smaller and you would get some advantage by taking joins, especially as far as code readability and maintainability is concerned. And for this type of query:

select all pages where is_deleted = 1

It would be faster to have it implemented like this:

select all pages where pages 
inner join DeletedPages on page.id=deleted_pages.page_id 

I think i read it somewhere about mysql databases that you need a field to at least have cardinality of 3 to make indexing work on that field, but please confirm this.

p.campbell
  • 98,673
  • 67
  • 256
  • 322
umar
  • 4,309
  • 9
  • 34
  • 47
  • 2
    It's hard to say given that a boolean is so thin and we don't have any data, but incurring a join and its workflow every single query would make queries slower, not faster, especially if the primary keys were clustered differently and if the deleted_pages table was needed for every single query. – Mark Canlas Dec 04 '09 at 05:51
0

If you are using database that supports bitmap indexes (such as Oracle), then such an index on a boolean column will much more useful than without.

Rondo
  • 3,458
  • 28
  • 26
  • Watch out with bitmap indexes on tables with frequent updates https://www.oracletutorial.com/oracle-index/oracle-bitmap-index/ – Joost Lambregts May 30 '22 at 12:56
  • I can confirm that this cause of table deadlock is real. :) We were able to change the index type at the time without any downside for a non-boolean name-value lookup table to avoid it. But yea, real concern. – Rondo Jun 17 '22 at 18:24