-1

I'm designing DB tables for a log system. I have two ideas on my mind about a field. Should I create three "bit(1)" property or one "enum" property?

is_error bit(1)
is_test bit(1)
is_embedded bit(1)

or

boolErrors enum(is_error_true, is_error_false, is_test, is_test_false, is_embedded_ is_embedded_false)

Obviously, holding enum seems not proper in semantics and space but what about performance. Is fetching time increases when i have 3 columns instead of 1?

GMB
  • 216,147
  • 25
  • 84
  • 135
  • Look for [SET](https://dev.mysql.com/doc/refman/8.0/en/set.html) datatype. – Akina Dec 29 '20 at 11:54
  • 2
    Does this answer your question? [Practical rules for premature optimization](https://stackoverflow.com/questions/2978460/practical-rules-for-premature-optimization) – Luuk Dec 29 '20 at 11:55
  • How to decide this? Write out the queries you anticipate using on your table for both the BIT and ENUM alternatives. Pick the alternative with the easiest-to-understand queries. This is for the sake of other people working on your code. And don't waste time worrying about data size differences here, unless you know for sure you'll have billions of rows. Storage is really inexpensive. – O. Jones Dec 29 '20 at 12:08
  • Thank you @O.Jones. The problem is not about storage, I'm concerned about search performance. As I know, when you have more columns on the table search time increases is it correct? – Mehmet Namiduru Dec 29 '20 at 12:29

2 Answers2

0

If, as it seems, the flags represent states (that is, only one flags may be true at a given point in time), then I would recommend a single column, as integer datatype. Instead of using ENUM, you can use a referrential table to store all possible flags and their names, an reference it from the original table, using the integer column.

On the other hand, if several flags may be on (say, both is_error and is_test), then a single column is not sufficient. You can either create several columns (if the list of flags never changes), or use a bridge table to store each status on a separate row.

GMB
  • 216,147
  • 25
  • 84
  • 135
  • Thank you for the detailed explanation. is_error and is_test can exist both in a time. In what terms a single column is not sufficient? Enum can hold these booleans in one column, do you meant semantically insufficient but what about performance when I got trillions of data in terms of search. Am I doing micro-optimization? – Mehmet Namiduru Dec 29 '20 at 12:22
  • If the values are independent, then using different columns is peferrable; you can then query each column independently as needed. As for *trillions of data*: that's premature optimization for now. – GMB Dec 29 '20 at 12:25
  • Thank you for your suggestion. Unfortunately, our current table starts to work slowly after several years so that is why I need to make sure about the performance especially search performance. – Mehmet Namiduru Dec 29 '20 at 12:38
  • @MehmetNamiduru: I don't see how this would not be efficient. Searching for exact matches across multiple columns is fast, if you have the proper index(es) in place. – GMB Dec 29 '20 at 12:54
  • I think your suggestion is correct for both performance and semantics because 3 bit required to store is_error, is_test, and is_embedded. Due to enums hold more space full-table scan performance will increase if I use different columns for this problem but still I'm not sure how these things work in mysql. – Mehmet Namiduru Dec 29 '20 at 13:21
  • It's normal and expected for queries to slow down as tables grow. Your application is succeeding! Often the performance problem can be resolved by adding appropriate indexes on the tables. But it's best to analyze specific queries to decide what indexes to add. Please read [this](https://stackoverflow.com/tags/query-performance/info) and consider asking another question. – O. Jones Dec 29 '20 at 13:48
0

If only one of those flags can be set at a time, use ENUM.

If multiple flags can be set at the same time, use SET.

Performance is not really something to worry about. The main "cost" in working with a row in a table is fetching the row, not the details of what goes on in the columns.

Sure, "smaller is better" for several reasons -- I/O, etc. But an ENUM is 1 or 2 bytes; a SET is up to 8 bytes (for up to 64 flags). Both of those are reasonably small for any use case.

As for speed and indexability, let's see the main queries.

Rick James
  • 135,179
  • 13
  • 127
  • 222