0

I'm going to use a very large MySQL database which I have to optimize as good as possible. I read that using fixed width's of the columns is a good way to optimize the MySQL performance, logically. For instance, a CHAR is better than a VARCHAR or BLOB. However, does MySQL perform better when the values within these static width columns are all the same width, e.g. filling the whole column width?

For example, let's say I have a table structure like

|ID(BIGINT) |Integer (CHAR[5])|Integer2(CHAR[5])|etc...
|1          |45896            |47736            |etc...
|2          |577              |77               |etc... <-- is such a row bad practice looking at performance?
|3          |47732            |85773            |etc...

E.g. woud it be better to pad the short values like shown below? (I have indexes on my tables to optimize. I am looking for further optimization, since it am getting in the segment of large MySQL tables?)

|ID(BIGINT) |Integer (CHAR[5])|Integer2(CHAR[5])|etc...
|1          |45896            |47736            |etc...
|2          |00577            |00077            |etc...
|3          |47732            |85773            |etc...

The tables I'm going to use will consist of 250+ million rows with multiple Integer columns like shown above. I defined the field as a CHAR, since I will store the integers in bytes myself. I will save you the reason for this, but this approach is needed for my application to work properly. So, is it good, looking at MySQL performance, to pad a value which doesn't match the column width, so all the values within a fixed width column have the same width?

Width can also be read as length?

I hope my question is clear. When it isn't, please let me know and I will try to explain my question further.

Summary: How do the value width's within the fixed column width's affect MySQL performance looking at tables with over 250 million rows? E.g. is it smart to pad smaller integers to the column width to optimize the performance?

TVA van Hesteren
  • 1,031
  • 3
  • 20
  • 47
  • Although you said you had your reasons to use strings, just a warning: if you store `77` as a string, and you sort it, it will be sorted after `100` (as "7" > "1", and it is a lexigraphical order); this could be a reason to pad 0s to prevent such problems. That applies to range scans too (e.g. `where integer2 between '1' and '3'` will include `10000`, but not `00001`). Also calculations (adding or substracting numbers) will require typecasting and might prevent the use of indexes (`where integer2+1 = 10`). And some other things. It will depend on your use cases if these will be a problem. – Solarflare Sep 10 '17 at 20:46
  • First do the cleanups suggested, then provide us with the queries and `SHOW CREATE TABLE` so we can take you through the next steps. – Rick James Sep 11 '17 at 00:14
  • `CHAR(5)` takes 5 bytes; `MEDIUMINT UNSIGNED` takes 3, but hold a similar number range. `CHAR` is wasting space! – Rick James Sep 11 '17 at 00:16
  • @Solarflare, the column values will not be used for such operations. Thanks for the reminder though – TVA van Hesteren Sep 11 '17 at 08:56
  • @RickJames, right. I use `Binary` right now, since I have to store multiple blocks of 5 bytes each in one column. I pad values so all integers result in 5 bytes (resulting in max 2.1billion results for me) – TVA van Hesteren Sep 11 '17 at 08:58
  • `INT` is only 4 bytes; why use 5? – Rick James Sep 11 '17 at 16:45
  • @RickJames, because I will use `multiple blocks of 5 bytes` in one column. Because the width is `5 bytes` I can always get back to the original numbers which is required. – TVA van Hesteren Sep 12 '17 at 08:17

1 Answers1

0

Tables don't have performance—queries have performance.

Tables are just storage. You might be concerned about the performance of inserting data, or updating, or various kinds of SELECT queries. But you haven't described any queries, so there's no way to know what you want to optimize for.

All optimizations improve performance for one type of data access at the expense of other types of data access.

Fixed width

The whole "fixed width is faster" optimization is only for the MyISAM storage engine, and only if all the columns in a table are fixed-width (so the whole row is fixed-width). It is described here: https://dev.mysql.com/doc/refman/5.7/en/static-format.html

InnoDB has no performance advantage for fixed-width rows. You should use InnoDB because it's now the default storage engine, it supports good features for performance and scalability of concurrent requests:

  • Row-level locking
  • Readers don't block writers
  • Caching data and index pages in RAM

Integer vs. Char/Varchar

You wrote:

I defined the field as a CHAR, since I will store the integers in bytes myself

Then I guess you mean you're storing as BINARY which is like CHAR but without going through character set encoding.

This is exactly what the INT and BIGINT types do. They convert integer values to binary and store in 4 bytes or 8 bytes respectively. Unless you need to store excessively large integers (greater than 264-1), you're wasting your time.

If you're only storing integer values, don't store them in strings. This uses at least twice the storage space compared to storing in binary data types INT or BIGINT. More storage for the value requires more RAM for caching. If you can store more compactly, your RAM will be used more effectively because it will cache more rows in the same amount of RAM.

That will help performance much more than the micro-optimization of choosing fixed-width vs. variable-width data types.

Before you ask, you should know that INT(4) and INT(8) are exactly the same. The number argument looks like it might affect the size or the maximum length of the integer, but it doesn't. See my answer to Types in MySQL: BigInt(20) vs Int(20)

More important optimizations

And all of the optimizations described above have insignificant performance benefits compared to proper index design. You can only design indexes to support specific queries you want to optimize.

Bill Karwin
  • 538,548
  • 86
  • 673
  • 828