0

I have a MariaDB InnoDB table with several million rows, but with short, fixed-width rows consisting of numbers and timestamps only. We usually search, filter and sort the rows using any of the existing columns.

We want to add a column to store an associated "url" to each row. Ideally every row will have it's url. We know for a fact that we won't be sorting, searching and filtering by the url column. We don't mind truncating the URL to it's first 255 bytes, so we are going to give it the VARCHAR type. But of course that column's width would be variable. The whole record will become variable-width and the width of the original record will double in many cases.

We were considering the alternative of using a different, secondary table for storing the varchar. We could join them when querying the data, or even more efficiently -probably- just fetch the url's for the page we are showing.

Would this approach be advisable? Is there a better alternative that would also allow us to preserve performance?

Update: As user Bill Karwin noted in one comment below, InnoDB does not benefit from fixed width as much as MyISAM does, so the real issue here is about the size of the row and not so much about the fixed versus variable width discussion.

Sebastián Grignoli
  • 32,444
  • 17
  • 71
  • 86
  • 1
    Using a secondary table is fine - whether or not there truly is a "performance impact" with a singular table. Several million is still pocket change - use performance tests and metrics to decide if the extra focus/effort is even justified. What are the "key" options? Those would naturally (hah) be suited to establish a 1:1 relationship. – user2864740 Jun 11 '20 at 20:41
  • 1
    @user2864740 The problem with the several million scenario is that it's often joined with other tables that also have millions of rows. I'm not sure I get what you mean by the "key" options, but the PK I was considering for the secondary table is just a copy of the first table's PK. It would be both PK and FK (I guess a non-autoincrement is allowed as a PK, right?). – Sebastián Grignoli Jun 11 '20 at 20:48
  • 3
    InnoDB does not care about fixed-width versus variable-width rows. There's some advantage of fixed-width rows in MyISAM, but not InnoDB. – Bill Karwin Jun 11 '20 at 20:52
  • It comes down to the indexes used on the joins. While there might be some additional disk utilization ("less efficient storage of data, per particular queries"), the performance-complexity of the query determines on how much it can reduce the reads. An unfiltered million x million cross product is going to be pretty bad in most cases - row size primarily affects the 'C' constant. It may be worthwhile exploring [using covering indices](http://www.unofficialmysqlguide.com/covering-indexes.html#:~:text=Covering%20Indexes,index%20without%20accessing%20table%20rows.) instead of manual separation. – user2864740 Jun 11 '20 at 20:55
  • Anyway, if decided to do a 1:1, one method is to PK <-> PK-which-is-also-FK. Of course it might not always make sense to use the PK (if there is a more ideal domain CK), nor is it necessary to share the PK/FK in the additional table (the PK would have the value inserted manually if shared; FK in either case). Using the PK/FK approach is beneficial in that then the singular 'index' is the table layout itself. – user2864740 Jun 11 '20 at 20:59
  • The problem I see with covering indexes is that I'll be needing one per column, because all columns are searchable and combined indexes only allow for left to right search criteria to be usable (cannot use index to search for the second column included in it). And covering indexes could multiply the partial copies of the table that the engine will be maintaining. In my mind my use case would be: joining everything with no "url" column involved, then pick the 20 (maybe 100) records for current page, then only look for the urls for those 100. – Sebastián Grignoli Jun 11 '20 at 21:14
  • @BillKarwin Turns out fixed width benefits in InnoDB are only marginal, as explained here: https://stackoverflow.com/questions/21596981/does-innodb-have-static-tables#:~:text=InnoDB%20does%20have%20fixed%2Dwidth,re%2Duse%20somewhat%20more%20efficient. I found this thanks to BillKarwin's comment above. Who knew! – Sebastián Grignoli Jun 11 '20 at 22:55
  • 2
    There are some edge cases where there's a benefit to separating some columns that are bulky and seldom-used. But you'd have to have a database with very high scale or very high rate of traffic before it would be worth the extra complexity. And it has nothing to do with fixed-width vs. variable-width. – Bill Karwin Jun 11 '20 at 23:25
  • It was not until the 6th Comment that the most important characteristic was specified: "all columns are searchable". There are many Q&A about such. Each lists a specific `CREATE TABLE` and lists likely `SELECTs`. Then the discussion centers around what kludges will help. (There is usually not a "good" solution.) – Rick James Jun 13 '20 at 22:30

1 Answers1

2

Assuming you have control over how the URL is generated, you may want to change it to a fixed-length state. Youtube videos' URIs, for instance, are always 11 characters long and base-64. This fixes the variable length problem and avoids joining tables.

If changing URI generation is not an option, you have a few alternatives to make it fixed-length:

  1. You could fill in the blanks with a special character to force every url to be 255 within the database, and removing it just before returning it. This is not a clean solution but makes DQL operations faster than joining.
  2. You could fetch the url as you have stated, but beware that two http requests may be more time consuming than any other option with just one request.
  3. You could join with another table only when the user requires it, as opposed to it being the default.

Consider that having variable length may not be as big a problem, depending on your needs. The only issue might be if you're grossly oversizing fields, but it doesn't seem to be your case.