0

I've just discovered that all but VAR data types do not vary their LENGTHs. This strikes me as extremely inefficient since there may be INT fields like an autoincrement PRIMARY that never change where a 1 in a BIGINT column will take the full 8 bytes.

I have many columns that won't change once they're inserted but have the potential to be large. Instead of using all BIGINT columns, I'd like to use VARBINARY instead.

I want to manipulate the data as an int in c++ yet store it as VARBINARY in mysql.

How can these conversions be made in c++?

  • 2
    Don't do this. Use the right data type for each field. – Matt Ball Mar 21 '13 at 04:36
  • 2
    Are you ready to take a severe performance hit on both legs of the conversion, an inability to aggregate your integer data "the usual way" and severe pains while sorting your `int`s? That's the lesser of your worries, because the readability is going to plunge: the readers of your code would be wondering what were you thinking. Are you willing to do all that to save on a few measly bytes of disk space, which is dirt-cheap these days? – Sergey Kalinichenko Mar 21 '13 at 04:37
  • @dasblinkenlight which is slower, grabbing 2 columns of a billion row table at max space efficiency, or converting all `VARCHAR`s to `bit`s to `int`s? this is a semi-nosql setup –  Mar 21 '13 at 04:43
  • @JoeCoderGuy The answer depends a lot on your network setup, because that's likely going to be your slowest link, but good chances are that the two would be comparable to each other. If anything, I'd use [`varbinary`](http://dev.mysql.com/doc/refman/5.0/en/binary-varbinary.html). – Sergey Kalinichenko Mar 21 '13 at 04:52
  • @dasblinkenlight I think you should post the `varbinary` part in an answer. I was going about this ass backwards. lol Do you know how to convert `int` to `BINARY` and back again in c++? –  Mar 21 '13 at 05:05
  • @JoeCoderGuy I don't know how to do it in C++, that's why I didn't suggest this in an answer. It should be similar to converting byte arrays to `int`s, though. – Sergey Kalinichenko Mar 21 '13 at 05:15

2 Answers2

3

I would highly recommend against this.

If you look at the MySQL documentation for VARCHAR, you'll see that

VARCHAR values are stored as a 1-byte or 2-byte length prefix plus data. The length prefix indicates the number of bytes in the value.

Let's take a 64-bit value like 0xDEADBEEF2B84F00D. If you store that in a BIGINT field, it takes 8 bytes (it's stored as a binary value.)

If you store that as a VARCHAR, even if you assume the numbers are always hexadecimal, we're talking 1 or 2 bytes for the length field, and then another 16 bytes for the hexadecimal characters! 18 bytes, and the performance lost by having to convert to/from ASCII, this is a terrible trade-off!

Let's look at the opposite case, 0x01. Stored in a BIGINT field, it's still 8 bytes. Using the same convention from before, it's probably going to be 3 bytes in your database. Not even close to being worth it, in my opinion. Especially when you consider how poorly the database would perform with a bunch of ASCII data.

Just don't do it. Use the database the way it was designed.

Jonathon Reinhart
  • 132,704
  • 33
  • 254
  • 328
  • what if all ascii characters were used as `bit`s yet manipulated as `int`s (or simply kept as `bit`s in c++) and vice versa? http://en.wikipedia.org/wiki/ASCII#ASCII_printable_characters –  Mar 21 '13 at 04:45
  • I don't know what you're suggesting. `VARCHAR` stores ASCII characters. How you wish to interpret those characters is up to you. Binary uses only "0" and "1". Decimal uses "0" thru "9", and Hexadecimal uses "0" thru "F". – Jonathon Reinhart Mar 21 '13 at 04:48
  • I think you're suffering from the common misunderstanding about the difference between binary data, and how it's represented textually. This is a very fundamental thing that you must have a solid understanding of, or you're going to be very confused. – Jonathon Reinhart Mar 21 '13 at 04:50
  • @JoeCoderGuy I would look at [this question](http://stackoverflow.com/questions/2481004/why-arent-exes-in-binary), and understand the answers to it. – Jonathon Reinhart Mar 21 '13 at 04:53
  • dasblinkenlight showed me that I should be using `varbinary` instead. Can I simply dump the output from that field into an `int` with connector c++? –  Mar 21 '13 at 16:45
  • 1
    @JoeCoderGuy That I unfortunately don't know the answer to. Perhaps asking the MySQL Connector C++ mailing list would know. – Jonathon Reinhart Mar 21 '13 at 20:13
1

I think this is a losing proposition. Unless you can show that your data simply will not fit on your system without resorting to kludges, I think you're better off sticking with integer types for integer values. Otherwise, what's to prevent someone from inserting "hello" in your numeric ID column, and how happy will you be querying numbers that don't look like numbers? It'll probably be slower overall too.

John Zwinck
  • 239,568
  • 38
  • 324
  • 436
  • dasblinkenlight showed me that I should be using `varbinary` instead. Can I simply dump the output from that field into an `int` with connector c++? –  Mar 21 '13 at 16:45
  • I don't know if you *can* but I know that you *shouldn't*. This way lies madness. Even the person who mentioned `varbinary` listed a bunch of caveats. I'd spend your time optimizing something else. – John Zwinck Mar 22 '13 at 10:54