Sanity Check: Floats as primary keys?

Question

I'm working with an old sql server 2000 database, mixing some of it's information with a new app I'm building. I noticed some of the primary keys in several of the tables are floats rather than any type of ints. They aren't foreign keys and are all unique. I can't think of any reason that anyone would want to make their unique primary key IDs floats, but I'm not a SQL expert by any means. So I guess what I'm asking is does whoever designed this fairly extensive database know something I don't?

I believe your instincts are 100% correct here; floats make bad keys. — Carl Manaster, Jul 09 '09 at 17:36

score 8 · Answer 1 · answered Jul 09 '09 at 17:57

I worked with someone who used floats as PKs in SQL Server databases. He was worried about running out of numbers for identifiers if he stuck to INTs. (32 bit on SQL Server.) He just looked at the range of float and did not think about the fact never mind that for larger numbers the ones place is not held in the number, due to limited precision. So his code to take MAX(PK) + 1.0 would at some point return a number equal to MAX(PK). Not good. I finally convinced him to not use float for surrogate primary keys for future databases. He quit before fixing the DB he was working on.

To answer your question, "So I guess what I'm asking is does whoever designed this fairly extensive database know something I don't?" Most likely NO! At least not with respect to choosing datatypes.

The number where you start to run into the problem `PK+1.0=PK` is 9007199254740992. At 1000000 new PK per second, it will take you 285 years to run into that number. I don't think this is a point worth worrying about. On the other hand, at only 100 per second you can run out of room in a signed int in less than a year. — Mark Ransom, Oct 07 '16 at 16:28

score 8 · Accepted Answer · answered Jul 09 '09 at 18:51

8

I'm currently working with a rather big accountant package where EACH of 350+ tables has a primary key of FLOAT(53). All actual values are integers and the system strictly checks that they indeed are (there are special functions that do all the incrementing work).

I did wonder at this design, yet I can understand why it was chosen and give it some credits. On the one hand, the system is big enough to have billion records in some tables. On the other hand, those primary keys must be easily readable from external applications like Excel or VB6, in which case you don't really want to make them BIGINT.

Hence, float is fine.

answered Jul 09 '09 at 18:51

GSerg

76,472
17
159
346

That's interesting, can you explain what you mean by easily readable? Do you mean its easier for the external applications to read them or human readable? Thanks for the response. – Graham Conzett Jul 09 '09 at 19:09
1

No, I mean those systems with which people are writing their own little helper applications around the big thing, can easily be not able to cope with 64-bit integers. There is no data type in VB6 to natively suppor a BIGINT, you'd have to play with Double (as Excel suggests automatically) or Variant-Decimal (as ADO does). And that's at least something! I can easily imagine an external system that will not be able to consume BIGINTs at all. – GSerg Jul 09 '09 at 20:16

score 6 · Answer 3 · answered Jul 09 '09 at 17:23

6

Floats have one interesting property: it is always possible to insert a value between two others, except for the pathological case where you run out of bits. It has the disadvantage that representational issues may keep you from referring to a row by the key; it's hard to make two floating point values equal to each other.

answered Jul 09 '09 at 17:23

Mark Ransom

299,747
42
398
622

2

As a matter of principle, if you want these properties I would think you could always add an indexed column containing the float, in addition to the int primary key. – Robert Harvey Jul 09 '09 at 17:36
That was my first thought, but I can't find any like that, they're all sequential 3.0, 4.0, 5.0 etc. perhaps this was some sort of future proofing, although I'm not quite sure why since they just seem to be IDs. Thanks for your input. – Graham Conzett Jul 09 '09 at 17:44
2

It doesn't sound like this is the right answer for the questioner's specific case...but you get voted up anyway for having an interesting and possible answer for the next person to search for this question. – Beska Jul 09 '09 at 18:23

Matt Rogish · Answer 4 · 2009-07-09T18:18:27.463

2

Is it a NUMERIC( x, y) format and an IDENTITY? If so, it might be an upgrade from an older version of SQL Server. Back-in-the-day IDENTITY could only be a NUMERIC format, not the common INT we use today.

Otherwise, there's no way to tell if a float is suitable as a primary key -- it depends upon your domain. It's a bit harder to compare (IEEE INT is more efficient than float) and most people use monotonically increasing numbers (IDENTITY), so integers are often what people really want.

Since it looks like you're storing ints:

To answer the original question more directly: If you're storing ints, use the integer datatype. It's more efficient to store and compare.

edited Jul 09 '09 at 18:18

answered Jul 09 '09 at 17:17

Matt Rogish

24,435
11
76
92

1

Actually, PKs could always been INTs but since BIGINTs did not exist, people used NUMERICs – Andriy Volkov Jul 09 '09 at 17:20
1

Ah OK. I know that Sybase SQL Server -- forever ago -- only supported NUMERICS and I thought that SQL Server's older than 6.5 inherited that restriction – Matt Rogish Jul 09 '09 at 17:27
I believe they used SQL Server 2000 from the beginning, although I'm not 100% sure. I'm going to double check though. – Graham Conzett Jul 09 '09 at 17:41
1

Quirkily MySQL used to be technically faster at comparing floats than ints (which is no longer the case, I hasten to add) so some fans of premature optimisation are presumably responsible. We have a production DB like that here, and that caused me to go and look up when I first came across a table with float as the PK. – Iain Collins Mar 22 '13 at 12:04

score 1 · Answer 5 · answered Nov 06 '13 at 18:21

I have been working with the Cerner Millenium database for a few years (under the covers it uses Oracle). Initially I was very surprised to see that it used floats for IDs on tables. Then I encountered an ID in our database > 2^32 and a query I wrote gave the wrong results because I had incorrectly cast it to an INT I realized why they did it. I don't find any of the arguments above against using floats persuasive in the real world, where for keys you just need numbers only "somewhat greater" than 2^32 and the value of the ID is always of the form ######.0. (No one is talking about an ID of the form ######.######.) However now that we are importing this data into a SQL Server warehouse, and bigint is available, we are going to go with bigint instead of float.

score 1 · Answer 6 · edited Nov 16 '15 at 15:02

1

FYI-- there is another way of looking at this:

I work in real-time process control and as such, most of my row entries are time-based and are generated automatically at high rates by non-ASCII machines. time--that's what my users usually search on, and many of my 'users' are actually machines themselves. Hence, UTC based primary keys.

edited Nov 16 '15 at 15:02

Robert

5,278
43
65
115

answered Nov 16 '15 at 14:34

george

11
1

Sanity Check: Floats as primary keys?

6 Answers6

Linked