6

I don't know enough about databases to find the right words to ask this question, so let me give an example to explain what I'm trying to do: Suppose I want the primary key for a table to be an ID I grab from an API, but the majority of those API requests result in 404 errors. As a result, my table would look like this:

I also don't know how to format a table-like structure on Stack Overflow, so this is going to be a rough visual:

API_ID_PK | name
------------------
1         | Billy
5         | Timmy
23        | Richard
54        | Jobert
104       | Broccoli

Is it okay for the ID's not to be sequentially separated by 1 digit? Or should I do this:

ID PK |  API_ID   | NAME
----------------------------------------
1     |    1      | Billy
2     |    5      | Timmy
3     |    23     | Richard
4     |    54     | Jobert
5     |    104    | Broccoli

Would the second table be more efficient for indexing reasons? Or is the first table perfectly fine? Thanks!

Daniel Widdis
  • 8,424
  • 13
  • 41
  • 63
  • ya i'm pretty sure ID's are automatically incremented by one – Caffeinated Nov 18 '15 at 22:01
  • 1
    @Coffee: IDs are automatically incremented by one **only by default**. This can be changed, however. See my answer. – Ayush Nov 18 '15 at 22:02
  • @tobogganjester: Shouldn't matter. For all the DB knows is you had all consecutive, but deleted 2...4, 6...22, 24...53, etc. (though if there are PKs from another database, you could put your mind at ease and make your own PK, then specify these as a `tmdb_id` column.) – Brad Christie Nov 18 '15 at 22:07

3 Answers3

9

No, there won't be any effect on efficiency if you have non-consecutive IDs. In fact, MySQL (and other databases) allow for you to set a variable auto_increment_increment to have the ID increment by more than 1. This is commonly used in multi-master setups.

Ayush
  • 41,754
  • 51
  • 164
  • 239
  • What if it's an arbitrary sequence? OP's IDs above don't have any pattern at all, just random – Caffeinated Nov 18 '15 at 22:03
  • 1
    Doesn't matter, the only requirement for IDs is that they be unique. – TMN Nov 18 '15 at 22:10
  • 3
    @Coffee: The ID is usually used to create a binary-tree of some sort. Thus, as TMN states, the only requirement is that they be unique. – Ayush Nov 18 '15 at 22:11
2

It's fine to have IDs not sequential. I regularly use GUIDs for IDs when dealing with enterprise software where multiple business could share the same object and they're never sequential.

The one thing to watch out for is if the numbers are the same. What's determining the ID value you're storing?

Paurian
  • 1,372
  • 10
  • 18
  • I'm dumping movie data from Themoviedb, so each tmdb id is unique, just a lot of them end up being deleted by tmdb, and their policy is to keep that ID unused. I was just worried about efficiency if it's not 1 2 3 4 5 etc. – tobogganjester Nov 18 '15 at 22:06
  • It sounds like you're fine. ID mapping like what you're experiencing is common. You'll want to handle cases their API throws at you, such as if they return lists sorted by last-modified date (which could return the same record to you in successive calls). MySQL handles this with "on duplicate key update". Good Luck! – Paurian Nov 18 '15 at 22:15
1

If you have a clustered index (Sql-Server) on a ID column and insert IDs with random values (like Guids), this can have a negative effect, as the physical order of the clustered index corresponds to the logical order. This can lead to a lot of index re-organisations. See: Improving performance of cluster index GUID primary key.

However, ordered but non consecutive values (values not separated by 1) are not a problem for clustered indexes.

For non-clustered indexes the order doesn't matter. It is okay to insert random values for primary keys as long as they are unique.

Community
  • 1
  • 1
Olivier Jacot-Descombes
  • 104,806
  • 13
  • 138
  • 188