34

As the title says, I wonder, why ASP.NET Identity 2.0 uses a string with a GUID as primary clustered key for the user table. Does this have any advantages to an integer id? I see only the problem, that a GUID isn't the best choice for a clustered index.

Am I missing anything or is a integer still the better choice?

  • Microsoft always chooses a Guid for a primary Key, you're right as it is not the fastest in clustered keys, but Microsoft seems to love them - as they're the ones that created them – Scott Selby May 27 '14 at 13:59
  • You can look at this answer: https://stackoverflow.com/a/24152085/3901618 – romanoza Oct 13 '17 at 12:00

5 Answers5

25

Regarding the use of guid, there is a point of view that promotes using ids without "meaning" in order to completely separate the identifier from the data surrounding it; this id shouldn't be visible from outside the datastore. If we look at some characteristics of a surrogate key, we have the following

  • the value is unique system-wide, hence never reused
  • the value is system generated
  • the value is not manipulable by the user or application
  • the value contains no semantic meaning
  • the value is not visible to the user or application
  • the value is not composed of several values from different domains.

So a guid fits the bill since it is indeed generated by the system and has no relationship to the domain. I think that the use of a guid is mainly a question of trend in this particular way of thinking; however since they are introducing a new mechanism of "extensible primary key" the key can be changed, so you can fallback on an integer for your PK.


Regarding performance i'd point you to this thread where the accepted answer says:

GUIDs may seem to be a natural choice for your primary key - and if you really must, you could probably argue to use it for the PRIMARY KEY of the table. What I'd strongly recommend not to do is use the GUID column as the clustering key, which SQL Server does by default, unless you specifically tell it not to.

You really need to keep two issues apart:

  • the primary key is a logical construct - one of the candidate keys that uniquely and reliably identifies every row in your table. This can be anything, really - an INT, a GUID, a string - pick what makes most sense for your scenario.
  • the clustering key (the column or columns that define the "clustered index" on the table) - this is a physical storage-related thing, and here, a small, stable, ever-increasing data type is your best pick - INT or BIGINT as your default option.

which completely confirm your impression.

Community
  • 1
  • 1
samy
  • 14,832
  • 2
  • 54
  • 82
6

The other answers are excellent, however one advantage I haven't seen mentioned is that Guid.NewGuid() (theoretically) creates a unique ID without committing the row to the database.

An integer based identity column requires a database flush to get it's ID. There are some circumstances where it's useful to have the PK for your row generated in code and passed to the database (obviously there are other ways of achieving this with with a unique constraint but a Guid is a reasonably good option).

Mariusz Jamro
  • 30,615
  • 24
  • 120
  • 162
Liath
  • 9,913
  • 9
  • 51
  • 81
  • 2
    Indeed, that is a pro of the surrogate key in the article I linked - it can be determined outside of the database (which is another way of saying that it is system-generated). However an int value can be generated outside of the DB too; what gives the guid an edge is its unicity. +1 for being way more concise than me :D – samy May 27 '14 at 14:44
  • 1
    It's especially useful if you're importing lots of complex data. You can prepare all the keys in advance. It's also very useful for data synchronization in general (if two rows have the same ID, they are the same row; with `identity`, you need another piece of information for that). Given the flow towards more distributed architectures, GUID seems like a natural choice - especially if you add the fact that you might want to have identities shared over different applications with their own databases; a unique user ID is dependable (like Windows SIDs). – Luaan Aug 21 '15 at 07:35
4

As usual Microsoft thinks of "simplicity" more rather than being actually useful and performant. Yes GUID is system wide unique, yes you can test it without flushing new ID from the database....BUT...

I'm not even going into debate of DB indexes which I'm sure many thought of. Here is another downside to strings. Even if I use BIGINT in the db as the userid, it's still only 8 bytes vs varchar(128). I understand that disk space is cheap these days but why do I have to clutter my db with unnecessary stuff. Anyone who ever worked on any project that involves millions of users will frown upon strings as their userid. All responses from Microsoft as to "why they used strings" are basically brush off responses.

Of course they will say: "you can change it to use INTs, BIGINTs etc...." Sure, you have to change myriad of classes, implement your own user store etc. Right...something so simple has to be so complicated.

/end rant

ShyGuy82
  • 115
  • 2
  • 6
2

I assume that by "to an id" you mean a sequential integral ID, as a UUID is after all an ID.

If absolutely nobody using Identity 2.0, or a future version, ever wanted to merge together, or combine more than one store, or ever import users and/or rôles, then a numerical ID would work.

If just a few people do, or even might, then a UUID makes a lot more sense.

There's an argument to be made for using a natural key, such as a username, with the pros and cons of that being well-explored generally. IIRC, they did indeed do this the first time.

In all, a UUID seems an obvious choice.

Jon Hanna
  • 110,372
  • 10
  • 146
  • 251
  • 1
    This is a good point and makes even more sense in the context of user identities, which can be numerous and spread across different identity providers. Merging user databases would be trivial if both used UUIDs. – parliament Nov 12 '14 at 04:59
2

A Guid as a UserId (thereby primary key) can be your best friend at times.

Something to consider when folks "rant" against a GUID is they may not be considering "occasionally connected" applications whereby you can't hit the database to get your "int" identity field value until your reconnect but need to create multiple records in various tables linked together offline. Having a Guid allows the application create a "user" and therefore the "userId" as the primary key without collisions when you sync when back online.

To avoid performance hits, don't put the Guid in the cluster index. You can always use a different unique field as the cluster id or maybe even use an integer identity field increased by one as a "clusterId" and use that instead.

Another reason is when you are aggregating data from many different sources, if the UserId (primary key) is a Guid you can avoid collisions.

Blake
  • 309
  • 3
  • 10