41

Is it bad to have text as a primary key in an SQLite database? I heard that it's bad for performance reasons, is this true? And will the rowid be used as the actual primary key in such a case?

mpellegr
  • 3,072
  • 3
  • 22
  • 36

7 Answers7

50

Is it bad to have text as a primary key in an SQLite database? I heard that it's bad for performance reasons, is this true?

From correctness point of view, TEXT PRIMARY KEY is all right.

From performance point of view, prefer INTEGER keys. But as with any performance issue, measure it yourself to see if there's a significant difference with your data and use cases.

And will the rowid be used as the actual primary key in such a case?

Only INTEGER PRIMARY KEY gets aliased with ROWID. Other kinds of primary keys don't, and there will be the implicit integer rowid unless WITHOUT ROWID is specified. Reference.

laalto
  • 150,114
  • 66
  • 286
  • 303
  • 3
    what is the implicit rowid used for then? – mpellegr Apr 18 '14 at 20:45
  • For clarity could you explicitly state whether it is the implicit rowid that is referenced for TEXT primary keys? – dobbs May 29 '15 at 23:32
  • If you don't use `WITHOUT ROWID`, SQlite will use the rowids as the keys for the table's main search tree, and make a separate search tree associating the text primary keys with their rowids (because any primary key column gets a unique index). So, `WITHOUT ROWID` is usually a good idea for tables with text primary keys. I don't think the integer vs. string comparison speed will matter much for almost any SQLite table because of the index on the primary key column. (See my answer below for more details.) – Zoë Sparks Jul 26 '23 at 07:24
37

In real world, using strings as primary key has a lot of benefits if we are talking about UUIDs. Being able to create entity "passport" exactly at the moment of its creation can massively simplify asynchronous code and/or distributed system (if we are talking about more complex mobile client / server architecture).

As to the performance, I did not find any measurable difference when running a benchmark to perform 10000 primary key lookups, as in reality, database indexes neither store nor compare strings when running indexed searches.

Segabond
  • 1,083
  • 12
  • 15
  • 1
    I would say that UUIDs consists of limited number of symbols (VARCHAR). So it is somewhat multi-byte integer, you can think about it as an extension to char, word, int, long,.... Where TEXT has a variable length and is a different story. – Nikolay Nov 11 '21 at 15:47
  • 1
    @Nikolay: note that unlike other SQL engines, SQLite only has a single type for storing text data: TEXT. Names such as VARCHAR or CHARACTER are synonyms for TEXT. If you specify a maximum length such as VARCHAR(20), this limit is silently ignored. – ramirami Jan 23 '23 at 16:34
3

Although this thread discusses INTEGER vs TEXT primary keys, for context, see Blob vs. Text for primary keys circa 2021 where SQLite creator Richard Hipp replies. I've pasted and emphasized the relevant portion of his reply below.

(2) By Richard Hipp (drh) on 2021-03-04 16:00:22 in reply to 1 [source]

Both approaches should work fine. Storing the hash as a BLOB might be very slightly faster, since (as you observe) there is less content, hence less file I/O.

The Fossil version control system does something very much like this. But it stores the hash as text rather than as a blob. Performance is not an issue, and text is easier for developers to deal with when debugging.

Ken Lin
  • 1,819
  • 21
  • 21
0

There's nothing intrinsically wrong with using a text primary key. What makes a primary key work is that it is orderable and has unique values in the table; other than that, the type of the data doesn't strictly matter. However, when the data is text, often the "real-world" source of the data rules out the practicality of using it as the primary key. Using arbitrary, meaningless integers for the primary key means you don't have to worry about stuff like that.

That's probably the nicest thing about integer primary keys, arguably moreso than the speed of integer vs. string comparison. String comparison is generally a little more work for the computer than integer comparison, true, but that's unlikely to matter much in this context. SQLite creates an index for the primary key in any table, which means that even if you have a million entries, SQLite will only need to perform around 13 comparisons at worst to find the row (O(log n)). It would take a really unusual case for that to have major performance implications, I'd say.

On that note, something thing you might consider if you're planning on using a text primary key is to use SQLite's WITHOUT ROWID feature. A table with a text primary key is unlikely to need a rowid column, because the rowid is essentially an integer primary key. WITHOUT ROWID not only eliminates the rowid column, but also tells SQLite to base the search tree for the table itself on the primary key you specify instead of the rowid. Otherwise, it will create two search trees, the main search tree for the table itself using rowid keys and a separate search tree for the text primary key associating text values with rowids. This wastes space and adds needless overhead to lookups using the text primary key, presuming you have no need for the rowid.

SQLite's docs for WITHOUT ROWID explain all this stuff. They give an example of a table storing word counts in a text corpus with the word as the primary key, which seems to me like a nice example of a situation where a text primary key makes sense.

Zoë Sparks
  • 260
  • 3
  • 9
-2

Yes, if you use TEXT you get android.database.sqlite.SQLiteConstraintException: UNIQUE constraint failed: TableName.ColumnName (code 1555)

SQLite has session to insert and return the row ID of the last row inserted, if this insert is successful. else will return -1.

return is mapped to _ID , this is the reason they force you interface BaseColumns for the table

its strange that insert call has to return the rowid, instead of a boolean or so

I wish TEXT PRIMARY KEY capability was there in sqlite

Bipin
  • 334
  • 2
  • 6
-4

A field of type PRIMARY KEY implies comparing values. Comparing a number is simpler than comparing a text.

The reason is that there is a specific assembly instruction for 64 bit numeric comparison. This will always be much faster than comparing text which in theory can be unlimited in size.

Example comparing number:

CMP DX, 00  ; Compare the DX value with zero
JE  L7      ; If yes, then jump to label L7
.
.
L7: ...

Read more about CMP assembly instruction here: https://www.tutorialspoint.com/assembly_programming/assembly_conditions.htm

Knowing this allows us to know that numbers will always be more performative (at least in the computing we have today).

Sergio Cabral
  • 6,490
  • 2
  • 35
  • 37
-44

Is it bad to have text as a primary key in an SQLite database? I heard that it's bad for performance reasons, is this true?

I never heard that somebody used string as primary key in table. For me (and I honestly hope also for others) very "ugly" practise with very low performance.

If you'll use string as primary key you needs to think about a "few" things:

  • Will be combination of 3 symbols enough?
  • Or should I use 5 symbols?

Here, each row must have same format (readability issue of course) and also be unique. Oh! Here is next "piggy work" -> you'll need to create some "unique string generator" that will generate unique1 string identificator2.

And also there are next issues is good to consider:

  • Longer strings = automatically harder and harder to compare
  • Size of table radically raises because it's pretty clear that string has much more size as number
  • Number of rows - it's madness to use string as primary key if you table can have 1000+ rows

It's more complex theme but i would like to say that OK, for very small tables would be possible to use strings as primary key (if it makes a sence) but if you'll look at disadvantages it's much more better technique to use number as primary key for sure!

And what is conclusion?

I don't recommend you to use string as primary key. It has more disadvantages as advantages (it has really some advantage?).

Usage of number as primary key is much more better (I'm scared to say the best) practise.

And will the rowid be used as the actual primary key in such a case?

If you will use string as primary not.

1In real strings are rarely unique.

2Of course, you could say that you can create identificator from name of item in row, but it's once again spaghetti code (items can have same name).

Simon Dorociak
  • 33,374
  • 10
  • 68
  • 106
  • 4
    does the fact that my string comes from a server, is guaranteed to be unique, and is always a set length change this answer? – mpellegr Apr 18 '14 at 20:47
  • @mpellegr generally, an usage of string as primary key is not good practise how i wrote in my answer ;) – Simon Dorociak Apr 18 '14 at 22:32
  • 7
    Were you able to measure the "very low performance" associated with a textual primary key ? Because the answers to [this question](http://stackoverflow.com/questions/517579/strings-as-primary-keys-in-sql-database) (very similar to the current one except for the SQLite part) do not mention such an issue. – Faibbus Dec 14 '16 at 14:07
  • 1
    All good and valid points, but there is one edge case where an incrementing integer is just going to cause issues, and that is when you need to keep data in sync across multiple devices and a master data store. And GUID's are unique enough. – LordWabbit May 25 '20 at 05:25
  • 1
    Longer strings are harder to compare? This may be irrelevant if you have long strings in your table already that need to be unique. – Oddthinking Jun 28 '20 at 05:17
  • "Size of table radically raises"? If you need to store these strings in your table anyway, this is irrelevant. – Oddthinking Jun 28 '20 at 05:18
  • 1
    "it's madness to use string as primary key if you table can have 1000+ rows" That's begging the question. Why is it madness? – Oddthinking Jun 28 '20 at 05:18
  • "In real strings are rarely unique." This seems to be a bad assumption. If the string *is* guaranteed to be unique from the domain, your entire answer is undermined. – Oddthinking Jun 28 '20 at 05:19
  • 1
    I don't see anyone here providing any real reasons why this is bad practice. If you have UUIDs as the primary identifier for a table row, use it as your primary key. – Gregory Ray Jan 18 '21 at 19:17
  • 1
    Have a look at a database called CouchDB, it's a document oriented database and the document IDs (primary keys) are always saved as text. – Tiago Stapenhorst Sep 16 '21 at 14:12
  • @GregoryRay that's because it's *not* a bad practice to use strings as primary keys. It's at worse a situational engineering choice, and in fact there are many situations where it is superior to integer. "Bad practice" to me means you should never do it, or only do it where there are massive engineering tradeoffs at play. Personally I use `ULID` or `UUID6` (IETF draft at the moment) strings as primary keys at scale and have noticed nary a performance issue. But on the upside, I can generate keys at absolutely any time without server coordination. – DeusXMachina Oct 17 '22 at 23:41