7

It seems like a very arbitrary decision. Both can accomplish the same thing in most cases. By limiting the varchar length seems to me like you're shooting yourself in the foot cause you never know how long of a field you will need.

Is there any specific guideline for choosing VARCHAR or TEXT for your string fields?

I will be using postgresql with the sqlalchemy orm framework for python.

TheOne
  • 10,819
  • 20
  • 81
  • 119

6 Answers6

9

In PostgreSQL there is no technical difference between varchar and text

You can see a varchar(nnn) as a text column with a check constraint that prohibits storing larger values.

So each time you want to have a length constraint, use varchar(nnn).

If you don't want to restrict the length of the data use text

  • 1
    You can also use VARCHAR when you have no restriction. VARCHAR and VARCHAR(n) are two different things. – Frank Heikens Jan 16 '11 at 14:59
  • Ah, right. I always forget that varchar can be used without a length restriction as well. I usually use TEXT then. –  Jan 16 '11 at 15:02
2

This sentence is wrong:

By limiting the varchar length seems to me like you're shooting yourself in the foot cause you never know how long of a field you will need.

If you are saving, for example, MD5 hashes you do know how large the field is your storing and your storage becomes more efficient. Other examples are:

  • Usernames (64 max)
  • Passwords (128 max)
  • Zip codes
  • Addresses
  • Tags
  • Many more!
orlp
  • 112,504
  • 36
  • 218
  • 315
  • 4
    MD5 hashes would even be more efficiently stored in `CHAR(32)` columns. – BoltClock Jan 16 '11 at 13:08
  • True, but I'm giving an example where limiting lengths is not shooting yourself in the foot. But once again, true. – orlp Jan 16 '11 at 13:09
  • who said passwords should be 64 max? – TheOne Jan 16 '11 at 13:13
  • Or even 128, much longer has no valuable effect these days. – orlp Jan 16 '11 at 13:13
  • @Absolute0: That question would only apply if you're storing passwords in plain text, because most if not all hashing algos produce hashes of the exact same length. – BoltClock Jan 16 '11 at 13:18
  • Its not about the specific number, its about not knowing the future. You might have some super crazy users that would like passwords with 1000 chars long you never know... – TheOne Jan 16 '11 at 13:18
  • @Absolute0: varchar(nnn) can be seen as check constraint to validate data. Similar to `salaray numeric(12,4) check (salary > 0)` to restrict it to positive values. –  Jan 16 '11 at 13:21
  • 2
    @Absolute0: Passwords with 1000 chars will still be hashed to 40-char SHA1 checksums. – BoltClock Jan 16 '11 at 13:25
  • @BoltClock that's what he said: `because most if not all hashing algos produce hashes of the exact same length` – orlp Jan 16 '11 at 13:33
  • I know I'm late to the party, but if you're using SHA-anything for password hashes, you're [doing it wrong](http://codahale.com/how-to-safely-store-a-password/). – Stephen Touset Apr 16 '13 at 23:32
  • Zip codes could be stored more efficiently as `integer`s. – Solomon Ucko Jul 22 '18 at 21:02
  • 1
    @SolomonUcko Zip codes in my country are most certainly not integer. – orlp Jul 22 '18 at 21:03
  • Good point. You could use base 36 (+ a couple if needed) to store them (as you only need to store letters, numbers and maybe a couple symbols. If needed you could use `bigint`, but then it might be about the same storage usage either way. Base 36 would allow for 6 characters (changing signed to unsigned & with a bit of a margin). – Solomon Ucko Jul 22 '18 at 21:07
1

In brief:

  • Variable length fields save space, but because each field can have different length, it makes table operations slower
  • Fixed length fields make table operations fast, although must be large enough for the maximum expected input, so can use more space

Think of an analogy to arrays and linked lists, where arrays are fixed length fields, and linked lists are like varchars. Which is better, arrays or linked lists? Lucky we have both, because they are both useful in different situations, so too here.

davin
  • 44,863
  • 9
  • 78
  • 78
  • Most of the times we use some variation of a vector :) but I get your point. – TheOne Jan 16 '11 at 13:16
  • @Absolute0, and how do you think vector is typically implemented internally? Arrays. It's the exact same principle, if you want random access to any element, you need to know the size of every element (fixed size), otherwise, you can save space, although access requires you to move element-element, like in a linked list. – davin Jan 16 '11 at 13:28
  • 1
    This advice is not true in PostgreSQL, char/varchar/text types have exactly the same representation on disk. There's no efficiency to be gained by using char. – intgr Jan 16 '11 at 15:34
  • @intgr, good point, i wonder why that's the case. will have to dig up some postgresql code one day... – davin Jan 16 '11 at 15:53
0

In the most cases you do know what the max length of a string in a field is. In case of a first of lastname you don't need more then 255 characters for example. So by design you choose wich type to use, if you always use text you're wasting resources

BvdVen
  • 2,921
  • 23
  • 33
  • That's a blanket statement that isn't quite accurate. Each choice wastes resources, because variable length fields make searching slower, which results in wasted CPU resources. – davin Jan 16 '11 at 13:10
0

Check this article on PostgresOnline, it also links to two other usefull articles.

Most problems with TEXT in PostgreSQL occur when you're using tools, applications and drivers that treat TEXT very different from VARCHAR because other databases behave very different with these two datatypes.

Frank Heikens
  • 117,544
  • 24
  • 142
  • 135
-1

Database designers almost always know how many characters a column needs to hold. US delivery addresses need to hold up to 64 characters. (The US Postal Service publishes addressing guidelines that say so.) US ZIP codes are 5 characters long.

A database designer will look at representative sample data from her clients when she's specifying columns. She'll ask herself, questions like "What's the longest product name?" And when the answer is "70 characters", she won't make the column 3000 characters wide.

VARCHAR has a limit of 8k in SQL Server (I think). Most applications don't require nearly that much storage for a single column.

Mike Sherrill 'Cat Recall'
  • 91,602
  • 17
  • 122
  • 185
  • You shouldn't be limiting US zip codes to five characters, they expanded them to ZIP+4 (ten characters) about thirty years ago. – mu is too short Jan 16 '11 at 20:26
  • At the risk of stating the obvious, ZIP codes are still five characters. ZIP+4 codes are five characters plus four more characters. Two columns simplifies the constraints, and it simplifies grouping addresses by ZIP code. Grouping by ZIP code is required for price breaks on bulk mail. – Mike Sherrill 'Cat Recall' Jan 16 '11 at 23:10
  • Only if you expect Canadian addresses to conform to USPS specifications. Professionals know better. ([Canada](http://www.canadapost.ca/tools/pg/standards/PSMach_SL-e.pdf)? Preferably less than 30 characters, but no more than 40 characters per address line for machineable mail.) – Mike Sherrill 'Cat Recall' Jul 22 '13 at 00:08