3

I have a string filed in an SQL database, representing a url. Some url's are short, and some very long. I don't really know waht's the longest URL I might encounter, so to be on the safe side, I'll take a large value, such as 256 or 512.

When I define the maximal string length (using SQLAlchemy for example):

url_field = Column(String(256))

Does this take up space (storage) for each row, even if the actual string is shorter?

I'm assuming this has to do with the implementation details. I'm using postgreSQL, but am interested in sqlite, mysql also.

jww
  • 97,681
  • 90
  • 411
  • 885
eran
  • 14,496
  • 34
  • 98
  • 144
  • the max length appears to be 2048 characters http://stackoverflow.com/questions/417142/what-is-the-maximum-length-of-a-url-in-different-browsers I believe that the record in the DB will take the same space regardless of the DBMS implementations, probably even if the string is null – Totò Jun 02 '14 at 06:29
  • This may be of help: http://stackoverflow.com/questions/20326892/any-downsides-of-using-data-type-text-for-storing-strings – Erwin Brandstetter Jun 02 '14 at 07:37

3 Answers3

11

In PostgreSQL character(n) is basically just varchar with space padding on input/output. It's clumsy and should be avoided. It consumes the same storage as a varchar or text field that's been padded out to the maximum length (see below). char(n) is a historical wart, and should be avoided - at least in PostgreSQL it offers no advantages and has some weird quirks with things like left(...).

varchar(n), varchar and text all consume the same storage - the length of the string you supplied with no padding. It only uses the storage actually required for the characters, irrespective of the length limit. Also, if the string is null, PostgreSQL doesn't store a value for it at all (not even a length header), it just sets the null bit in the record's null bitmap.

Qualified varchar(n) is basically the same as unqualified varchar with a check constraint on length(colname) < n.

Despite what some other comments/answers are saying, char(n), varchar, varchar(n) and text are all TOASTable types. They can all be stored out of line and/or compressed. To control storage use ALTER TABLE ... ALTER COLUMN ... SET STORAGE.

If you don't know the max length you'll need, just use text or unqualified varchar. There's no space penalty.

For more detail see the documentation on character data types, and for some of the innards on how they're stored, see database physical storage in particular TOAST.

Demo:

CREATE TABLE somechars(c10 char(10), vc10 varchar(10), vc varchar, t text);
insert into somechars(c10) values ('  abcdef ');
insert into somechars(vc10) values ('  abcdef ');
insert into somechars(vc) values ('  abcdef ');
insert into somechars(t) values ('  abcdef ');

Output of this query for each col:

SELECT 'c10', pg_column_size(c10), octet_length(c10), length(c10) 
from somechars where c10 is not null;

is:

 ?column? | pg_column_size | octet_length | length 
 c10      |             11 |           10 |      8
 vc10     |             10 |            9 |      9
 vc       |             10 |            9 |      9
 t        |             10 |            9 |      9

pg_column_size is the on-disk size of the datum in the field. octet_length is the uncompressed size without headers. length is the "logical" string length.

So as you can see, the char field is padded. It wastes space and it also gives what should be a very surprising result for length given that the input was 9 chars, not 8. That's because Pg can't tell the difference between leading spaces you put in yourself, and leading spaces it added as padding.

So, don't use char(n).

BTW, if I'm designing a database I never use varchar(n) or char(n). I just use the text type and add appropriate check constraints if there are application requirements for the values. I think that varchar(n) is a bit of a wart in the standard, though I guess it's useful for DBs that have on-disk layouts where the size limit might affect storage.

Craig Ringer
  • 307,061
  • 76
  • 688
  • 778
  • Further on the topic of the "logical length" of a string, do you know if PostgreSQL respects only unicode code points in the basic multilingual plane (0-​FFFF), or also surrogate pairs that make up obscure scripts, symbols, and notably, emoji. Furthermore, what about appearance-modified emoji like skin-color on faces, country flags? I'm working on client software and trying to restrict input length to match maximums in our back end, and haven't yet been able to find where this is documented. – Pierre Houston Feb 05 '16 at 03:12
  • @smallduck PostgreSQL doesn't normalize unicode and will count surrogate pairs, decomposed glyphs, etc, as more than one character. http://www.postgresql.org/message-id/53E179E1.3060404@2ndquadrant.com . Run `select length('á') AS decomposed, length('á') AS precomposed;` (copied and pasted exactly) to see. If your browser, terminal, etc don't normalize the unicode either you'll get lengths 2 and 1 respectively. – Craig Ringer Feb 06 '16 at 12:55
  • 1
    As of [v9.2](https://www.postgresql.org/docs/9.2/release-9-2.html#AEN114949:~:text=Increasing%20the%20length%20limit%20for%20a,no%20longer%20requires%20a%20table%20rewrite) I prefer `varchar(n)` over `text` because it allows me to increase the length without rechecking. Changing a check constraint forces a recheck, regardless of whether you're increasing the char length. That check uses an `AccessExclusiveLock`, preventing reads, making it non-ideal for production dbs. – DharmaTurtle Aug 16 '20 at 15:29
1

Both postgreSQL, sqllite and mysql applies the sql standard for storing varchar and chars. Which is basiclly this:

SQL defines two primary character types: character varying(n) and character(n), where n is a positive integer. Both of these types can store strings up to n characters in length. An attempt to store a longer string into a column of these types will result in an error, unless the excess characters are all spaces, in which case the string will be truncated to the maximum length. (This somewhat bizarre exception is required by the SQL standard.) If the string to be stored is shorter than the declared length, values of type character will be space-padded; values of type character varying will simply store the shorter string.

If one explicitly casts a value to character varying(n) or character(n), then an over-length value will be truncated to n characters without raising an error. (This too is required by the SQL standard.)

The notations varchar(n) and char(n) are aliases for character varying(n) and character(n), respectively. character without length specifier is equivalent to character(1). If character varying is used without length specifier, the type accepts strings of any size. The latter is a PostgreSQL extension.

Reference:

Craig Ringer
  • 307,061
  • 76
  • 688
  • 778
Arion
  • 31,011
  • 10
  • 70
  • 88
0

Usually database storage engines can do many thing you don't expect. But basically, there are two kinds of text fields, that give a hint what will go on internally.

char and varchar. Char will give you a fixed field column and depending on the options in the sql session, you may receive space filled strings or not. Varchar is for text fields up to a certain maximum length.

Varchar fields can be stored as a pointer outside the block, so that the block keeps a predictable size on queries - but that is an implementation detail and may vary from db to db.

thst
  • 4,592
  • 1
  • 26
  • 40
  • Why would I ever want to specify the length, and not just use the varchar then? What would be the performance advantage of using char(n) and not varchar(n) – eran Jun 02 '14 at 06:40
  • Varchar still requires the DB to manage space correctly to evenly sized memory blocks or pages. The db may decide that the dataset is placed at the end of the block and use an "in block" pointer. So the columns take up predictable space in the block and the variable data is placed behind, on large varchars, the db may require a second page to be read for the data of the varchar. The performance advantage of char is usually, that you don't need to fetch the varchar data from the page, it is just there. But as I said: This may vary with implementation and is subject to the optimizer of the db. – thst Jun 02 '14 at 06:49
  • 2
    `char(n)` is a horrible relic from the days of fixed-width formats (think `FORTRAN 77`) and should IMO simply be avoided. In PostgreSQL it has no advantages over `varchar` and plenty of disadvantages. Both `varchar` and `char` are TOASTable and subject to out-of-line storage, so your distinction there is incorrect. Anyway, if you want to make any field stored inline, you can do it with `ALTER TABLE ... ALTER COLUMN ... SET STORAGE`, irrespective of data type. See the documentation for character types and for TOAST. – Craig Ringer Jun 02 '14 at 07:36