0

SQL language, in particular PostgreSQL 9+, have many ways to do the same thing... But in many circumstances (see Notes sec. for a rationale) we need to "cut diversity" and opt to a standard way.

There are a tendency to adopt text data type instead varchar. Will be "the standard way to express strings" in PostgreSQL (!), avoiding lost time in project discussions and casting similar formats... But, how to use text preserving the size limit constraint?

I use CHECK(char_length(field)<N) and have no problem to change the limit in live environment, so it is perhaps the best way... Is it?


Some variations: in general what is the best choice?

  1. in CREATE TABLE:

    1.1. CHECK after the data type, just like default value definitions. This is the best practice?

    1.2. CHECK after all column definitions. Usual to multi-column declaration like CHECK(char_length(col1)<N1 AND char_length(col2)<N2).

    1.2.1. Some people like also to express all individual CHECKs after, to not "pollute" column declarations.

  2. Use in trigger: there are some advantage?

  3. Other ways... Other relevant one?

1.1, 1.2, 2 or 3, what is the best practice ?


CONTEXT NOTES

In projects and teams with some KISS or Convention over configuration demands, we need "good practices" recommendations... I was looking for it, in the context of CREATE TABLE ... text/varchar and project maintenance. No unbiased "good practices" recommendation in the horizon: Stackoverflow votings are the only reasonable record of this kind of recommendation.

Convention scope

(edit) For individual use, of course, as @ConsiderMe commented, "no matter what you choose, as long as you stick with it throughout the entire time there will be no problem with it".

This question, by other hand, is about "SQL community" or "PostgreSQL community" best practices conventions.

Community
  • 1
  • 1
Peter Krauss
  • 13,174
  • 24
  • 167
  • 304

2 Answers2

2

I like to keep the code as short as it's possible so it'd go with length(string) in the CHECK constraint. I do not see a particular use for char_length in this case - it takes up more "code space".

Internally, they are both textlen anyways.

You should be careful about signs that take more than 1 byte. In this case I would use octet_length. As an example consider character ą which returns 1 when asked for length, and 2 when asked for octet_length. It's been a pain doing migrations between database systems with different length enforcement.


I believe that a good source for "best practices" would be to follow documentation.

It says that using CHECK constraint inline with a column implies a column constraint which is bound to a particular column.

It mentions table constraint which is written separately from any column definition and enforces data corectness between several columns.

Basically in projects I'm involved I follow this rule for readability and maintenance purposes.

I wouldn't even consider creating trigger for such things. To me they are designed for much more complex tasks. I don't see a reason to enforce simple data correctness rules in triggers.

I can't think of any other solution which would be as basic as the standard ones and still doing it's simple job as those mentioned above.

Kamil Gosciminski
  • 16,547
  • 8
  • 49
  • 72
  • As I say "have many ways to do the same thing"... The SQL-scale problem repeats as function-scale problem: `length(x)` = `char_length(x)` = `length(x,'utf8')` (you can test for x=`josé ą`). What is the "canonical" name? The [Guide](http://www.postgresql.org/docs/current/static/functions-string.html) not say what is an alias and what is the original function. About use `octet_length`: the best practice is to avoid use of this function (!), because today (2016) the universal encode is UTF8, and the default semantic is "count the UTF8 characters". This is also a rationale for use `char_length`. – Peter Krauss Apr 22 '16 at 21:12
  • The PG Guide as "source for best practices", I agree for many contexts, but in the context of this quetion, it is being silent. It is also not unbiased: 1) some [Conflict of interest](https://en.wikipedia.org/wiki/Conflict_of_interest), because to "show diversity" is something like a "branding strategy"; 2) it is "not so open" to the community, like ex. Wikipedia or a "please comment here" page. There are nothig as a "[stakeholder](https://en.wikipedia.org/wiki/Stakeholder_(corporate)) body" in pg community... so I say that SO is the "reasonable record of this kind of recommendation" – Peter Krauss Apr 22 '16 at 21:27
  • Ops, sorry the long "meta discussion", the main comment is here: "1.1, 1.2, 2 or 3, what is the best practice?", I'm understanding that your choice is **1.1**, is it? 1.1 with some considerations. – Peter Krauss Apr 22 '16 at 21:33
  • @PeterKrauss yes, it is 1.1. But no matter what you choose, as long as you stick with it throughout the entire time there will be no problem with it – Kamil Gosciminski Apr 23 '16 at 08:47
1

The Depesz article on which this reasoning was based is outdated. The only argument against varchar(N) was that changing N required a table rewrite, and as of Postgres 9.2, this is no longer the case.

This gives varchar(N) a clear advantage, as increasing N is basically instantaneous, while changing the CHECK constraint on a text field will involve re-checking the entire table.

Nick Barnes
  • 19,816
  • 3
  • 51
  • 63