1

I have a varchar field that has a key that is made up of a string that sometimes has some pattern to it, which starts with a name & timestamp.

I'd like to support deleting old records that have a given prefix, and I want run a query like DELETE FROM table WHERE key < %s, but I want to know exactly what I'm getting myself into. I can't find any official docs about how this should work. Do they exist? I found Practical Postgres book that mentions it's based on alphabetical order, but I'm not sure if that is still true in any official docs. Is there anything up-to-date I can refer to other than my poking around?

Charles L.
  • 5,795
  • 10
  • 40
  • 60
  • 2
    This site is not for asking references to external resources. Instead, rephrase your question to give some examples of the data that you have, want you want to delete and what you want to keep. – Patrick Oct 02 '15 at 01:13
  • 1
    I think it's a good question and the manual does indeed lack information on this. But I agree that it is off-topic. You should post this on the Postgres mailing list: http://www.postgresql.org/list/ In a nutshell: the comparison is depending on the collation used : http://www.postgresql.org/docs/9.4/static/collation.html –  Oct 02 '15 at 06:34

1 Answers1

2

Strings are sorted based on a lexical comparison as performed by the libc strcmp function, using the collation specified by, in descending order of preference:

  • An explicit COLLATE on the operator or ORDER BY
  • Any COLLATE specified on the column
  • The database's default LC_COLLATE

(The above is from memory, check the manual to be 100% sure).

Thus ordering varies based on your locale. In some locales, for example, punctation and/or whitespace is ignored, and sorting by case varies too.

You can use COLLATE "C" to override the default sort order for the DB and sort by the simple byte ordering of the strings.

All string sorts - with locale collations or not - are "dumb" in that they don't recognise numbers, dates, etc. The sort order is not what a human would usually choose. For example:

SELECT i
FROM generate_series(0,22) i
ORDER BY i::text COLLATE "C";

will emit

1
11
2
...

Similarly:

SELECT d
FROM (
  VALUES
  ('2001'),
  ('2000 A.D.'),
  ('600 B.C.')
) x(d)
ORDER BY d;

produces

 2000 A.D.
 2001
 600 B.C.

so if you want a natural, or humanized, sort, you'll need to get more sophisticated - split the fields up and sort each part. See:

Community
  • 1
  • 1
Craig Ringer
  • 307,061
  • 76
  • 688
  • 778