4

I am trying to replicate the PostgreSQL sorting results for testing purposes.

Suppose you have a table temp with a single column name containing the following lines:

dolores et
dolor nemo
abc
zxc

Now run the following query:

SELECT * from temp ORDER BY name ASC

The sorted result is:

abc
dolores et
dolor nemo
zxc

However, the default implementation for sorting in programming languages will return it differently:

abc
dolor nemo
dolores et
zxc

In JavaScript:

["dolores et", "dolor nemo", 'abc' , 'zxc'].sort()
// returns ["abc", "dolor nemo", "dolores et", "zxc"]

I was wondering if there is an easy way to replicate the sorting result from the database in JavaScript (I am using NodeJS), so I can verify it in my tests. I believe I am missing something very simple here. Any help is appreciated.

I could try to use a function like this one to force the order in PostgreSQL, but I do not want that. My goal is to replicate the default behavior in JavaScript.

Laurenz Albe
  • 209,280
  • 17
  • 206
  • 263
saulotoledo
  • 1,737
  • 3
  • 19
  • 36

1 Answers1

3

Sorting strings is always done using a certain collation. If you are not conscious of using a collation in your programming language, you are probably using the POSIX collation, which compares strings character by character according to their code point (the numeric value in the encoding).

In PostgreSQL, that would look like this:

ORDER BY name COLLATE "POSIX";

So to solve your problem, you'd have to find out the collation of the column.

If there is no special collation specified in the column definition, it will use the database's collation, which can be found with

SELECT datcollate FROM pg_database WHERE datname = 'my_database';

That will be an operating system collation from the C library.

So all you have to do is to use that collation in your program.

If your program is written in C, you can directly use the C library. Otherwise, refer to the documentation of your programming language.

Laurenz Albe
  • 209,280
  • 17
  • 206
  • 263
  • 1
    Thanks for the explanation. A minor side question: I was using the same collation, but I managed to make it work by ignoring the punctuation in my comparison (e.g. `["dolores et", "dolor nemo", 'abc' , 'zxc'].sort((a,b) => a.localeCompare(b, 'en-us', {ignorePunctuation: true}))`). Is it possible to make PostgreSQL do not ignore the punctuation? – saulotoledo Jun 23 '20 at 11:18
  • 1
    From PostgreSQL v10 on, PostgreSQL can be built ti use the ICU library, a library from the UNICODE consortium that provides operating system independent collations. Then you can use `CREATE COLLATION` to create your own PostgreSQL collation, and I am sure you can build one that does that. But did you try using the `POSIX` collation in PostgreSQL? Perhaps that is exactly what you want (any you don't have to mess with ICU collations). – Laurenz Albe Jun 23 '20 at 12:27
  • I will check that. For now I am creating some tests for an existing database managed by an ORM, I just had to be sure I had the same order in the database after a sort and in the application (because I have some layers on top of the database). But I will definitely try the POSIX. Thanks for all your help! :D – saulotoledo Jun 23 '20 at 12:50