I have gone through the official PostgreSQL documentation to learn about LC_COLLATE and LC_CTYPE, but I still don't understand them correctly.

Can anyone help me understand these concepts and their impact, especially when loading data from an Oracle database with encoding WE8ISO8859P15 into a PostgreSQL database whose encoding is UTF-8 and whose collation/ctype is en_US.UTF-8?

Thanks in advance

This is part of the “locale”, the national language support, which is different from the encoding (but the locale has to be compatible with the encoding).

LC_CTYPE determines which characters are letters, numbers, space characters, punctuation etc. Different languages have different ideas about that.

LC_COLLATE determines how strings are compared and sorted.

The first has little impact on the behavior of PostgreSQL, but the second is very relevant: it determines how b-tree indexes on string columns are ordered (which is why it cannot be changed after a database has been created) and how ORDER BY sorts strings by default (which is directly user-visible).
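To make the LC_COLLATE point concrete, here is a small sketch in Python. Sorting by code point corresponds to PostgreSQL's "C" collation, while `locale.strxfrm` under a linguistic locale approximates what en_US.UTF-8 does; the locale name `en_US.UTF-8` is an assumption and may not be installed on every system.

```python
# Sketch: how the collation changes sort order.
import locale

words = ["Zebra", "apple", "Banana"]

# "C"-style collation compares code points, so all uppercase letters
# sort before all lowercase ones:
print(sorted(words))  # ['Banana', 'Zebra', 'apple']

# A linguistic collation such as en_US.UTF-8 sorts case-insensitively
# at the primary level, giving apple < Banana < Zebra:
try:
    locale.setlocale(locale.LC_COLLATE, "en_US.UTF-8")
    print(sorted(words, key=locale.strxfrm))
except locale.Error:
    pass  # assumed locale not installed on this system
```

This is why two databases with the same data but different LC_COLLATE settings can return rows in a different order from the same ORDER BY clause.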

Laurenz Albe
  • Does that mean that, since I'm using latin1 encoding in Oracle while Postgres uses UTF-8 encoding with lc_ctype set to en_US.UTF8, I ended up with some variation in the data in Postgres? @LaurenzAlbe – vigneshwar reddy Jul 21 '21 at 13:29
  • Unicode took Latin1 as the basis for its first 256 characters, so you should have no problems (OTOH some programs may use reserved and control characters of latin1 for different purposes). Note: I assume you are trans-coding things. Latin1 is not binary compatible with UTF-8. But as the answer says: `ORDER` is often important, so take care to find out whether this would badly affect your program. – Giacomo Catenazzi Jul 21 '21 at 13:35
  • @vigneshwarreddy No, this has nothing to do with how characters are encoded. It is quite unrelated. – Laurenz Albe Jul 21 '21 at 13:41
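The point made in the comments about transcoding can be sketched in Python. The first 256 Unicode code points mirror Latin-1, but the byte encodings diverge above ASCII, so the data must be decoded and re-encoded, not copied byte-for-byte. (Latin-1 is used here to match the comment; Oracle's WE8ISO8859P15 is actually ISO-8859-15, which differs from Latin-1 in a few positions such as the € sign.)

```python
# Sketch: why Latin-1 data must be transcoded before loading into a
# UTF-8 database.
ch = "é"  # U+00E9, which is also byte 0xE9 in Latin-1

latin1 = ch.encode("latin-1")  # b'\xe9'      — one byte
utf8 = ch.encode("utf-8")      # b'\xc3\xa9'  — two bytes

# The code point equals the Latin-1 byte value...
assert ord(ch) == latin1[0]
# ...but the two encodings are not binary compatible:
assert latin1 != utf8

# Transcoding (decode with the source encoding, re-encode as UTF-8)
# preserves the character:
assert latin1.decode("latin-1").encode("utf-8") == utf8
```

Tools like `ora2pg`, or PostgreSQL's `client_encoding` setting, normally handle this transcoding; problems only arise when bytes are copied without declaring the source encoding.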