0

I am trying to track down the difference between the way select is working on two different installations of Postgresql. One is 9.3 on Windows with SQL_ASCII encoding. The other is 9.3 on Linux with SQL_ASCII encoding.

The issue revolves around how select is handling character fields with leading spaces. The following commands are done in both regions with 2 different sets of results:

Windows

create table jb (jb1 character(3));
insert into jb values('010');
insert into jb values('  1');
insert into jb values('999');
select * from jb where jb1 between '  1' and '999';

Results: 010,1,999

Linux

create table jb (jb1 character(3));
insert into jb values('010');
insert into jb values('  1');
insert into jb values('999');
select * from jb where jb1 between '  1' and '999';

Results: 1,99

My best explanation is that the Linux installation is inherently removing all leading spaces within the SQL query... I do not, however, understand why this would be and how to overcome it. There are millions of legacy rows involved with foreign keys against dozens of other tables.

Input is appreciated.

J Bove
  • 1
  • 1
  • As a note, Windows would appear to be correct in this case. – Gordon Linoff Mar 27 '17 at 14:36
  • It is probably a string collation problem, see [this answer](http://stackoverflow.com/a/37837908/6464308). – Laurenz Albe Mar 27 '17 at 14:38
  • 1
    Character collation defaults to the host environment. https://www.postgresql.org/docs/current/static/multibyte.html If you are using between can you not cast these values as 'int' or do they have to be strings ? – VynlJunkie Mar 27 '17 at 14:38
  • See also [Different behaviour in “order by” clause: Oracle vs. PostgreSQL](http://stackoverflow.com/questions/31528037/different-behaviour-in-order-by-clause-oracle-vs-postgresql/31528216#31528216). – klin Mar 27 '17 at 14:39
  • Mokadilion, they do have to be strings. They are other cases where the same columns could contain ABC, DEF, etc. I'll take a look at the character collation though... thanks – J Bove Mar 28 '17 at 14:50

1 Answers1

0

Thanks to Laurenz & Mokadillion.

The issue here is specifically the lc_collate setting. On the windows box, lc_collate is set to English_United States.1252 and on the Linux box: en_us.UTF8

So, running the query on linux and explicitly setting the collation to "C" will resolve the problem.

select * from jb where jb1 collate "C" > '  1';

This however, is not a "good" solution as the number of queries to correct is quite large. Instead, there are only two solutions.

The first solution is to change the collation on column jb1 in table jb.

alter table jb alter jb1 type character (3) collate "C";

As the actual column jb1 exists in dozens of tables and other columns have similar issues, this will not likely be the best long-term solution. Instead, the ultimately solution is to dump the database; drop the database; recreate the database under the appropriate lc_collate; import the database.

Given that the actual database is > 50 Gig, this is unfortunate. However, it is the only reliable answer I have found.

J Bove
  • 1
  • 1