1

I have a table with a Unique Key on columns Pfx, Bse and Sfx. While inserting data, it seems to me like SQL Server is internally applying an RTRIM and causing an issue with my Sfx column which has a space in the second row. Is it possible to prevent this RTRIM or am I missing something?

INSERT INTO Part (Seq, Pfx, Bse, Sfx, Stat, Desc, Cr_date, Cr_User)
SELECT 1 SEQ, '2R83' AS PFX, '6477' BSE, 'AA' SFX, 1 STAT, 'SPLIT MASS FLYWHEEL' DESCR, GETDATE() CR_DT, 'USERID' CR_US 
UNION ALL
SELECT 2, '2R83', '6477', 'AA ', 1, 'SPLIT MASS FLYWHEEL', GETDATE(), 'USERID';

The error message doesn't seem to be trimming the data and retains the space.

Violation of UNIQUE KEY constraint 'NNMP0672'. The duplicate key value is (2R83, 6477, AA )

rsreji
  • 170
  • 1
  • 1
  • 17
  • 1
    You really need to be able to store 'AA' and 'AA ' as different values? This doesn't seem like a good idea to me. Even if you can convince the database to allow it. It's likely to cause very frustrating maintenance issues in the future. – user1751825 Mar 04 '18 at 14:12
  • Good question. I’m migrating data from a legacy oracle DB to sql server. There are other columns like description that making me think that this data should be loaded. Hope that makes sense. – rsreji Mar 04 '18 at 14:15
  • @rsreji, SQL Server ignores trailing spaces when comparing string values so you will get a duplicate key violation with or without the trailing spaces stored. – Dan Guzman Mar 04 '18 at 15:02
  • @rsreji If I was faced with this problem, I would be very suspicious that the original record, with the trailing space, had been entered incorrectly. You could attempt to load it exactly as originally stored, but what it probably really needs is for the data to be duplicate resolved. – user1751825 Mar 05 '18 at 05:21
  • Yeah, I'm going to try to see if I can work to tackle the data. Thanks. – rsreji Mar 05 '18 at 16:19
  • @rsreji right padding is one of the more unintuitive characteristics of SQL. ANSI SQL 92 requires that strings are padded to the same length before comparing. Even if trailing spaces are preserved, abc and abc are still considered equal. Both SQL Server **and** Oracle respect this. It seems the legacy database was using some deprecated features – Panagiotis Kanavos Mar 09 '18 at 16:46
  • It was a part of the PK in Oracle. I'm moving forward with a data cleanup, thank you. – rsreji Mar 09 '18 at 17:49

5 Answers5

3

Are the columns CHAR or VARCHAR?

This likely has to do with what the ANSI_PADDING setting was at the time that the column was created. If ANSI_PADDING is set to OFF then VARCHAR columns are automatically trimmed when inserted into the column. CHAR can be a little more tricky when it is defined to allow NULL values, but in general it always pads columns to be of the maximum length for the column. So, in short, you'll probably want VARCHAR columns with ANSI_PADDING set ON.

Keep in mind that the ANSI setting applies when the column is created, so you will have to drop and recreate the table or at least the column to accomplish this.

As others have said though, it's generally a pretty bad idea to rely on hidden or whitespace characters to differentiate between keys in a table. The fact that your import is failing here might mean something else besides the fact that there is a difference in trailing spaces - maybe this is bad data in the source system that should be corrected when you import it so that you never have the issue in the first place. Treat the problem, not the symptom ;)

Also, this may sound like a personal preference, but since we're no longer in the days when column names were limited to 8 characters, you might want to be a bit more descriptive with your column names rather than Pfx, Bse, etc. Spell words out and be descriptive. I've found that this makes development and debugging much easier. I realize that you're converting a legacy system, so maybe it's difficult (or not possible at this time) to do that, but if you can then I would highly suggest it.

Here's a link to the documentation on ANSI_PADDING if you want more information: https://learn.microsoft.com/en-us/sql/t-sql/statements/set-ansi-padding-transact-sql

Tom H
  • 46,766
  • 14
  • 87
  • 128
  • 1
    Varchar data type. Thanks for the info on padding settings, will check on this with our data modeler. I did spend time to narrow down to this issue. Regarding the column names, I shortened them before submitting the question. – rsreji Mar 04 '18 at 16:04
  • @rsreji right padding is one of the more *un*intuitive characteristics of SQL. ANSI SQL 92 *requires* that strings are padded to the same length before comparing. Even if trailing spaces are preserved, `abc` and `abc ` are still considered equal – Panagiotis Kanavos Mar 09 '18 at 16:45
1

In this link:

https://support.microsoft.com/en-gb/help/316626/inf-how-sql-server-compares-strings-with-trailing-spaces

It says that in order to compare two strings of different lengths, the shorter string is padded with spaces, therefore your 'AA' in the first row becomes 'AA ' for comparison.

Example:

create table dbo.Strings (
    ID INT IDENTITY(1,1) PRIMARY KEY CLUSTERED,
    S_VC VARCHAR(100) NULL
)

insert strings (S_VC)
values  ('Robert '),
        ('Robert')

select  ID, S_VC, datalength(S_VC) Data_Len, len(S_VC) [Len]
from    strings

select  *
from    strings s1 inner join strings s2
        on s1.S_VC = s2.S_VC
MJH
  • 1,710
  • 1
  • 9
  • 19
1

I really do not recommend what I am about to propose. But, you can accomplish what you want by using an explicit unique index and a computed column.

Note that spaces at the end of string are generally ignored. This is considered a good thing because we don't see them. WYSIWYG (what-you-see-is-what-you-get) is often a reasonable approach. Spaces at the end of strings are ignored for LEN() as well as for comparisons, for instance.

But, you can still calculate the length by appending a character and subtracting. So the following will allow you to have spaces at the end count as separate distinct values:

alter table t add s_len as (len(s + 'x') - 1);

create unique index t_s_slen on t(s, s_len);

Here is a SQL Fiddle that illustrates this in action. Of course, you need to remove the unique constraint on the column alone.

Gordon Linoff
  • 1,242,037
  • 58
  • 646
  • 786
0

What is your table definition? (ie what data types)

It might suit you better to use NVARCHAR data types

See here, as it explains why VARCHAR types use the ANSI standard and ignore whitespace on the end of these data types

Grantly
  • 2,546
  • 2
  • 21
  • 31
  • Data type = varchar. Wouldn’t nvarchar double up my disk usage? This table contains about 10 mil rows. – rsreji Mar 04 '18 at 14:13
  • Not necessarily double, but close to double for the char / varchar datatypes. If you have ints and decimals then these are not affected. Increasing the size of your database is another Question :) – Grantly Mar 04 '18 at 14:18
0

Comparison is based on rtrim but they are different

declare @tV table (name varchar(10) primary key);
insert into @tV values ('bob'), ('alice'), ('ted'), ('al '), (' al');
select *, len(name) as ln, DATALENGTH(name) as dl
from @tV;

    name       ln          dl
---------- ----------- -----------
 al        3           3
al         2           3
alice      5           5
bob        3           3
ted        3           3

You could use this to pad the space with _

set nocount on;
declare @al1 varchar(10) = 'al';
declare @al2 varchar(10) = 'al ';
select @al1, len(@al1), DATALENGTH(@al1), left((rtrim(@al1) + '____'), DATALENGTH(@al1))
     , @al2, len(@al2), DATALENGTH(@al2), left((rtrim(@al2) + '____'), DATALENGTH(@al2));
select 'equal' where  @al1 = @al2;
select 'not equal' where  @al1 <> @al2;
select 'equal' where  @al1 = @al2;
select 'equal' where  left((rtrim(@al1) + '____'), DATALENGTH(@al1)) = left((rtrim(@al2) + '____'), DATALENGTH(@al2));

---------- ----------- ----------- -------------- ---------- ----------- ----------- --------------
al         2           2           al             al         2           3           al_


-----
equal


---------


-----
equal


-----
paparazzo
  • 44,497
  • 23
  • 105
  • 176