I have come across a problem that is beyond my Googling abilities, and I would appreciate any help at all.
I've created a very simple SSIS package to copy data from a source table in an Azure SQL database to a target in an MS SQL Database using an OLEDB connection. The target table was created from scripts generated from the source database (in SSMS, I right-clicked on database name -> Tasks -> Generate Scripts), so they should be identical. The SSIS package does not do any transformations--it's a simple truncation of the Target table, then a dataflow task with an OleDBSource connected to an OleDBDestination.
However, I'm getting the following error on package execution:
Hresult: 0x80004005 Description: "Violation of PRIMARY KEY constraint 'PK_HashKey'. Cannot insert
duplicate key in object 'TargetTable'. The duplicate key value is ( ).".
The primary key column in question has a datatype of char(32) and is generated using the HASHBYTES function using the 'SHA2_256' algorithm. The error message gives an example of a bad primary key as basically a series of whitespaces. When I look in the Source table, it does look like there are multiple rows where the primary key is just a series of whitespaces. But I guess Azure SQL can distinguish between them somehow, because there are no primary key issues on the Source table. It's only when I'm trying to copy the data to an on-prem MS SQL Server database that I'm getting a primary key violation.
Things I've tried:
Checked that the collation settings are the same on both the Source and Target table columns, database, and server.
Checked the advanced editor for the SSIS dataflow Source and Destination to make sure the codepage is the same.
Removed the primary key constraint on the Target table, then moved the data using SSIS, and then ran a query to compare the hashes transferred by SSIS vs. completely generating new hashes. For about 8% of the table, SQL Server did not think the transferred hash matched the new hash, despite the hashes generally looking alike. Here is the query:
SELECT CONVERT(CHAR(32), HASHBYTES('SHA2_256', BusinessKey)), Hashkey, BusinessKey
FROM TargetTable
WHERE CONVERT(CHAR(32), HASHBYTES('SHA2_256', BusinessKey)) <> HashKey
Based on #3, my guess is that one of the following is happening:
SSIS is transforming the hash somehow when it copies data from Source to Target.
The Target column/table/database has some setting that makes it store char(32) data differently than the Source column/table/database.
Some sort of bug.
Anyone have any experiences that could help shed light on this issue?