I'm currently playing with different ways of getting data into SQL and yesterday hit a problem using BCP which although I solved reminded me of working with SSIS packages because of the not very useful error information. I feel that for the way I like to work I would be much happier loading entire datarows whether fixed-width or delimited into a staging table (using BCP or Bulk Insert) and then operating on the data rows rather than trying to force them into typed columns on the way in to SQL.
As such I would like to find an approach that would allow me to split and validate (check datatype) the data before I insert it into its destination and also write out any bad datarows to another table so I can then decide what to do with them.
I've knocked together a script to simulate the scenario, the importedData table would be the output of my BCP or BULK INSERT. All the data from ImportedData needs to end up either in the Presenters or the RejectedData tables.
I need an approach that could scale reasonably well, a real life situation might me more like 40 columns across with 20 million rows of data so I'm thinking I'll have to do something like process 10,000 rows at a time.
SQL Server 2012 has the new try_parse function which would probably help but I need to be able to do this on 2005 and 2008 machines.
IF OBJECT_ID (N'ImportedData', N'U') IS NOT NULL DROP TABLE dbo.ImportedData
CREATE TABLE dbo.ImportedData (RowID INT IDENTITY(1,1), DataRow VARCHAR(MAX))
IF OBJECT_ID (N'Presenters', N'U') IS NOT NULL DROP TABLE dbo.Presenters
CREATE TABLE dbo.Presenters (PresenterID INT, FirstName VARCHAR(10), LastName VARCHAR(10))
IF OBJECT_ID (N'RejectedData', N'U') IS NOT NULL DROP TABLE dbo.RejectedData
CREATE TABLE dbo.RejectedData (DataRow VARCHAR(MAX))
-- insert as fixed-width
INSERT INTO dbo.ImportedData(DataRow)
SELECT '1 Bruce Forsythe '
UNION ALL SELECT '2 David Dickinson '
UNION ALL SELECT 'X BAD DATA'
UNION ALL SELECT '3 Keith Chegwin '
-- insert as CSV
/*INSERT INTO dbo.ImportedData(DataRow)
SELECT '1,Bruce,Forsythe'
UNION ALL SELECT '2,David,Dickinson'
UNION ALL SELECT 'X,BAD,DATA'
UNION ALL SELECT '3,Keith,Chegwin'
*/
---------- DATA PROCESSING -------------------------------
SELECT
SUBSTRING(DataRow,1,3) AS ID,
SUBSTRING(DataRow,4,10) AS FirstName,
SUBSTRING(DataRow,14,10) AS LastName
FROM
ImportedData
---------- DATA PROCESSING -------------------------------
SELECT * FROM ImportedData
SELECT * FROM Presenters
SELECT * FROM RejectedData