I have a text file with 6 columns and 200 million rows, and none of the columns is unique. I'd like to import it into a SQL Server table and define an identity column as the primary key.

Therefore I created the following table first:

CREATE TABLE dbo.Inventory 
(
    ProductID NUMERIC(18,3) NOT NULL,
    RegionID NUMERIC(18,3) NULL,
    ShopCode INT NULL,
    QTY FLOAT NULL,
    OLAPDate VARCHAR(6) NULL,
    R FLOAT NULL,

    ID BIGINT NOT NULL PRIMARY KEY IDENTITY(1,1)
)

Then I use the below command for importing the text file into the table:

bcp ETLDB.dbo.Inventory in D:\SCM\R.txt -T -b 10000 -t "," -c -e D:\SCM\Errors.txt

and I got these errors:

(screenshot of the bcp error messages)

I am not sure whether the errors are caused by the identity ID column, which exists in my table design but not in my original text file. When I drop the identity column from the table, bcp works fine. However, I want bcp to generate the identity values during the import of the file into the table.

The sample text file:

(screenshot of the sample text file)

Any help would be appreciated.

marc_s
mpy

3 Answers

There is a workaround I tried for a similar case.

Step 1: Create a Table with the columns available to your CSV/TXT file.

Step 2: Push the data using the BCP script.

bcp dbo.<tablename> in <file location in local folder> -S <server_name> -d <database_name> -U <username> -P <password> -b 20000 -q -c -t"<column delimiter>"

Step 3: Once the data is available on your destination table you can now alter the table with the below SQL command:

ALTER TABLE <Table Name>
ADD <Identity Column> BIGINT IDENTITY(1,1)

Here are a few SQL statements to help you understand an update/insert (upsert) script for an incremental load.

CREATE TABLE Employees
(
    ID INT IDENTITY(1,1),
    Name VARCHAR(100),
    Salary INT,
    InsertDate DATETIME,
    UpdateDate DATETIME,
    IsNewRecord BIT    -- referenced by the procedure below
)

INSERT INTO Employees (Name, Salary, InsertDate, UpdateDate)
VALUES
('Kristeen',1420,NULL,NULL)
,('Ashley',2006,NULL,NULL)
,('Julia',2210,NULL,NULL)
,('Maria',3000,NULL,NULL)

CREATE PROCEDURE dbo.InsertOrUpdateEmployee
    @Name VARCHAR(100),
    @Salary INT
AS BEGIN

    CREATE TABLE #tmpData
    (
        Name VARCHAR(100),
        Salary INT
    )   

    INSERT INTO #tmpData(Name,Salary)

    VALUES(
        @Name,
        @Salary 
    )


    UPDATE A
    SET A.Name = B.Name,
        A.Salary = B.Salary,
        A.updatedate = GETDATE(),
        A.IsNewRecord = 0
    FROM Employees A
    JOIN #tmpData B
    ON A.Name = B.Name
    AND A.Salary = B.Salary

    INSERT INTO Employees
    (
        Name,
        Salary,
        InsertDate,
        IsNewRecord
    )
    SELECT 
        S.Name,
        S.Salary,
        GETDATE(),
        1
    FROM #tmpData S
    LEFT JOIN Employees D
    ON S.Name = D.Name
    AND S.Salary = D.Salary
    WHERE D.Name IS NULL
    AND D.Salary IS NULL

    DROP TABLE #tmpData

END

EXEC InsertOrUpdateEmployee 'Gaurav',4500000

You will need to adapt the code above slightly: it inserts data through stored-procedure parameters, but in your case you would use the source table in place of the temporary table, and at the end you can truncate the source table after moving all the data into the destination table.

shishir
  • @vignesh kumar A also mentioned that, but I get this file on a monthly basis, so that is a one-time solution. – mpy Nov 07 '19 at 06:24
  • You need to apply incremental-load logic here, in which you have to work with multiple tables: table_bcp and table_ff (flat file). – shishir Nov 07 '19 at 07:47
  • could you elaborate more on the incremental load logic please? – mpy Nov 07 '19 at 08:01
  • 1
    Hey, Here you need to create one Source and Destination table. The monthly file will data will be loaded into Source File(this table will be same as the file structure) now you need to write either Merge Script or upsert(Update Insert) Script to load the data from Source table to Destination Table. – shishir Nov 08 '19 at 04:53
Create a view that looks like what you want to load, and load into that:

CREATE VIEW dbo.Inventory_Stage
AS SELECT 
    ProductID,
    RegionID,
    ShopCode,
    QTY,
    OLAPDate,
    R
FROM Inventory

Now load into Inventory_Stage instead of Inventory

Also, use -F 2 to start loading at the second row, because the first row has column names:

bcp ETLDB.dbo.Inventory_Stage in D:\SCM\R.txt -F 2 -T -b 10000 -t "," -c -e D:\SCM\Errors.txt

Also, seriously consider whether you really want to use FLOAT. For your sample data I recommend NUMERIC(19,6).

Nick.Mc
  • You cannot load data into a view. Views do not have data pages. – jamie Nov 07 '19 at 19:41
  • You are able to [insert data](https://learn.microsoft.com/en-us/sql/relational-databases/views/modify-data-through-a-view?view=sql-server-2016) into views. – Sean Brookins Nov 07 '19 at 19:50
  • It’s actually very useful for making your table look an input file (removing columns) so that you don’t need to fiddle with BCP format files. It’s the way I always get around having an identity in the target. – Nick.Mc Nov 07 '19 at 19:56
The issue is that you are not supplying the last column, which is the identity column: in character mode without a format file, bcp expects a field in the data file for every column of the table.

"-E Specifies that identity value or values in the imported data file are to be used for the identity column. If -E is not given, the identity values for this column in the data file being imported are ignored."

You have three options...

  1. Add an INT column to the source data, in the same ordinal position as the identity column (the last column here), have it incremented the way an IDENTITY would be, and pass the -E option. This allows the values from the source to be used as the IDENTITY column.

  2. Add a constant INT to the last column of your source data, say 1 for every row, and do not pass -E. According to the documentation, when -E is not provided, the values supplied for the identity column in the data file are ignored, and SQL Server auto-increments from the currently seeded value.

  3. Use a format file to specify which columns from your data file go into which columns of your SQL table.
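Options 1 and 2 both mean rewriting the source file. With 200 million rows, a streaming pass keeps memory usage flat regardless of file size; here is a minimal sketch in Python (the function name and the append-as-last-column layout are assumptions, chosen to match the table above, where ID is the last column):

```python
# Append a running integer ID as the last field of every comma-delimited
# line, processing the file one line at a time so even a very large file
# never has to fit in memory.
def append_id_column(src_path, dst_path, start=1, delimiter=","):
    last_id = start - 1
    with open(src_path, "r", encoding="utf-8") as src, \
         open(dst_path, "w", encoding="utf-8") as dst:
        for last_id, line in enumerate(src, start=start):
            dst.write(line.rstrip("\r\n") + delimiter + str(last_id) + "\n")
    return last_id  # last ID written; useful as next month's start value
```

If you go with option 1 (values used via -E), start the next monthly run at the returned last ID plus one so the identity values keep incrementing across loads.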

How to specify the format file

How to construct a format file
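As a sketch of option 3 (the version number and field widths are illustrative, not taken from the question): a non-XML format file for the Inventory table above would describe the six fields of the data file and map them to server columns 1 through 6. The identity column ID is simply never referenced, so SQL Server generates its values during the load.

```text
14.0
6
1   SQLCHAR   0   20   ","       1   ProductID   ""
2   SQLCHAR   0   20   ","       2   RegionID    ""
3   SQLCHAR   0   12   ","       3   ShopCode    ""
4   SQLCHAR   0   30   ","       4   QTY         ""
5   SQLCHAR   0   7    ","       5   OLAPDate    SQL_Latin1_General_CP1_CI_AS
6   SQLCHAR   0   30   "\r\n"    6   R           ""
```

You would then pass the format file with -f in place of the -c and -t switches, for example: bcp ETLDB.dbo.Inventory in D:\SCM\R.txt -T -f D:\SCM\Inventory.fmt -b 10000 -e D:\SCM\Errors.txt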

Updated Answer

When you don't have the option to modify the source data, remove the identity column and proceed as follows:

  1. Remove the identity column from the table
  2. Do your import
  3. After a successful import, add the identity column back:

ALTER TABLE <Table Name>
ADD Id_new BIGINT IDENTITY(1,1)
GO

As Marc_s mentioned here

Don't BULK INSERT into your real tables directly.

I would always

  1. insert into a staging table dbo.Employee_Staging (without the IDENTITY column) from the CSV file
  2. possibly edit / clean up / manipulate your imported data
  3. and then copy the data across to the real table with a T-SQL statement like:

    INSERT INTO dbo.Employee(Name, Address) 
       SELECT Name, Address
       FROM dbo.Employee_Staging
    
Vignesh Kumar A
  • The problem is that I do not have access to the source file, and editing a text file with millions of rows is impossible. – mpy Nov 07 '19 at 06:04
  • @mpy Then please drop the identity column and do your import. After the successful import, create the identity column. – Vignesh Kumar A Nov 07 '19 at 06:10
  • But I get this file each month and I want to append it to my table. If I import it without an identity and add an identity column afterwards (which takes hours), then next month I would have to drop the table, import the whole file again, and define the identity column again, which makes no sense. – mpy Nov 07 '19 at 06:17
  • @mpy As per option 1, you should add an INT column to the source data, increment it the way an IDENTITY would be, and keep passing the -E option. Doing this allows the values from the source to be used as the IDENTITY column. – Vignesh Kumar A Nov 07 '19 at 07:11