
I have a requirement to insert a large 2 GB CSV file into my MS SQL Server database. Most of the rows in this file do not need to be inserted, but I could not find any way to filter rows during a bulk insert. I am using the MS SQL BULK INSERT command to do this. Is there any option to filter rows on MySQL/MSSQL/Oracle during a bulk insert?

BULK INSERT payroll.t_allowance
   FROM 'f:\orders\lineitem.csv'
   WITH
      (
         FIELDTERMINATOR = ' |',
         ROWTERMINATOR = ' |\n'
      );
Robi
How about you bulk insert and then delete the rows you don't need? – BICube Oct 31 '15 at 04:05
  • Would it be possible to filter the rows before importing? Either with a scripting language or maybe with type + find? – James Z Oct 31 '15 at 05:56
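As the last comment suggests, one option is to pre-filter the file with a small script before running BULK INSERT at all. A minimal sketch in Python, assuming a pipe-delimited file like the one in the question; the paths and the `keep()` condition are hypothetical placeholders you would replace with your own rule:

```python
# Pre-filter a large CSV before BULK INSERT, reading line by line
# so the 2 GB file is never loaded into memory at once.
# Paths and the keep() condition are illustrative placeholders.

def keep(line):
    # Example rule: keep only rows whose first field is "A".
    return line.split("|")[0].strip() == "A"

def filter_csv(src_path, dst_path):
    with open(src_path, "r", encoding="utf-8") as src, \
         open(dst_path, "w", encoding="utf-8") as dst:
        for line in src:
            if keep(line):
                dst.write(line)
```

The filtered output file can then be loaded with the same BULK INSERT statement as before.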

3 Answers


You can use OPENROWSET with the BULK option:

SELECT *
FROM OPENROWSET(BULK 'f:\orders\lineitem.csv', 
                FORMATFILE= 'f:\orders\format.xml') AS a
WHERE ... 

format.xml is a format file where you configure delimiters, column names, terminators, etc.: https://msdn.microsoft.com/en-us/library/ms178129.aspx
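For reference, a minimal XML format file for a two-column, pipe-delimited file might look like the sketch below. The column names, lengths, and types are purely illustrative; in practice you would generate the file with bcp and adjust it:

```xml
<?xml version="1.0"?>
<BCPFORMAT xmlns="http://schemas.microsoft.com/sqlserver/2004/bulkload/format"
           xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
  <RECORD>
    <!-- How fields appear in the data file -->
    <FIELD ID="1" xsi:type="CharTerm" TERMINATOR="|" MAX_LENGTH="50"/>
    <FIELD ID="2" xsi:type="CharTerm" TERMINATOR="|\r\n" MAX_LENGTH="50"/>
  </RECORD>
  <ROW>
    <!-- How fields map to result-set columns -->
    <COLUMN SOURCE="1" NAME="col1" xsi:type="SQLVARYCHAR"/>
    <COLUMN SOURCE="2" NAME="col2" xsi:type="SQLVARYCHAR"/>
  </ROW>
</BCPFORMAT>
```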

Giorgi Nakeuri

To avoid the complexities of the format file, and with some other trade-offs, you can create a staging table, BULK INSERT the staging table, and use INSERT...SELECT to load the target table from the staging table.

-- Create a temporary staging table with the same column names and data types, but no indexes
-- Alternatively, use CREATE TABLE
-- When using a permanent table, use TRUNCATE TABLE
SELECT *
INTO #stage
FROM payroll.t_allowance
WHERE 1 = 0;

-- Bulk load the staging table
-- Use the TABLOCK hint to achieve minimally logged inserts
BULK INSERT #stage
FROM 'f:\orders\lineitem.csv'
WITH (TABLOCK, FIELDTERMINATOR = ' |', ROWTERMINATOR = ' |\n');

-- Load the target table from the staging table
INSERT INTO payroll.t_allowance
SELECT s.*
FROM #stage AS s
WHERE...;

-- Drop the staging table
-- or use TRUNCATE TABLE for a permanent table
DROP TABLE #stage;
Mark Chesney

Using OPENROWSET(BULK...) is the way to go, assuming you can't pre-filter the file.

INSERT INTO table1 (col1, col2, ..., colN)
SELECT col1, col2, ..., colN
FROM OPENROWSET(BULK '<data_file_name>.txt', FORMATFILE = '<format_file_name>.xml') AS a
WHERE ...

As a side benefit, you can optionally transform the data from the file using functions and lookup tables and add joins to other tables for filtering, but you'll need to be mindful of the performance implications.

INSERT INTO table1 (col1, col2, ..., colN)
SELECT a.col1, a.col2, ..., a.colN
FROM OPENROWSET(BULK '<data_file_name>.txt', FORMATFILE = '<format_file_name>.xml') AS a
INNER JOIN ...
ON ...
LEFT JOIN ...
ON ...
WHERE ...

See Import Bulk Data by Using BULK INSERT or OPENROWSET(BULK...)

Note that using OPENROWSET(BULK...) requires you to create a format file, in either XML or non-XML form. Both forms can be challenging to write by hand and easy to get wrong. Your work will be easier if you start from a generated file and make the necessary modifications: generate the format file with the bcp program, and prefer the XML form (the -x argument).

bcp <table_or_view> format nul -c -x -f <format_file_name>.xml

See Create a Format File

Mark Chesney