
I frequently import a 2 GB CSV file with 24 million rows into SQL Server. I import it as text and then carry out the conversion via SELECT xxx INTO.
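The current process looks roughly like this (the staging table, column names, and target types are simplified placeholders, not the real schema):

-- CSV is first loaded into an all-text staging table, then converted in one pass
SELECT
CONVERT(INT, Col1) AS Col1,
CONVERT(DECIMAL(18,2), Col2) AS Col2,
CONVERT(DATETIME, Col3) AS Col3
INTO dbo.TypedData
FROM dbo.StagingText;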

Will less memory be used if I split this into separate queries on different sections of the data?

asked by Tim Galvin

1 Answer


To be honest, it may be better not to use that method at all, but to instead use BULK INSERT as specified here:

Handling Bulk Insert from CSV to SQL

It is quite simple though:

BULK INSERT dbo.TableForBulkData
FROM 'C:\BulkDataFile.csv'
WITH
(
FIELDTERMINATOR = ',',
ROWTERMINATOR = '\n'
)
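If memory or transaction-log pressure is the concern, BULK INSERT also accepts a BATCHSIZE option, which commits the load in batches rather than as one big transaction, so you don't have to split the file yourself. A rough sketch (the batch size is arbitrary; tune it for your system):

BULK INSERT dbo.TableForBulkData
FROM 'C:\BulkDataFile.csv'
WITH
(
FIELDTERMINATOR = ',',
ROWTERMINATOR = '\n',
BATCHSIZE = 100000 -- commit every 100,000 rows
)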

If you're doing it through C#, then you can use the SqlBulkCopy class, or if you need to do it from the command line, you can always use BCP.
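For the BCP route, a command along these lines would do it (the server and database names are placeholders; -T uses Windows authentication, -c loads character data, and -t, makes the comma the field terminator):

bcp YourDatabase.dbo.TableForBulkData in C:\BulkDataFile.csv -S YourServer -T -c -t,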

Note that the method you're currently using is up to 10 times slower:

QUOTE FROM ARTICLE:

Data can be inserted to the database from a CSV file using the conventional SQLCommand class. But this is a very slow process. Compared to the other three ways I have already discussed, this process is at least 10 times slower. It is strongly recommended to not loop through the CSV file row by row and execute SqlCommand for every row to insert a bulk amount of data from the CSV file to the SQL Server database.

answered by KaraokeStu

  • Thanks for that. I will consider this in future. What I'm really after is the answer to the specific questions: does creating the table in this way use a large amount of RAM or disk space? Does SQL handle the SELECT INTO on a line-by-line basis, or does it need to reach the end of the input data before creating the output table? – Tim Galvin Jun 04 '13 at 14:05
  • SQL has to compile the whole query, so the memory involved in loading the CSV file and then processing the query will be huge. Far better to stream the file in. – KaraokeStu Jun 04 '13 at 14:16
  • Does the same thing apply to an update query? If I update a table by setting column a = column a * 2, will I get better performance by partitioning the data? – Tim Galvin Jun 04 '13 at 14:24
  • I'm not 100% certain; the only way to find out would be to run some tests – KaraokeStu Jun 04 '13 at 14:38 (a batched UPDATE sketch follows these comments)
  • The answer to your question is irrelevant. I am using multiple threads with bulk insert and I am inserting around 65,000 rows PER SECOND into a smaller-sized SQL Server. Unless you come even close - whatever you change, bulk insert is the ONLY proper solution. – TomTom Feb 08 '16 at 18:40
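For the batched UPDATE question raised in the comments, a minimal sketch of chunking the update by key range might look like the following (dbo.MyTable, the Id key, column a, and the batch size are all placeholders; whether it actually helps depends on indexing, logging, and recovery model, so test on your own data):

DECLARE @BatchSize INT = 100000;
DECLARE @MinId BIGINT, @MaxId BIGINT;
SELECT @MinId = MIN(Id), @MaxId = MAX(Id) FROM dbo.MyTable;

WHILE @MinId <= @MaxId
BEGIN
-- each pass updates one key range, keeping the transaction and log usage bounded
UPDATE dbo.MyTable
SET a = a * 2
WHERE Id >= @MinId AND Id < @MinId + @BatchSize;

SET @MinId = @MinId + @BatchSize;
END;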