1

I have a requirement to insert around 2500-3000 rows of data into a table at a time. I need to do this for both MySQL and SQL Server. The possible solutions I have found are:

  1. Writing a cfquery inside a cfloop.
  2. Using a cfloop inside cfquery to generate a single statement like INSERT INTO tbl VALUES (...), (...), (...). The problem with this syntax is that it can only insert 1000 records at a time, and I need more.

Can anyone suggest the best way to handle this kind of situation?

Rhys Jones
user3427540

5 Answers

5

When you are dealing with thousands of records, small incremental delays add up quickly. Inserting this many rows one by one will take considerably more time than issuing a BULK INSERT statement, for reasons such as increased network traffic, per-statement overhead inside the database, and so on.

I am going to assume that you are receiving your data in a file. In that case, these are the general steps you would need to take:

  1. Validate the data. VERY IMPORTANT. Even if you are receiving data from a trusted source in a trusted environment, files can easily be corrupted, truncated, or damaged in many ways. They can also contain discrepancies due to the small but sometimes significant differences between server versions (SQL Server 2000 vs 2008) or vendors (your source might be an Oracle database).
  2. Issue a BULK INSERT statement, assuming you are using SQL Server 2005 or newer (see the sketch after this list). I think there are some differences in the way the command works between 2005 and newer versions, so check the syntax before trying it.
  3. Depending on your configuration and anything else you might need, you may want to update the index(es) on the table, or even recreate them. Again, this depends on how big the table is and your needs. This part of the process will potentially take the longest and may even take the table offline but your users will be very grateful that you did it.
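
For illustration, here is a minimal sketch of step 2, loading into a staging table first (the table names, file path, and delimiters are placeholders, and the file must be readable by the SQL Server service account):

-- Rough sketch only: staging table, target table, file path, and delimiters are placeholders.
BULK INSERT dbo.staging_mytable
FROM 'D:\imports\mydata.csv'
WITH (
    FIELDTERMINATOR = ',',   -- column delimiter in the file
    ROWTERMINATOR   = '\n',  -- row delimiter
    FIRSTROW        = 2,     -- skip a header row, if there is one
    BATCHSIZE       = 1000,  -- commit in batches of 1000 rows
    TABLOCK                  -- table-level lock for faster loading
);

-- After validating the staged rows, move them into the real table.
INSERT INTO dbo.mytable (column1, column2, column3)
SELECT column1, column2, column3
FROM dbo.staging_mytable;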

It pays to plan the whole insertion process carefully so that you do not disrupt normal business activities and continue to provide the best service and availability.

As a closing thought, you should run tests comparing the different methods, one-by-one versus bulk, to make sure you are getting the best performance. I believe, however, that you'll always find that batched or bulk insertions are the way to go. The reason this capability was added to the Transact-SQL language is performance: it saves network bandwidth, resources, and time.

  • About that validation comment, that's one of the benefits of using staging tables. – Dan Bracuk Feb 13 '15 at 14:47
  • @DanBracuk Yes, the best strategy is to always use as many existing validation techniques as possible before implementing one's own validation. – Security Aficionado Feb 13 '15 at 14:55
  • Agreed on staging tables. They have a lot of advantages over individual inserts. It is a little more work to set up than the individual approach, but it is well worth it for the processing control it provides (validation, transactions, etc.). (Side note: BULK INSERT should be supported in SQL 2000+.) – Leigh Feb 13 '15 at 15:36
  • And Excel files are the ones most likely to need unexpected validations. People will change columns around, and there are none of the data validation controls that data inserted into a database might (and should) have. One should not insert an Excel spreadsheet directly into production data in a database. I highly recommend a staging table with clean-up. – HLGEM Feb 13 '15 at 15:46
3

In SQL Server (at least 2008 and above) you can insert multiple rows at one time through a single INSERT statement:

INSERT INTO mytable (column1, column2) VALUES (value1, value2), (value3, value4);

SQL Server does limit a single INSERT ... VALUES statement to 1000 row value expressions (MySQL has no fixed row limit, only the max_allowed_packet size), so batching is needed anyway. I also agree with @RhysJones that you might want to limit each batch to, say, 100 rows in case an error is encountered - this will make it much easier to spot the error and correct it.

Since you're using ColdFusion, my recommendation would be to loop over the rows of the external file 100 at a time:

 <cfquery name="q_insert_rows">
    INSERT INTO mytable ( column1, column2, column3 )
    VALUES
    <cfloop query="external_data" startrow="#startrow#" endrow="#endrow#">
        ( <cfqueryparam cfsqltype="CF_SQL_VARCHAR" value="#external_data.column1#" />
        , <cfqueryparam cfsqltype="CF_SQL_VARCHAR" value="#external_data.column2#" />
        , <cfqueryparam cfsqltype="CF_SQL_VARCHAR" value="#external_data.column3#" /> )
        <!--- no trailing comma after the last row in this batch --->
        <cfif external_data.currentRow NEQ endrow> , </cfif>
    </cfloop>
 </cfquery>

The <cfquery> block above should be put inside a loop:

<cfloop from="1" to="#external_data.recordCount#" step="100" index="startrow">
    <cfset endrow = min(startrow + 99, external_data.recordCount) />
    ...
</cfloop>

If you want the whole thing to fail if one insert fails, then wrap the above loop in <cftransaction>.
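
Put together, a rough sketch of the whole thing would look something like this (datasource and error handling omitted):

<cftransaction>
    <cfloop from="1" to="#external_data.recordCount#" step="100" index="startrow">
        <cfset endrow = min(startrow + 99, external_data.recordCount) />
        <!--- the <cfquery name="q_insert_rows"> block from above goes here --->
    </cfloop>
</cftransaction>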

David Faber
  • You cannot use cfqueryparam within cfsavecontent - only within query tags. – Leigh Feb 13 '15 at 14:57
  • This approach may push you over sql server's maximum parameter limit. – Dan Bracuk Feb 13 '15 at 15:12
  • @Leigh, thanks for the pointer. I've modified the answer accordingly. – David Faber Feb 13 '15 at 15:16
  • No problem. BTW, there is a limit, [1000 for SQL Server](https://technet.microsoft.com/en-us/library/dd776382%28v=sql.110%29.aspx). Other DBMSs may have different limits. Though breaking it up into batches would work, I think bulk loading would be a more appropriate tool in this case, i.e. BULK INSERT (SQL Server), LOAD DATA (MySQL), ... – Leigh Feb 13 '15 at 15:49
3

You've tagged this with MySQL, so I'm presuming that you're using MySQL. I've had experience with this sort of issue as well, where we were importing our HR information from an external database. I found that it was significantly faster to validate the data, store it in a CSV file, and import that using the LOAD DATA INFILE syntax.
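
As a rough illustration only (the file path, table, and column names are placeholders; drop LOCAL if the file sits on the MySQL server itself, and note that LOCAL must be enabled on both client and server):

-- Rough sketch: path, table, and columns are placeholders.
LOAD DATA LOCAL INFILE '/tmp/hr_export.csv'
INTO TABLE mytable
FIELDS TERMINATED BY ','
OPTIONALLY ENCLOSED BY '"'
LINES TERMINATED BY '\n'
IGNORE 1 LINES               -- skip the header row
(column1, column2, column3);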

Paul Rowe
1

If possible, use a more appropriate tool to solve this problem instead of hammering away at it with ColdFusion.

SQL Server has the ability to directly import the contents of a spreadsheet.

Once connected to a SQL Server 2005 (or later) database, in the Object Explorer window, right-click the database you want to import the data into and select Tasks -> Import Data. This is a simple tool that lets you 'map' the incoming data to the appropriate table. You can save the import as a package to run again when needed.

You should be able to create a stored procedure in SQL Server that you can call from ColdFusion, passing in the location of the spreadsheet on your server (in a non-web-accessible folder, of course); a user-defined function won't work here, since functions can't modify table data.
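
As a very rough sketch of that idea (the procedure, staging table, and file layout are placeholders, and it assumes the data has first been saved as a delimited text file, since BULK INSERT cannot read .xlsx files directly; reading a spreadsheet in place would need OPENROWSET with the ACE OLE DB provider instead):

-- Rough sketch only: names and delimiters are placeholders.
CREATE PROCEDURE dbo.usp_import_rows
    @file_path NVARCHAR(260)
AS
BEGIN
    -- BULK INSERT requires a literal file name, so build the statement dynamically
    DECLARE @sql NVARCHAR(MAX) =
        N'BULK INSERT dbo.staging_mytable
          FROM ''' + REPLACE(@file_path, N'''', N'''''') + N'''
          WITH (FIELDTERMINATOR = '','', ROWTERMINATOR = ''\n'', FIRSTROW = 2);';
    EXEC sp_executesql @sql;
END;

ColdFusion can then call this with <cfstoredproc> (or an EXEC inside <cfquery>), passing the file path as a parameter.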

For MySQL, as Paul Rowe already stated, use the LOAD DATA INFILE syntax. Another SO answer suggests using the Windows based HeidiSQL to craft the statement via a UI to get the correct mappings.

Adrian J. Moreno
0

Just don't use the multi-value version of INSERT. Use one INSERT statement per row, i.e.

INSERT INTO tbl VALUES(...);
INSERT INTO tbl VALUES(...);
INSERT INTO tbl VALUES(...);
Rhys Jones
  • Multiple insert statements are not working inside cfquery for MySQL. – user3427540 Feb 13 '15 at 14:27
  • I'm not a MySQL guy but you might need the ; (semi-colon) statement terminators, SQL Server will accept these too. – Rhys Jones Feb 13 '15 at 14:31
  • Moreover, for the time being I am using SQL Server. So is this the correct way to insert multiple rows at a time? Basically I am generating a string containing all the INSERT statements and finally passing the string to `cfquery` to insert into the DB? – user3427540 Feb 13 '15 at 14:35
  • It's one way of doing it, and it is very simple, so it's compatible with a wide range of databases. I would be inclined to break it into batches, maybe do 100 rows at a time. As @DavidFaber mentions above, the "best way" to do it depends on a lot of factors that are outside the scope of this question. – Rhys Jones Feb 13 '15 at 14:38
  • By default, the MySQL JDBC driver prevents the execution of multiple statements to protect against sql injection. [`allowMultiQueries` must be enabled](http://www.petefreitag.com/item/357.cfm) first. *generating a string containing all the INSERT statements* You should be using cfqueryparam instead. It protects against sql injection and helps improve performance for statements executed multiple times (which yours are). If you must build a "string", use the cfscript version, which allows you to "parameterize" the sql string. That said, it sounds like you'd be better off using your db's bulk loading tool. – Leigh Feb 13 '15 at 14:43