
My question is related to Execute multiple SQL commands in one round trip, but not identical, because I am facing this problem on a much bigger scale:

I have an input file with many different SQL commands (ExecuteNonQuery) that I have to process with a .net application. Example:

INSERT INTO USERS (name, password) VALUES (@name, @pw); @name="abc"; @pw="def";
DELETE FROM USERS WHERE name=@name; @name="ghi";
INSERT INTO USERS (name, password) VALUES (@name, @pw); @name="mno"; @pw="pqr";

All of the commands have parameters, so I would like to use the parameter mechanism that .NET provides. But my application has to read these statements and execute them within an acceptable time span. There might be several thousand statements in a single file.
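As an aside, splitting each input line into its SQL text and its parameter values is straightforward. Here is an illustrative sketch in Python rather than .NET (the logic translates directly to C#); it assumes exactly the quoted format, i.e. a statement ending in `;` followed by `@param="value";` assignments:

```python
import re

# Matches the @param="value"; assignments that follow each statement.
PARAM_RE = re.compile(r'@(\w+)="([^"]*)";')

def parse_line(line):
    """Split one input line into (sql, params) per the format above."""
    sql, _, rest = line.partition(';')
    params = dict(PARAM_RE.findall(rest))
    return sql + ';', params

sql, params = parse_line(
    'INSERT INTO USERS (name, password) VALUES (@name, @pw); '
    '@name="abc"; @pw="def";')
# sql    -> 'INSERT INTO USERS (name, password) VALUES (@name, @pw);'
# params -> {'name': 'abc', 'pw': 'def'}
```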

My first thought was to use SqlCommand with parameters, since that would really be the proper way to do it (parameters are escaped by .NET), but I can't afford to wait ~50 ms for each command to complete (network round trip to the DB server, etc.). I need a way to chain the commands.

My second thought was to escape and insert the parameters myself so I could combine multiple commands in one SQLCommand:

INSERT INTO USERS (name, password) VALUES ('abc', 'def'); DELETE FROM USERS WHERE name='ghi'; INSERT INTO USERS (name, password) VALUES ('mno', 'pqr');

However, I feel uneasy with this solution because I don't like escaping the input myself when there are predefined functions to do it.

What would you do? Thanks for your answers, Chris

  • And if I really have to insert the parameters myself is there a predefined escape function for any given database provider that I can use? – Shirky Jul 14 '11 at 19:49
  • I'll post my response as a comment, not an answer, because it's really more of an opinion. When batch processing, you *often* have to give up clean, predefined functions in favor of performance. If your goal really is performance and you think executing the load on a single command really is the best way to get you there, then there's no shame in slicing and dicing the parameters yourself. – Michael Ames Jul 14 '11 at 20:10
  • Thanks for your answer. The problem that remains is that I have to make sure no SQL injection is possible. Is a simple String.replace("'", "''") to escape the ' chars really gonna work for all DB systems? – Shirky Jul 14 '11 at 22:13
  • Is SQL injection a true risk in your scenario? It sounds to me like your source data is coming from a controlled, internal source. (Correct me if I'm wrong.) Again, you're facing tradeoffs common to all batch processing. If you really think you're at risk of SQL injection attacks, you should use the out-of-the box parameterization approach, and execute it line-by-line. – Michael Ames Jul 14 '11 at 22:19
  • You are right, the data source is indeed internal... But I like to make sure that nothing bad can happen... And I just think there has to be a better way to do it. I believe in built-in functions ;-) – Shirky Jul 14 '11 at 22:28

2 Answers


Interesting dilemma. I would suggest any of these:

  1. Do you have control over the SQL server? Create a stored procedure that loads the file and does the work.
  2. Use SqlCommand, but cache the parameters, and read only the command type (delete, insert, etc.) and the values from each line to execute. Parameter caching examples here, here, and here.
  3. Use multiple threads: a parent thread reads the lines and hands them over to other threads, one to do the inserts, another to do the deletions, or as many as needed. Look at Tasks.
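Option 2 above might look like the following sketch. It uses Python with the stdlib sqlite3 module purely for illustration (the thread is about .NET and MySQL); the idea of keeping one fixed, parameterized command per statement shape and only swapping in the values per line carries over directly:

```python
import sqlite3

conn = sqlite3.connect(':memory:')
conn.execute('CREATE TABLE USERS (name TEXT, password TEXT)')

# One cached command per statement shape; only the values change per line.
INSERT_SQL = 'INSERT INTO USERS (name, password) VALUES (:name, :pw)'
DELETE_SQL = 'DELETE FROM USERS WHERE name = :name'

# Hypothetical pre-parsed input: (command type, parameter values) per line.
lines = [
    ('insert', {'name': 'abc', 'pw': 'def'}),
    ('delete', {'name': 'abc'}),
    ('insert', {'name': 'mno', 'pw': 'pqr'}),
]

with conn:  # one transaction for the whole batch
    for kind, values in lines:
        conn.execute(INSERT_SQL if kind == 'insert' else DELETE_SQL, values)

count = conn.execute('SELECT COUNT(*) FROM USERS').fetchone()[0]
# count -> 1 ('abc' was inserted then deleted; only 'mno' remains)
```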
Rached N.
  • Hi, loading the file with a stored procedure is an interesting idea but then I also have to insert the parameters into the SQL Queries with the stored proc. – Shirky Jul 17 '11 at 09:18
  • Caching the parameters is not going to work since each of the statements could be different. And I would have to send one SQLCommand for every command in the file. Threading also isn't gonna help me since there can't be multiple SQLCommands that are executed on the same connection at the same time. But thanks for your help, if I felt more confident with stored procs I would perhaps use that. But since I don't know them well I think I'm going with the other answer. – Shirky Jul 17 '11 at 09:23

Assuming everything in the input is valid, what I would do is this:

  • Parse out the parameter names and values
  • Rewrite the parameter names so they are unique across all queries (i.e., so you would be able to execute two queries with a @name parameter in the same batch)
  • Group together a bunch of queries into a single batch and run the batches inside a transaction
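The renaming step in the list above can be sketched as follows (illustrative Python; `rename_batch` is a hypothetical helper name, not something from a library). The combined SQL string and the merged parameter dictionary would then be sent as one command per batch:

```python
import re

def rename_batch(commands):
    """Merge (sql, params) pairs into one batch, renaming each @param
    to a unique name so that e.g. two @name parameters can coexist."""
    batch_sql, batch_params = [], {}
    for i, (sql, params) in enumerate(commands):
        for name, value in params.items():
            unique = f'{name}_{i}'
            # \b keeps @name from also matching longer names like @name2.
            sql = re.sub(rf'@{name}\b', f'@{unique}', sql)
            batch_params[unique] = value
        batch_sql.append(sql)
    return ' '.join(batch_sql), batch_params

sql, params = rename_batch([
    ('INSERT INTO USERS (name, password) VALUES (@name, @pw);',
     {'name': 'abc', 'pw': 'def'}),
    ('DELETE FROM USERS WHERE name=@name;', {'name': 'ghi'}),
])
# params -> {'name_0': 'abc', 'pw_0': 'def', 'name_1': 'ghi'}
```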

The reason you (likely) won't be able to run it all in a single batch is that there is a limit of 2100 parameters per batch (at least there was when I did this same thing with SQL Server); depending on the performance you get, you'll want to tweak the batch size. 250-500 queries per batch worked best for my workload; YMMV.

One thing I would not do is multi-thread this. If the input is arbitrary, the program has no idea if the order of the operations is important; therefore, you can't start splitting up the queries to run simultaneously.

Honestly, as long as you can get the queries to the server somehow, you're probably in good shape. With only "multiple thousands" of statements, the whole process won't take very long. (The application I wrote had to do this with several million statements.)

Jon Seigel
  • Hi, thanks for your answer :-) . I'm using MySQL but I don't know how high its limit is. But 200-300 is gonna work just fine... I'm gonna try it when I return to work. – Shirky Jul 17 '11 at 09:28
  • @Shirky: I assume you got it to work? What was the parameter limit, out of curiosity? – Jon Seigel Jul 19 '11 at 22:24
  • Yes, I got it to work, the parameter limit is beyond 40k for MySQL. That's when I stopped trying. – Shirky Jul 27 '11 at 15:32
  • Important note: For MySQL it is faster to create multiple DbCommands instead of one command with multiple statements. Apparently the MySQL driver just creates the commands, sends them and doesn't wait for the response (ExecuteNonQuery). I tried sending 16k statements with single DbCommands (20 sec) and using one aggregated DbCommand (40 sec) (using a MySQL server in the LAN). This is much faster than I expected. I am going to work with single commands. This finding doesn't necessarily apply to other SQL drivers! – Shirky Jul 27 '11 at 15:40
  • I tried again with other data and found the following processing times: 100 parameters per query: 6.2 sec (optimum), 1000 per query: 10 sec, no batch: 12 sec. It is barely worth the hassle of implementing it. But since I already have it I'll keep it. – Shirky Jul 27 '11 at 15:50