39

I'm generating some sql insert statements from a bunch of text files.

These text files are generally user input data. I would like to sanitize this data so that it's not going to break the insert statement.

For example, some of the input data, people have used the word Don't. The "'" in don't will lead the sql statement to think the string has ended and therefore cause an error.

Is there any .NET method I can call to kind of convert all of these characters to either escape codes or safe characters?

Diskdrive
  • 18,107
  • 27
  • 101
  • 167
  • possible duplicate of [Creating safe SQL statements as strings](http://stackoverflow.com/questions/293254/creating-safe-sql-statements-as-strings) – driis Apr 03 '11 at 10:24
  • 1
    Use parameterized queries (see linked question). – driis Apr 03 '11 at 10:24
  • 3
    The answer there says to input the data into parameters. I'm simply trying to generate the script via looking at the text files. My program isn't actually going to call the database, just spit out the scripts. – Diskdrive Apr 03 '11 at 10:25
  • Not sure how your reply addresses the recommendation that you use parameterized queries. The scripts you are generating would simply be scripts that use parameterized queries--you don't actually have to call the database. – Tim Apr 03 '11 at 11:50
  • @Tim: "simply be scripts that use parameterized queries" - how? What does an `.sql` script file look like with parametrized queries? Wouldn't the parameter values have to be escaped again when they are written into that file? – O. R. Mapper May 08 '17 at 12:52

2 Answers2

37

There is only a single character you have to escape: ansi 0x27, aka the single quote:

safeString = unsafeString.Replace("'","''");
Andomar
  • 232,371
  • 49
  • 380
  • 404
  • 9
    This assumes that the OP is quoting all values in their `insert` statement though. If they are generating a string like this `INSERT INTO T1(numeric,string) values (1,'some string')` then it won't help if they have some unexpected bad data in the first column. (e.g. `'1,2); DROP TABLE ...'` – Martin Smith Apr 03 '11 at 12:51
  • 3
    @MartinSmith Then use int.TryParse for the first parameter to ensure its a clean integer. – Sellorio Jan 13 '16 at 01:00
  • What about commas in an insert statement? – Johan Aspeling Aug 18 '16 at 08:38
  • This is enough for me. I included it in my micro ORM (EntityLite) for very large `IN` predicates, to workaround the 2100 SQL Server max number of parameters. I hope there is no more characters to take care of. I think I was too much paranoid with SQL injection. – Jesús López Apr 26 '18 at 15:53
  • @JohanAspeling a comma within single quotations should not cause a problem. But if it occurred where you did not expect a string then the other comment about TryParse for a numeric value would take care of it – StayOnTarget Feb 20 '19 at 12:16
  • Lol is this really the only think to take care of ? In this case people really are too paranoid x) Parametrized queries are painful to maintain and don't look nice – Alexandre Daubricourt Jun 29 '19 at 14:13
  • This doesn't sanitize the data as per the OP if the sproc is using dynamic SQL. Consider https://stackoverflow.com/questions/4102387/how-to-cleanse-dynamic-sql-in-sql-server-prevent-sql-injection – Chris HG Feb 10 '20 at 15:42
30

Don't sanitize your strings. Use parameterized queries instead, as they handle all sanitization.

You don't specify which database you are using, so I assume it is MS SQL Server. Microsoft has an article on the official ASP.net website about this. Also see MSDN for SqlCommand.Parameters and the AddWithValue method.

Michael Stum
  • 177,530
  • 117
  • 400
  • 535
  • 3
    +1 there is no "safe" way to concatenate together SQL statements - using **parametrized queries** (**ALWAYS!**) is the only viable way to go – marc_s Apr 03 '11 at 11:01
  • 10
    @marc_s: Parameterized queries are only a partial solution. They don't protect against SQL injection into a dynamic SQL query, javascript injection into a varchar field, parameter injection into an html label, etc. There is no "silver bullet" solution to injection. – Andomar Apr 03 '11 at 12:05
  • @Andomar That's a different problem though: Parameterized Queries offer 100% protection against SQL Injection, that is "Injection of unwanted SQL into another SQL Statement". The other side is Output Encoding, aka. HTML Encoding all user data that is output into a website to prevent stuff like XSS. That is an entirely different problem, one that is much harder. I'm not sure what you mean by "dynamic SQL query" - even when building a query with StringBuilders you can easily add parameters? (Parameterized Queries are NOT stored procedures after all, they can be dynamically generated) – Michael Stum Apr 03 '11 at 21:40
  • 1
    See [this example](http://www.sommarskog.se/dynamic_sql.html#SQL_injection) for how dynamic SQL enables SQL Injection, even though you use a .NET parameterized query – Andomar Apr 04 '11 at 04:16
  • @Andomar I see. That seems to be a rather unusual way of doing it though: Why would anyone use T-SQL and EXEC if they are already coding in .net? In .net I would just do `if(!string.IsNullOrEmpty(custName)){ sb.Append(" AND customerName = @custName"); cmd.Parameters.AddWithValue("@custName",custName); } cmd.CommandText = sb.ToString();` and be done with it. – Michael Stum Apr 04 '11 at 13:39
  • @Andomar I think I understand it now. Indeed, you are right. If your Stored Procedure is dynamically concatenating strings, then parameterized queries won't do much. – Michael Stum Apr 05 '11 at 23:56
  • DO sanitise untrusted input because of persistent XSS for one. – jhsowter May 01 '14 at 12:13
  • @jhsowter Not on the database side. XSS is a much bigger problem and should be handled in an appropriate layer - don't ever put it in the database to begin with, and never render out anything from the database without encoding it. – Michael Stum May 02 '14 at 04:31
  • 1
    @MichaelStum I have thought a bit more and I agree you should validate the string for XSS and then store it if it's safe. That is, validate, but do not sanitise. Let a framework sanitise the string for you. – jhsowter May 02 '14 at 09:47
  • @MichaelStum Unfortunately, you can't use a parameterized query with the "ORDER BY" clause. (At least in Sql Server) – C. Tewalt May 21 '15 at 19:00
  • @marc_s (& Mr. Stum) I understand that parameterized queries are always preferred, but that's also just offloading the problem -- *somebody's* writing the code that does that cleanup within the parameterization ;). If for no other reason than for a thought experiment, can you explain why [Andormar's answer](http://stackoverflow.com/a/5529426/1028230) *wouldn't* work for string params in, eg, the `{0}` in `SELECT * FROM Table1 WHERE Name='{0}'`? Or TryParse for an int? Etc. What makes conventional use cases too complicated for the general dev public? – ruffin Apr 21 '17 at 15:50
  • 1
    The OP says they want to write the `INSERT` statements into SQL script files. It's not clear from this answer how parametrized queries would work in those. Once the parameter values are written into the script files, they still have to be escaped in some way. – O. R. Mapper May 08 '17 at 12:56
  • How do you use parametrerized queries when you want to use `thecol like '%foobar%'`? – Anders Lindén Aug 18 '17 at 14:14
  • 1
    @marc_s, There is always (almost always) an exception to every rule. When you use SQL Server and you have more than 2100 elements in an `IN` or `NOT IN` predicate, you cannot use parameters, you need to resort to literals. – Jesús López Apr 26 '18 at 16:02
  • @JesúsLópez: if you have more than 2100 elements for an `IN` clause - you have quite a few other problems to deal with first...... – marc_s Apr 26 '18 at 17:10
  • @marc_s. Most of the time we can use a subquery. But in some extreme cases we have no better alternative. – Jesús López Apr 27 '18 at 12:10
  • @JesúsLópez: if you really need 2100+ elements - why aren't those stored in a separate table (possibly a temp table) so you can just easily **join** - instead of having such a monster of an `IN` clause?? There's always a better alternative than 2100+ elements in an `IN`! – marc_s Apr 27 '18 at 14:31
  • @marc_s. Yes, I see this sound weird.But trust me, I saw rare scenarios where It make sense. – Jesús López Apr 27 '18 at 16:23
  • 1
    @marc_s. The large `IN` scenario is not the only one you need literals. Think about a bunch of insert, update and delete statements in a single batch. It's very easy to have more than 2100 values is this scenario, in this case you need literals. – Jesús López Apr 27 '18 at 16:28