
Background

We have a VB.NET application that loads text files into database tables. Text files can be in a variety of formats (Flat file (fixed column width), character delimited) and can have complex structures (e.g. variable number of columns). The format and structure of the file being imported is known in advance.

Before commencing a file import, the application loads the "file format/structure" configuration settings. The application then reads the file and builds an SQL string (INSERT( [col list] ) VALUES( 'value1', 'value2', .. ), ..), which is then sent to the target table.
Update: Because all data comes as text (from a text file), the application also inserts it as text into the database, letting the database do all the implicit conversion work. That said, some formatting is done (e.g. the American-style 06-05-2016 would be converted to 2016-06-05) based on configuration settings for a given type of file.
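
For illustration, a minimal sketch of the kind of date reformatting meant here; the format strings are placeholders, the real ones come from the per-file-type configuration:

' Illustrative only - the actual format strings come from the configuration settings
Dim rawValue As String = "06-05-2016"   ' American style, as read from the text file
Dim parsed As DateTime = DateTime.ParseExact(rawValue, "MM-dd-yyyy",
        System.Globalization.CultureInfo.InvariantCulture)
Dim reformatted As String = parsed.ToString("yyyy-MM-dd")  ' "2016-06-05"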

Target table columns can have non-VARCHAR data types, e.g. INT, DATETIME, DECIMAL( x, y ) etc. So inserting string data causes implicit VARCHAR -> other data type conversions.
It is of note that SQL Server is able to convert empty strings to default values for certain types (the most notable examples are '' -> '1900-01-01' and '' -> '0').

This application is (was) a bit of a spaghetti monster, which I had to untangle before I could make my changes. As part of this "untanglement process" I was looking to replace SQL string building with SqlBulkCopy (using DataTable as cache object).

Question

SqlBulkCopy seems to first convert strings into CLR types before sending the data over to SQL Server. CLR types have different rules for implicit string to non-string type conversions so my data cannot get imported as-is (even though it imports just fine using the above SQL String building approach).

Is there a way to work around this?

What I have found so far: this question mentions that custom "clean up" functions may need to be implemented: SqlBulkCopy - The given value of type String from the data source cannot be converted to type money of the specified target column
IMO a custom "clean up" / conversion function solution that replicates SQL Server behaviour is "awkward".

I have also tried to play around with DataTable column types but to no avail:

SQL Table is declared as follows:

CREATE TABLE LoadTable( IntVal INT NULL, BitVal BIT NULL, DateVal DATETIME NULL, DecVal DECIMAL( 12, 3 ) NULL )

.NET Code:

' The two commented-out lines are alternative column types that were also tried (same failure):
'DTColumns(1) = New System.Data.DataColumn("BitVal", GetType(SqlTypes.SqlString))
'DTColumns(1) = New System.Data.DataColumn("BitVal", GetType(System.String))
DTColumns(1) = New System.Data.DataColumn("BitVal")  ' column type defaults to String
CacheRow.Item("BitVal") = "1"                        ' value as read from the text file
BulkCopier.WriteToServer(_DataCache)                 ' _DataCache is the DataTable cache

But I always get exceptions telling me that the value cannot be converted to Boolean:

The given value of type String from the data source cannot be converted to type bit of the specified target column.
Failed to convert parameter value from a String to a Boolean.

Another one:

Failed to convert parameter value from a SqlString to a Boolean.

Implicit default values do not work either (do NOT confuse this with a DEFAULT CONSTRAINT):

DTColumns(2) = New System.Data.DataColumn("DateVal", GetType(SqlTypes.SqlString))
CacheRow.Item("DateVal") = ""

On the other hand, this works in SQL:

INSERT LoadTable( IntVal, BitVal, DateVal )
SELECT '', '', ''
/* Result
IntVal  BitVal  DateVal
0       0       1900-01-01 00:00:00.000
*/

This clearly shows that SqlBulkCopy internally converts incoming values to the CLR equivalent of the destination type, rather than letting SQL Server do the conversion.

Bonus question: does anyone know the reason why SqlBulkCopy must convert everything to a CLR type? It seems strange given that it is writing data to a non-CLR target.

Clarification: I fully understand that data types have to be converted (implicitly or explicitly). I am looking for a solution to convert strings to other types in exactly the same way as SQL Server does. My problem is that .NET does it differently, so simply using Integer.Parse (or equivalents) will fail in some cases where in SQL the same conversion succeeds. I have provided a few examples where such conversions fail, but there are plenty more that I know of (and still more that I don't).

Alex
  • You could import to a staging table first consisting of all `VARCHAR` columns, then execute a stored proc that imports to the target table and performs all the conversions explicitly. – Chris Pickford Jun 23 '16 at 10:15
  • Just to be sure I understand the issue - you provide a "1" and want it into a BIT or varchar column? – Allan S. Hansen Jun 23 '16 at 10:17
  • @Chris Pickford, thanks for reply. I could but that would require creating "custom" staging tables / procedures for every file type OR lots of dynamic code. I should also mention that this program loads to multiple different servers / databases. – Alex Jun 23 '16 at 10:18
  • @Allan S. Hansen, one is supposed to go into a BIT column but it comes as string from a text file. There are also "problems" with empty string to default value conversions ('' -> '1900-01-01' and '' -> '0') which do not work in .NET. – Alex Jun 23 '16 at 10:20
  • For the "default" values - you most likely need to handle them yourself in your datatable, so an empty string gets mapped to a value you want. Also - have you tested having "true" as a value instead of 1? – Allan S. Hansen Jun 23 '16 at 10:28
  • SqlBulkCopy doesn't make any conversions. You'd get the same errors if you tried to directly execute an `INSERT` command. The text is parsed on SQL Server's side using your collation. Parse your text data to the proper types (ie DateTime, int, boolean) and pass a strongly typed datatable or reader to SqlBulkCopy. Otherwise you depend on SQL Server guessing and performing a correct parse – Panagiotis Kanavos Jun 23 '16 at 10:31
  • @Panagiotis Kanavos, RE "You'd get the same errors if you tried to directly execute an INSERT command. ": you don't. Try it for yourself. I have updated my question with SQL Example. – Alex Jun 23 '16 at 10:34
  • @Allan S. Hansen, I understand that I can write my own conversion code to replicate SQL Server string to other data type behaviour (which includes a large number of peculiarities) see ("IMO custom "clean up" / conversion functions solution to replicate SQL Server behaviour is "awkward".) I am looking for a work around. Perhaps I missed a white elephant in front of me. – Alex Jun 23 '16 at 10:40
  • @Allan S. Hansen, I am probably calling it by the wrong name. By default value I mean conversion of empty string to a value '' -> '1900-01-01' in another data type. I do NOT mean default constraints. – Alex Jun 23 '16 at 10:48
  • @Alex the SQL statement has nothing to do with bulk copy, it's just a SELECT. You deal with completely unrelated data that aren't even similar to those that caused the problem - an empty string isn't a value of "1". A valid example would be to create a SqlCommand with an `INSERT` statement that accepted the strings as parameters – Panagiotis Kanavos Jun 23 '16 at 11:14
  • Looking at the stack trace for the error - the "magic" happens way down the engine; `SqlClient.SqlBulkCopy.ConvertValue` -> `SqlClient.SqlParameter.CoerceValue` -> `Convert.ChangeType`. Without reflecting, I'd guess it's to convert CLR datatypes to SQL Datatypes, and that's where it also breaks for you - so I think you're out of luck and you'll have to do more conversion yourself in your datatable to have the correct types (I've always worked with the correct types myself, so I'm not aware of a workaround/hack) – Allan S. Hansen Jun 23 '16 at 11:17
  • To put it another way - SqlBulkCopy is used by a *lot* of people without problems. It *isn't* complicated. Just make sure your data has the same schema as the table, ie *don't* send strings if the table expects bits. Send bools, dates, integers, decimals – Panagiotis Kanavos Jun 23 '16 at 11:18
  • @Allan S. Hansen, so just to clarify in .NET there is no way of converting strings to other types SQL Server way? – Alex Jun 23 '16 at 11:24
  • @AllanS.Hansen SqlBulkCopy doesn't execute INSERT statements, it sends a stream of data with minimal logging. A BULK operation should *not* perform any conversions. Therefore, the value type *has* to match the target. Avoiding the conversion is also a lot faster – Panagiotis Kanavos Jun 23 '16 at 11:24
  • @Alex .NET has no problem. And conversions *do* work. Avoiding conversions though is just good database development practice. The way to fix this is well understood and *fast*. BTW '1900-01-01' is a locale-specific date for SQL Server. The locale independent format is `19000101` for dates. The ISO 8601 is for datetimes (ie `1900-01-01T00:00:00`). Anything else *is* subject to locale conversion *in SQL Server itself* – Panagiotis Kanavos Jun 23 '16 at 11:25
  • @PanagiotisKanavos Look at the stacktrace for the error. It goes through the chain I wrote before being sent to the server. – Allan S. Hansen Jun 23 '16 at 11:28
  • @Alex Well, I would think you might be able to override a lot of type based conversion logic, but you're much better off simply building a proper DataTable which also gives you tools for validating before sending to server etc. – Allan S. Hansen Jun 23 '16 at 11:32
  • @AllanS.Hansen actually I checked [the source](http://referencesource.microsoft.com/#System.Data/System/Data/SqlClient/SqlParameter.cs,900) (there is no call stack in the question) but I don't really care. I *do* use SqlBulkCopy to load many thousands of rows at a time and I *do* make sure to make any conversions in advance because I *don't* want to deal with conversion errors. In a SSIS script with dozens of transformations, one less conversion to worry about *is* important – Panagiotis Kanavos Jun 23 '16 at 11:35
  • @PanagiotisKanavos If you check your own link you'll see calls to `Convert.ChangeType` - it is there it throws exceptions, which provide the stacktrace I mentioned, from when I ran code – Allan S. Hansen Jun 23 '16 at 11:38
  • @AllanS.Hansen yes, that's exactly what I said. And I also said it doesn't matter, it's a bad idea to pass strings and expect conversions to "just work". It would be a bad idea even if one used a SqlCommand to perform a single Insert – Panagiotis Kanavos Jun 23 '16 at 11:39
  • @PanagiotisKanavos Not arguing that it's a good thing to do; it was your "no conversion" statement I contested. – Allan S. Hansen Jun 23 '16 at 11:41
  • No conversion to SqlDbTypes. And there is *none*. It's from one .NET type (the source) to the .NET type of another (the target). The same happens in SSIS - you *have* to convert rows to the correct target type before sending it to the destination table. – Panagiotis Kanavos Jun 23 '16 at 11:43
  • @Allan S. Hansen, PanagiotisKanavos reading all of the above, I am wondering if my question was misunderstood a little. I fully understand that data types have to be converted (implicitly or explicitly). I am looking for a solution to convert strings to other types in exactly the same way as SQL Server does. My problem is that .NET does it differently, so simply using Integer.Parse (or equivalents) will fail in some cases where in SQL the same conversion succeeds. Nonetheless, thanks for taking the time to help. – Alex Jun 23 '16 at 12:27

2 Answers


No, there is no (direct) way to work around this and keep the bulk copy. SqlBulkCopy contains code that analyzes the target table and performs its own conversions, and if these are improper according to its rules, it will complain. There is no flag or setting that will convince it not to do this. Source: the source.

As to your bonus question, why does it need to do this? Well, it doesn't, really. It could just send over the data, with exactly the types you define, and have SQL Server figure it out. It could simply send over, say, an SqlInt32 as an INT, and then have SQL Server figure out that you actually want 2 converted to the BIT value 1. The protocol allows for this, but SqlBulkCopy insists on having its ducks in a row before it passes the data on to the server. This isn't completely unreasonable, because the alternatives are ignoring the target types and converting to conservative, overly broad types before sending the data (all strings would have to become NVARCHAR(MAX), all DateTimes DATETIME2, just on the off chance the target column needs the full precision), which hurts performance, or requiring the programmer to specify, down to the exact length and scale, what the SQL types are supposed to be, which is tedious, error-prone, and would raise the obvious question "why doesn't it just ask the server?" And, well, that's what it does now. To do what you want, it would need a separate mode where it checks whether you're "really sure" about using a data type different from the target, just in case you wanted to leave the conversion to SQL Server. I guess that feature just didn't make the cut.

Mostly it's considered a Good Thing to do your own conversions to the exact data type in advance. Assuming that "SQL Server knows best" is terrible practice to begin with, because the conversions SQL Server applies are only partially documented, language-dependent, and often downright weird. The example you cite, that '' is converted to 1900-01-01, is a clear argument against relying on them. Rather than assuming your application definitely needs bug-for-bug compatibility, it's probably better to review whether you can switch to a properly typed data model, with the few exceptions that may be implicitly relied on explicitly implemented. Maybe your application is counting on SQL Server converting 'True' to the BIT value 1. Maybe it's even relying on it converting '2' to 1. Maybe it even needs the conversion from '-,.' to the MONEY value 0.00. Maybe, but it probably doesn't, and if it does, it probably shouldn't.

Here endeth the sermon. What if for some ungodly reason you needed this anyway?

First of all, you could simply not use SqlBulkCopy. Generating a huge INSERT statement is very inefficient (the T-SQL parser does not like big statements), but you could issue a lot of individual (parameterized!) INSERT statements and wrap these in a transaction. This is less efficient than a bulk copy, but still far more efficient than individual inserts outside a transaction or ginormous statements, and may be good enough for your purposes.
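
A minimal sketch of that approach, reusing the LoadTable example from the question; connectionString and parsedRows are placeholders. Declaring the parameters as VarChar leaves the string-to-type conversion to SQL Server, just like the generated INSERT strings do:

' Sketch only (assumes Imports System.Data and Imports System.Data.SqlClient at project level).
' Many parameterized INSERTs inside a single transaction; the VarChar parameters let
' SQL Server apply its own implicit conversions ('' -> 0, '' -> 1900-01-01, etc.).
Using conn As New SqlClient.SqlConnection(connectionString)
    conn.Open()
    Using tran As SqlClient.SqlTransaction = conn.BeginTransaction()
        Using cmd As New SqlClient.SqlCommand(
                "INSERT LoadTable( IntVal, BitVal, DateVal ) VALUES( @IntVal, @BitVal, @DateVal )",
                conn, tran)
            cmd.Parameters.Add("@IntVal", SqlDbType.VarChar, 50)
            cmd.Parameters.Add("@BitVal", SqlDbType.VarChar, 50)
            cmd.Parameters.Add("@DateVal", SqlDbType.VarChar, 50)
            For Each fields As String() In parsedRows ' parsedRows: rows parsed from the text file
                cmd.Parameters("@IntVal").Value = fields(0)
                cmd.Parameters("@BitVal").Value = fields(1)
                cmd.Parameters("@DateVal").Value = fields(2)
                cmd.ExecuteNonQuery()
            Next
        End Using
        tran.Commit()
    End Using
End Using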

You could create a staging table (but not a #temporary table) with the desired columns and make all columns (N)VARCHAR(MAX). Then you could SqlBulkCopy into that, and then have SQL Server do the conversions from this table. This can be made to work even if you don't know the columns in advance, with the help of some dynamic SQL. The main objection is that you need additional space for the staging table, and the final update/insert operation isn't bulk logged. In short, this is quite inefficient. It may very well be less efficient than doing the inserts directly, despite the bulk copy step.
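
A rough sketch of that pattern; the staging table name, column sizes and connectionString are made up for illustration:

' Sketch only: bulk copy the raw strings into an all-VARCHAR staging table,
' then let SQL Server convert while moving the rows into the real table.
Using conn As New SqlClient.SqlConnection(connectionString)
    conn.Open()
    Using cmdCreate As New SqlClient.SqlCommand(
            "CREATE TABLE LoadTable_Staging( IntVal VARCHAR(50), BitVal VARCHAR(50), DateVal VARCHAR(50) )", conn)
        cmdCreate.ExecuteNonQuery()
    End Using
    Using bulk As New SqlClient.SqlBulkCopy(conn)
        bulk.DestinationTableName = "LoadTable_Staging"
        bulk.WriteToServer(_DataCache) ' the all-string DataTable from the question
    End Using
    ' The string conversions happen here, server-side, using SQL Server's own rules.
    Using cmdConvert As New SqlClient.SqlCommand(
            "INSERT LoadTable( IntVal, BitVal, DateVal ) SELECT IntVal, BitVal, DateVal FROM LoadTable_Staging", conn)
        cmdConvert.ExecuteNonQuery()
    End Using
End Using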

You could have your application spit out a flat file with the desired columns and all values in string format, then programmatically invoke bcp. Of course this requires installing bcp (it can be installed separately from SQL Server, as part of the Command Line Utilities package) and you may lose a chunk of performance writing the intermediate files, but it will probably still beat generating huge INSERT statements or intermediate tables. As long as you specify that everything is a (N)VARCHAR in your input, bcp will dutifully leave the conversions to SQL Server.
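
A sketch of driving bcp from the application; the database, server, file path and trusted-connection switch (-T) are placeholders, and -c tells bcp to send everything as character data:

' Sketch only: shell out to bcp after writing the parsed rows to a flat file.
Dim bcpArgs As String = "MyDatabase.dbo.LoadTable in ""C:\Temp\LoadTable.txt"" -c -S MyServer -T"
Dim psi As New ProcessStartInfo("bcp.exe", bcpArgs)
psi.UseShellExecute = False
psi.RedirectStandardError = True
Using proc As Process = Process.Start(psi)
    Dim errorText As String = proc.StandardError.ReadToEnd() ' read before waiting to avoid a full pipe
    proc.WaitForExit()
    If proc.ExitCode <> 0 Then
        Throw New Exception("bcp failed: " & errorText)
    End If
End Using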

Last but not least, for completeness, you could conceivably write your own code to wrap around the native bulk copy functions, which don't have this restriction. But that would be quite a bit of work, and all that interop is also not good for performance. SqlBulkCopy doesn't wrap around these functions -- it's a from-scratch implementation of the bulk load operation of TDS, also known as the mysterious INSERT BULK statement. I recommend writing your own implementation of that even less, but now that .NET is open source, you could (say) fork .NET Core and have your own version of SqlRawBulkCopy. With blackjack and hookers.

Jeroen Mostert
  • Many thanks for posting this comprehensive answer. I searched for more info re "Generating a huge INSERT statement is very inefficient ...." and found this comprehensive article: https://www.simple-talk.com/sql/performance/comparing-multiple-rows-insert-vs-single-row-insert-with-three-data-load-methods/ . Reading this article leads me to believe that the current method used by my app is not bad at all (we are using 2008 R2), though tuning will be necessary on case by case basis. I was surprised by the TVP results though (previously I read TVP was only marginally slower than BulkInsert). – Alex Jun 24 '16 at 02:17
  • One note for future reference is that the "Table Value Constructor" (`VALUES()()`) is limited to 1000 rows as per https://msdn.microsoft.com/en-us/library/dd776382.aspx – Alex Jun 24 '16 at 02:24
  • Just to clarify: the app already uploads files in multiple "batches" as some files are 500MB+. Rows are cached until "MaxRowsToCache" is exceeded, at which point the SQL string is built and sent to the server. From your answer and the link I have found I can see that for each file type I must tune "MaxRowsToCache". – Alex Jun 24 '16 at 02:58
  • RE "sermon": This is how I have inherited it. Changing is hard: as you say, SQL Server conversion rules are poorly documented, so for me to replicate this behaviour in .NET is very risky. At the same time I do not fully buy into "data has to satisfy .NET validation/conversion limitations". This data is consumed by batch processing logic in SQL Server (.NET is only a minor transit point) and therefore has to meet SQL Server rules but I fail to see a reason why it needs to meet .NET rules. When you use SQL Server native import (e.g. bcp) SQL Server rules are used. – Alex Jun 24 '16 at 03:33
  • @Alex: that's a great article. I'll delve into it when I have the time, to see what parts of my answer need to be revised. I've had bad experiences with big statements in the past, but I've never had to work around it with anything other than bulk insert. – Jeroen Mostert Jun 24 '16 at 07:42
  • @Alex: You shouldn't think of conversion as "rules" that have to be "met". Conversion is a process. .NET doesn't have a "rule" that the empty string "can't be converted" to 1900-01-01, it's just that this particular (dubious) conversion is present only in SQL Server. It's not like, say, SQL Server supports particular values of columns that .NET can't pass at all because of limitations. From the facts that you're using SQL Server, and that strings have to be converted to typed values at some point, it does not follow that the conversion process of SQL Server is the one to use. – Jeroen Mostert Jun 24 '16 at 07:46
  • @Alex: note that if you pass *typed* columns, and those types match the target types of the database (`Boolean` for `BIT`, `Integer` for `INT`, etcetera), no conversion happens. Given how flexible things like `Int32.TryParse` are, as opposed to SQL Server's implicit conversions, it's odd to complain that you're "stuck" with .NET! Just about the only argument left is indeed "my app started off underspecified and now I don't want to break stuff", which is a valid concern, but not a very common one. – Jeroen Mostert Jun 24 '16 at 07:53

I know this is an old question, but hopefully this helps someone: I found this IBM Q&A ticket.

You can create a schema.ini file in the same folder as the file you're reading from, so you can force SqlBulkCopy to handle the fields as strings and let SQL handle the conversions.

Example:

[FILENAME.csv]
Format=Delimited()
Col1=COL1NAME Text Width 100
Col2=COL2NAME Text Width 100
...
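
For context (this is an assumption, not something stated in the linked ticket): schema.ini is honored by the Microsoft Text/Jet (ACE) OLE DB driver rather than by SqlBulkCopy itself, so presumably the file is opened through OleDb, which then returns every field as text per schema.ini, and the resulting reader is handed to SqlBulkCopy. An unverified sketch, with the provider string, folder, file name and connectionString as placeholders:

' Unverified sketch: read the CSV through the ACE text driver (which honors schema.ini)
' and stream the all-text reader into SqlBulkCopy.
Using src As New OleDb.OleDbConnection(
        "Provider=Microsoft.ACE.OLEDB.12.0;Data Source=C:\Import\;" &
        "Extended Properties=""text;HDR=No;FMT=Delimited""")
    src.Open()
    Using cmd As New OleDb.OleDbCommand("SELECT * FROM [FILENAME.csv]", src)
        Using reader As OleDb.OleDbDataReader = cmd.ExecuteReader()
            Using bulk As New SqlClient.SqlBulkCopy(connectionString)
                bulk.DestinationTableName = "LoadTable"
                bulk.WriteToServer(reader)
            End Using
        End Using
    End Using
End Using
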
DonovanClayton
  • Seeing as that's from IBM, who produce the DB2 database, I suspect that it does not apply to SQL Server, from Microsoft – Andrew Morton Sep 27 '21 at 16:57