0

Is there a way to read Excel 2010/2013 files natively ?

We are importing Excel files into SQL Server and have come across a specific issue whereby it looks as though the Excel driver decides the type of a destination data column depends upon testing the contents of only the first 65K odd rows.

This has only just started happening within the past 3 weeks, before then we had managed to convince Excel of the error of its ways by a simple registry hack that forced it to read the entire set of rows.

The problem is that we have some datasets that contain, say 120,000 rows and these may have all numeric values for the first 80,000, then it will have some non-numeric yet vital information that we wish to retain.

Yes, the data is not correctly typed, we know.

Because the source data type has been determined by the Excel driver to be a float it promptly turns all our non-numeric values into NULLs - not very useful.

If there was some other way to read an Excel file not using the standard ODBC/OLEDB drivers that might help.

We have tried saving it into various other formats before importing but of course all these exports use the Excel driver which has the problem.

I think the closest we have got is to save it as XML (which is frankly huge at 800MB) and then shred it using standard xpath queries and some pretty dodgy workarounds to handle no doubt well-formed but still tricky variations on how column data is represented.

Edit: changed title to more closely reflect the issue

  • 1
    The import wizard just generates an SSIS package, so instead of running right away, save the package, open it, and fix the data types. A pain, I know, but it's the best way. – Jeremy Jul 31 '15 at 11:37
  • OK sorry should have been clearer, we don't use SSIS, its something that has been imposed on us, we have several thousand Excel workbooks to import so I'm going to be drawing a pension before I code them all unless I force the issue and use something like BIML – unclejimbob Jul 31 '15 at 11:55
  • can you convert to csv then import? – Jeremy Jul 31 '15 at 12:01
  • Another thought, I'm not sure that the driver actually cares what data type you call it at the destination it will still simply decide the data type of the source as usual, cast it to NULL for non-numerics and then unhelpfully cast to it your data type – unclejimbob Jul 31 '15 at 12:05
  • I was going to add a whole lot of things to the top of my original question, I see I should have. No, converting it to csv won't help because ta-da its uses the Excel driver to do the conversion :) The entire issue is that no matter what you do the first thing you meet up with is the Excel driver. – unclejimbob Jul 31 '15 at 12:06
  • xlsx is actually a zip file - rename to .zip and open it and its a bunch of XML files. You could try trawling through that. – Nick.Mc Jul 31 '15 at 13:26
  • and yes I can't beleive this issue still exists... its been around forever! – Nick.Mc Jul 31 '15 at 13:27
  • You should be able to convert the Excel to CSV with something like this: http://stackoverflow.com/questions/1858195/convert-xls-to-csv-on-command-line. No driver required here. Excel needs to be installed though – Nick.Mc Jul 31 '15 at 13:28
  • Zip file: Yep, I think that's where we are at in a roundabout sort of way but your approach looks more direct, useful and applicable, I'm going to run it by my prop-head on Monday – unclejimbob Aug 01 '15 at 04:21
  • I can't beleive this issue still exists: Kinda, there has always been a work around using the standard registry hack but just in the last few weeks my guess is one of the inumerable KBs has gone and stuffed it up. – unclejimbob Aug 01 '15 at 04:21
  • In our fevered imaginations we were thinking of some poor overworked and underpaid (a least I trust they are now) Microsoft employee gleefully coming across a performance issue with reading Excel workbooks and thinking 'Ya know, I bet nobody has thought of only reading the first few rows to determine the data type of a column, I mean _really_ why not speed this process up 10x and only fetch say, the first 65K - and while I'm at it what is the deal with this registry entry that allows someone to force an entire read of the file ? That can't be right I'll just fix it while I'm here" – unclejimbob Aug 01 '15 at 04:21
  • You should be able to convert the Excel to CSV with something like this: Yes, however this may just push the problem into another format, we will try it out with our minimal test data set when I get back to work on Monday, I guess at least we would then be able to use a bulk insert – unclejimbob Aug 01 '15 at 04:32
  • Nick thanks for the input, we're going with a variation of your suggestions, i.e. 'Save as XML spreadsheet 2003' and then parsing the content. – unclejimbob Aug 04 '15 at 08:43
  • Just a suggestion: If the nature of your data is tabular, you're just creating more work for yourself by using XML – Nick.Mc Aug 05 '15 at 06:46
  • Yes, unfortunately that is the nature of things, now we have a repeatable, reliable working solution, saving it as CSV will simply tempt someone in IT to use the no-brainer drag-and-drop option of importing it using OLEDB - and we're then back to square 1. – unclejimbob Aug 06 '15 at 07:47

1 Answers1

0

As well as the registry key, when connectting to your excel file have you tried setting the following:

;Extended Properties="IMEX=1"
  1. See here
  2. Also see this MSDN article
sarin
  • 5,227
  • 3
  • 34
  • 63
  • "As well as the registry key, when connectting to your excel file have you tried setting the following:" Yes See here Yes, you've just described the registry hack that doesn't work for large datasets Also see this MSDN article As above – unclejimbob Jul 31 '15 at 11:58