1

I know questions like this are around in stack and there are 3rd part libraries to do the trick but none of them is fixing my issue at the moment. So the issue.

I have an Excel workbook (.xlsx) with multiple sheets generated by another system. I have to read the data from this via SSIS and dump it to a SQL DB.

Now the issue is although the Excel sheet contains data and when I open manually it opens without any error and the data displays when I use a script task and use OLEDB connection to connect to the excel and open it up the connection is made successfully but when reading data the column names are not picked (I get F1, F2 likewise) and no data rows are read. I simply get a blank row and that's about it. I have tried with HDR= YES and NO and IMEX=1 and 0 but always the result is same.

Funny thing is if I open the excel sheet do some modification (like change a sheet name save and change back the sheet name and save and close) and after that I try to run the package the data gets picked without any issue (also I noticed that the file size increases from 164KB to 196KB). Now because of this what am trying to do is modify the the file a bit and save via code.

So the initial step I tried was through using Office.Interop.Excel and it works like a charm in my machine but on the server NO OFFICE so IT NO WORKS. And nope the IT guys are never going to install access engine or excel or anything there.

Then I tried via OpenXML and 3rd party library like NPOI and even via OLEDB connection to modify the file. in both NPOI and OLEDB methods the file got changed but still it didn't get picked up properly by the SSIS package (I noticed that the file size didn't change and remained at 164kb). In OpenXML it wasn't able to open the file and threw an error saying "the document cannot be opened because there is and invalid part with an unexpected content type".

So right now I am stuck with no proper method in sight and would appreciate any help in solving this either through c# code or any other SSIS method available. SSIS version am using is 2008.

Edit 1

So I noticed that the script task is able to read the data from the first sheet out f the multiple sheets but the other sheets are the problem. So somewhere the xml for these sheets are broken. Anyway I can copy the xml configs of the first sheet to other ones? Just a thought...

Edit 2 So the first sheet is of ContentType "application/vnd.openxmlformats-officedocument.spreadsheetml.worksheet+xml" while all the other sheets are of ContentType "application/xml"

Sam
  • 850
  • 1
  • 10
  • 20
  • 1
    This won't be a clear answer so posting as comment. xls files are zipped xml files. So unzip it then process the xml might work in your case. – MotKohn May 23 '17 at 03:35
  • In addition to MoKohns comment, the format is a standard so you can read up on it here: https://stackoverflow.com/questions/4886027/looking-for-a-clear-description-of-excels-xlsx-xml-format – rayzinnz May 23 '17 at 03:45
  • We use a tool by syncfusion and is works well https://www.syncfusion.com/products/file-formats/xlsio This is a standalone tool. If you are small business they have a free community edition which is handy – TrtlBoy May 23 '17 at 06:13
  • If you are getting an error *the document cannot be opened because there is and invalid part with an unexpected content type* then there is an invalid part. Why not fix your excel before trying out different solutions? OpenXML works fine on the servers, a lot of people use it – FortyTwo May 23 '17 at 08:01
  • @MatKohn and rayzinnz: Am not exactly sure how to process the xml. Seems I need to add somerelationship but not sure what exactly. TrtlBoy this is going to be a commercial license and well I might be the one to pay for that so not gonna work. FortyTwo: The file generation is out of my scope and it cannot be touched (unfortunately). – Sam May 23 '17 at 09:30

1 Answers1

1

Ultimately ended up using two libraries for this. The data was read without an issue by using exceldatareader (http://exceldatareader.codeplex.com/). Using this the data was read into a dataset easily and then it was written to a new Excel file using epplus (http://epplus.codeplex.com/). After that when the new excel file was read via the SSIS package data got picked without an issue. Hope this will help someone out there.

Sam
  • 850
  • 1
  • 10
  • 20