4

I have looked at most of the topics on this forum regarding similar questions but haven't found exactly what I am looking for.

I am trying to write a pipeline component for BizTalk 2013 R2 using C# to simply convert an incoming Excel 2010 .xlsx file to it's bare/base XML representation.

I do not want to run any templates against it or XLST transform it or anything like that. I simply just want to return the underlying XML representation of said spreadsheet as is.

It seems like this should be a very easy task but I can't figure out how to do it at all.

Everything I've found requires working with DataTables and looping through rows and cells (via OpenXML) to output a specific XML representation that is more human readable but that isn't what I want.

I want the actual Microsoft XML representation of that spreadsheet.

Any help would be greatly appreciated.

Dijkgraaf
  • 11,049
  • 17
  • 42
  • 54
Ritley572
  • 299
  • 4
  • 15
  • Unzip and grab the XML content – ErikEJ Feb 01 '15 at 15:51
  • There is no programmatic way to do it other than unzipping it? – Ritley572 Feb 02 '15 at 14:29
  • Use a .zip library to do the unzipping! – ErikEJ Feb 02 '15 at 16:36
  • Yes I understand I can do that, but I was kind of hoping I could just return the xml without having to unzip the files... I can get at worksheets and workbooks programmatically without unzipping via the OpenXML SDK, but don't see any way to use something like worksheet.OuterXml to return the xml of the worksheet I have loaded. Is there really no way to do that other than unzipping and returning the actual xml file? – Ritley572 Feb 02 '15 at 18:09
  • @Ritley572 You do understand that any library you end up using is going to have to *unzip the file*? If your sole intent is to grab the XML files inside, then using a library to read the file is just adding additional steps. – mason Feb 02 '15 at 19:25
  • @mason yes I didn't realize xlsx files were actually just zip files under the covers until I really looked into how to get the unaltered XML. I was operating under the assumption is was just one large XML file that the OpenXML library was able to access. Once I realized it was actually a series of linked XML files it made everything easier. Though I do believe the code listed below suits my needs better than using a zip library to unzip the file and then having to use the OpenXML to get at what I need anyway. Perhaps I should edit the original question, though I don't want to confuse anyone. – Ritley572 Feb 03 '15 at 20:10

1 Answers1

2

OK, figured it out without having to do any unzipping of the file.

If you use the SAX approach to loading the worksheet into an OpenXmlReader found here:

https://msdn.microsoft.com/en-us/library/office/gg575571(v=office.15).aspx

You can then use the reader to get the OuterXml like so:

using (SpreadsheetDocument spreadSheetDocument = SpreadsheetDocument.Open(filepath, false))
{
    WorkbookPart wbPart = spreadSheetDocument.WorkbookPart;

    OpenXmlReader reader = OpenXmlReader.Create(wbPart);

    while (reader.Read())
    {
        if (reader.ElementType == typeof(Sheet))
        {
            Sheet sheet = (Sheet)reader.LoadCurrentElement();

            WorksheetPart wsPart = (WorksheetPart)(wbPart.GetPartById(sheet.Id));

            OpenXmlReader wsReader = OpenXmlReader.Create(wsPart);
            while (wsReader.Read())
            {
                if(wsReader.ElementType == typeof(Worksheet))
                {
                    Worksheet wsPartXml = (Worksheet)wsReader.LoadCurrentElement();
                    Console.WriteLine(wsPartXml.OuterXml + "\n");
                }
            }
        }
    }
    Console.ReadKey();
}
Nikolay Kostov
  • 16,433
  • 23
  • 85
  • 123
Ritley572
  • 299
  • 4
  • 15