3

I am building a tool to automate the creation of an Excel workbook that contains a table and an associated PivotTable. The table structure is on one sheet, the data for which will be pulled from a database using another tool at a later point. The PivotTable is on a second sheet using the table from the previous sheet as the source.

I am using EPPlus to facilitate building the tool but am running into problems specifying the cacheSource. I am using the following to create the range and PivotTable:

 var dataRange = dataWorksheet.Cells[dataWorksheet.Dimension.Address.ToString()];

 var pivotTable = pivotWorksheet.PivotTables.Add(pivotWorksheet.Cells["B3"], dataRange, name);

This sets the cacheSource to:

<x:cacheSource type="worksheet" xmlns:x="http://schemas.openxmlformats.org/spreadsheetml/2006/main">
<x:worksheetSource ref="A1:X2" sheet="dataWorksheet" />

or within Excel, the data source is set to:

dataWorksheet!$A$1:$X$2

This works fine if the table size never changes, but as the number of rows will be dynamic, I am finding when the data is refreshed, data is only read from the initial range specified.

What I am want to do is to programmatically set the cacheSource to:

<x:cacheSource type="worksheet" xmlns:x="http://schemas.openxmlformats.org/spreadsheetml/2006/main">
  <x:worksheetSource name="dataWorksheet" />
</x:cacheSource>

or in Excel, set the data source to:

dataWorksheet

I believe it may be possible to do this by accessing the XML directly (any pointers on this would be most welcome) but is there any way to do this using EPPlus?

pnuts
  • 58,317
  • 11
  • 87
  • 139
  • 1
    Thanks for the comment; this is what I am attempting to set. EPPlus does not appear to have a method for adding the source data as a table; it seems it will only accept a static range. – BadgersArise Oct 23 '15 at 07:19

1 Answers1

1

It can be done but it is not the prettiest thing in the world. You can extract the cache def xml and edit it from the created EPPlus pivot table object but that will wreak havoc with the save logic when you call package.save() (or GetAsByteArray()) since it parses the xml on save to generate the final file. This is the result of, as you said, EPPlus not capable of handling a table as the source.

So, your alternative is to save the file with EPPlus normally and then manipulate the content of the xlsx which is a renamed zip file using a .net ZipArchive. The trick is you cannot manipulate the files out of order in the zip otherwise Excel will complain when it opens the file. And since you cannot insert an entry (only add to the end) you have to recreate the zip. Here is an extension method on a ZipArchive that will allow you to update the cache source:

public static bool SetCacheSourceToTable(this ZipArchive xlsxZip, FileInfo destinationFileInfo, string tablename, int cacheSourceNumber = 1)
{
    var cacheFound = false;
    var cacheName = String.Format("pivotCacheDefinition{0}.xml", cacheSourceNumber);

    using (var copiedzip = new ZipArchive(destinationFileInfo.Open(FileMode.Create, FileAccess.ReadWrite), ZipArchiveMode.Update))
    {
        //Go though each file in the zip one by one and copy over to the new file - entries need to be in order
        xlsxZip.Entries.ToList().ForEach(entry =>
        {
            var newentry = copiedzip.CreateEntry(entry.FullName);
            var newstream = newentry.Open();
            var orgstream = entry.Open();

            //Copy all other files except the cache def we are after
            if (entry.Name != cacheName)
            {
                orgstream.CopyTo(newstream);
            }
            else
            {
                cacheFound = true;

                //Load the xml document to manipulate
                var xdoc = new XmlDocument();
                xdoc.Load(orgstream);

                //Get reference to the worksheet xml for proper namespace
                var nsm = new XmlNamespaceManager(xdoc.NameTable);
                nsm.AddNamespace("default", xdoc.DocumentElement.NamespaceURI);

                //get the source
                var worksheetSource = xdoc.SelectSingleNode("/default:pivotCacheDefinition/default:cacheSource/default:worksheetSource", nsm);

                //Clear the attributes
                var att = worksheetSource.Attributes["ref"];
                worksheetSource.Attributes.Remove(att);

                att = worksheetSource.Attributes["sheet"];
                worksheetSource.Attributes.Remove(att);

                //Create the new attribute for table
                att = xdoc.CreateAttribute("name");
                att.Value = tablename;
                worksheetSource.Attributes.Append(att);

                xdoc.Save(newstream);
            }

            orgstream.Close();
            newstream.Flush();
            newstream.Close();
        });
    }

    return cacheFound;

}

And here is how to use it:

//Throw in some data
var datatable = new DataTable("tblData");
datatable.Columns.AddRange(new[]
{
    new DataColumn("Col1", typeof (int)), new DataColumn("Col2", typeof (int)), new DataColumn("Col3", typeof (object))
});

for (var i = 0; i < 10; i++)
{
    var row = datatable.NewRow();
    row[0] = i; row[1] = i*10; row[2] = Path.GetRandomFileName();
    datatable.Rows.Add(row);
}

const string tablename = "PivotTableSource";
using (var pck = new ExcelPackage())
{
    var workbook = pck.Workbook;

    var source = workbook.Worksheets.Add("source");
    source.Cells.LoadFromDataTable(datatable, true);
    var datacells = source.Cells["A1:C11"];

    source.Tables.Add(datacells, tablename);

    var pivotsheet = workbook.Worksheets.Add("pivot");
    pivotsheet.PivotTables.Add(pivotsheet.Cells["A1"], datacells, "PivotTable1");

    using (var orginalzip = new ZipArchive(new MemoryStream(pck.GetAsByteArray()), ZipArchiveMode.Read))
    {
        var fi = new FileInfo(@"c:\temp\Pivot_From_Table.xlsx");
        if (fi.Exists)
            fi.Delete(); 

        var result = orginalzip.SetCacheSourceToTable(fi, tablename, 1);
        Console.Write("Cache source was updated: ");
        Console.Write(result);
    }
}
Ernie S
  • 13,902
  • 4
  • 52
  • 79
  • Prior to reading this I tried amending the package and hit the exact problem you highlighted around the package save validation. The solution you have suggested provided a great alternative that worked really well and helped improved my knowledge; thank you very much. – BadgersArise Oct 26 '15 at 11:47