7

I would like to be able to efficiently retrieve a multi-dimensional array of formatted cell values from Excel. When I say formatted values, I mean I would like to get them exactly as they appear in Excel with all the cell NumberFormat applied.

The Range.Value and Range.Value2 properties work great for retrieving the cell values of a large number of cells into a multi-dimensional array. But those are the actual cell values (well at least with Range.Value2 is, I'm not quite sure what Range.Value is doing with respect to some of the values).

If I want to retrieve the actual text that is displayed in the cells, I can use the Range.Text property. This has some caveats. First, you need to AutoFit the cells or else you may get something like #### if not all the text is visible with the current cell width. Secondly, Range.Text does not work for more than one cell at a time so you would have to loop through all of the cells in the range and this can be extremely slow for large data sets.

The other method that I tried is to copy the range into the clipboard and then parse the clipboard text as a tab-separated data stream and transfer it into a multi-dimensional array. This seems to work great, although it is slower than getting Range.Value2, it is much faster for large datasets than getting Range.Text. However, I don't like the idea of using the system clipboard. If this was a really long operation that takes 60 seconds and while that operation is running, the user may decide to switch to another application and would be very unhappy to find that their clipboard either doesn't work or has mysterious data in it.

Is there a way that I can retrieve the formatted cell values to a multi-dimensional array efficiently?

I have added some sample code that is run from a couple ribbon buttons in a VSTO app. The first set some good test values and number formats and the second button will display what they look like when retrieved using one of these methods in a MessageBox.

The sample output on my system is(It could be different on yours due to Regional Settings):

Output using Range.Value
1/25/2008 3:19:32 PM    5.12345
2008-01-25 15:19:32 0.456

Output using Range.Value2
39472.6385648148    5.12345
2008-01-25 15:19:32 0.456

Output using Clipboard Copy
1/25/2008 15:19 5.12
2008-01-25 15:19:32 45.60%

Output using Range.Text and Autofit
1/25/2008 15:19 5.12
2008-01-25 15:19:32 45.60%

The Range.Text and Clipboard methods produce the correct output, but as explained above they both have problems: Range.Text is slow and Clipboard is bad practice.

    private void SetSampleValues()
    {
        var sheet = (Microsoft.Office.Interop.Excel.Worksheet) Globals.ThisAddIn.Application.ActiveSheet;

        sheet.Cells.ClearContents();
        sheet.Cells.ClearFormats();

        var range = sheet.Range["A1"];

        range.NumberFormat = "General";
        range.Value2 = "2008-01-25 15:19:32";

        range = sheet.Range["A2"];
        range.NumberFormat = "@";
        range.Value2 = "2008-01-25 15:19:32";

        range = sheet.Range["B1"];
        range.NumberFormat = "0.00";
        range.Value2 = "5.12345";

        range = sheet.Range["B2"];
        range.NumberFormat = "0.00%";
        range.Value2 = ".456";
    }

    private string ArrayToString(ref object[,] vals)
    {

        int dim1Start = vals.GetLowerBound(0); //Excel Interop will return index-1 based arrays instead of index-0 based
        int dim1End = vals.GetUpperBound(0);
        int dim2Start = vals.GetLowerBound(1);
        int dim2End = vals.GetUpperBound(1);

        var sb = new StringBuilder();
        for (int i = dim1Start; i <= dim1End; i++)
        {
            for (int j = dim2Start; j <= dim2End; j++)
            {
                sb.Append(vals[i, j]);
                if (j != dim2End)
                    sb.Append("\t");
            }
            sb.Append("\n");
        }
        return sb.ToString();
    }

    private void GetCellValues()
    {
        var sheet = (Microsoft.Office.Interop.Excel.Worksheet)Globals.ThisAddIn.Application.ActiveSheet;

        var usedRange = sheet.UsedRange;

        var sb = new StringBuilder();

        sb.Append("Output using Range.Value\n");
        var vals = (object [,]) usedRange.Value; //1-based array
        sb.Append(ArrayToString(ref vals));

        sb.Append("\nOutput using Range.Value2\n");
        vals = (object[,])usedRange.Value2; //1-based array
        sb.Append(ArrayToString(ref vals));

        sb.Append("\nOutput using Clipboard Copy\n");
        string previousClipboardText = Clipboard.GetText();
        usedRange.Copy();
        string clipboardText = Clipboard.GetText();
        Clipboard.SetText(previousClipboardText);
        vals = new object[usedRange.Rows.Count, usedRange.Columns.Count]; //0-based array
        ParseClipboard(clipboardText,ref vals);
        sb.Append(ArrayToString(ref vals));


        sb.Append("\nOutput using Range.Text and Autofit\n");
        //if you dont autofit, Range.Text may give you something like #####
        usedRange.Columns.AutoFit();
        usedRange.Rows.AutoFit();
        vals = new object[usedRange.Rows.Count, usedRange.Columns.Count];
        int startRow = usedRange.Row;
        int endRow = usedRange.Row + usedRange.Rows.Count - 1;
        int startCol = usedRange.Column;
        int endCol = usedRange.Column + usedRange.Columns.Count - 1;
        for (int r = startRow; r <= endRow; r++)
        {
            for (int c = startCol; c <= endCol; c++)
            {
                vals[r - startRow, c - startCol] = sheet.Cells[r, c].Text;
            }
        }
        sb.Append(ArrayToString(ref vals));


        MessageBox.Show(sb.ToString());
    }

    //requires reference to Microsoft.VisualBasic to get TextFieldParser
    private void ParseClipboard(string text, ref object[,] vals)
    {
        using (var tabReader = new TextFieldParser(new StringReader(text)))
        {
            tabReader.SetDelimiters("\t");
            tabReader.HasFieldsEnclosedInQuotes = true;

            int row = 0;
            while (!tabReader.EndOfData)
            {
                var fields = tabReader.ReadFields();
                for (int i = 0; i < fields.Length; i++)
                    vals[row, i] = fields[i];
                row++;
            }
        }
    }


    private void button1_Click(object sender, RibbonControlEventArgs e)
    {
        SetSampleValues();
    }

    private void button2_Click(object sender, RibbonControlEventArgs e)
    {
        GetCellValues();
    }
Mesut Akcan
  • 899
  • 7
  • 19
jjdem
  • 379
  • 5
  • 14
  • Try this: Export your formatted worksheet to a .csv file. Create a new worksheet, and **Import** (do not Open) the .csv file. When you do this, the Text Import Wizard will open and you will specify each column as being Text. You can then place the UsedRange of this new worksheet into a variant array in a single step. – Ron Rosenfeld May 01 '15 at 20:21
  • 1
    thanks but I think that method will be pretty slow and error prone since Excel isn't the most robust program for csvs. I'd never trust Excel to properly export a csv with Unicode characters, leading zero's, dates, delimiters within the cell value, newline characters within the cell value, etc. – jjdem May 01 '15 at 20:35
  • Didn't realize your requirement for Unicode characters. I just used the sample data you provided, which worked fine. – Ron Rosenfeld May 01 '15 at 20:39

2 Answers2

0

I've found a partial solution. Apply the NumberFormat value to the parsed double of Value2. This only works for single cells as returning an array for NumberFormat with different formats in the array returns System.DBNull.

double.Parse(o.Value2.ToString()).ToString(o.NumberFormat.ToString())

The dates don't work with this though. If you know which columns contains certain things, like a formatted date, you can use DateTime.FromOADate on the double and then value.ToString(format) with the NumberFormat. The code below gets close but is not complete.

<snip>
sb.Append("\nOutput using Range.Value2\n");
vals = (object[,])usedRange.Value2; //1-based array
var format = GetFormat(usedRange);
sb.Append(ArrayToString(ref vals, format));
</snip>

private static object[,] GetFormat(Microsoft.Office.Interop.Excel.Range range)
{
    var rows = range.Rows.Count;
    var cols = range.Columns.Count;
    object[,] vals = new object[rows, cols];
    for (int r = 1; r <= rows; ++r)
    {
        for (int c = 1; c <= cols; ++c)
        {
            vals[r-1, c-1] = range[r, c].NumberFormat;
        }
    }
    return vals;
}

private static string ArrayToString(ref object[,] vals, object[,] numberformat = null)
{
    int dim1Start = vals.GetLowerBound(0); //Excel Interop will return index-1 based arrays instead of index-0 based
    int dim1End = vals.GetUpperBound(0);
    int dim2Start = vals.GetLowerBound(1);
    int dim2End = vals.GetUpperBound(1);

    var sb = new StringBuilder();
    for (int i = dim1Start; i <= dim1End; i++)
   {
        for (int j = dim2Start; j <= dim2End; j++)
        {
            if (numberformat != null)
            {
                var format = numberformat[i-1, j-1].ToString();
                double v;
                if (double.TryParse(vals[i, j].ToString(), out v))
                {
                    if (format.Contains(@"/") || format.Contains(":"))
                    {// parse a date
                        var date = DateTime.FromOADate(v);
                        sb.Append(date.ToString(format));
                    }
                    else
                    {
                        sb.Append(v.ToString(format));
                    }
                }
                else
                {
                    sb.Append(vals[i, j].ToString());
                }
            }
            else
            {
                sb.Append(vals[i, j]);
            }
            if (j != dim2End)
                sb.Append("\t");
        }
        sb.Append("\n");
    }
    return sb.ToString();
}
descention
  • 136
  • 5
  • 1
    thanks, but like you said NumberFormat only works if the entire range has the same format, otherwise, you have to look at the cells one by one. This could work relatively efficiently if your data has consistent number formatting in range segments and you apply an algorithm to get parts of it (like checking if an entire column has the same NumberFormat), but if the data is relatively random, you have to get the NumberFormat for every cell and you are not going to see any performance increase over using the Text property, in fact it will be slower. – jjdem May 06 '15 at 19:00
0

One solution to your problem is to use:

Range(XYZ).Value(11) = Range(ABC).Value(11) 

As described here, this will:

Returns the recordset representation of the specified Range object in an XML format.

Assuming that your excel is configured in OpenXML format, this will copy the value/formula AND the formatting of range ABC and inject it into range XYZ.

Additionally, this answer explains the difference between Value and Value2.

.Value2 gives you the underlying value of the cell (could be empty, string, error, number (double) or boolean)

.Value gives you the same as .Value2 except if the cell was formatted as currency or date it gives you a VBA currency (which may truncate decimal places) or VBA date.

Community
  • 1
  • 1
GTPV
  • 407
  • 3
  • 5