0

I have a column that can contain various data types such as Dates, Strings, Numbers, etc... for instance:

Example input column

What I need to do is to display this contact on a separate dialog window. The problem is that when I use cell.Value2 I get double instead of the visible string. and when I try to use cell.Value and after that check if cell.Value.GetType().Name == "DateTime" the NumberFormat string doesn't match the C# formatting; for instance, the NumberFormat I get for the 5th line in the image is m/d/yy;@ which is useless in C# because cell.Value.ToString(formatting) will result in 0/5/20;@.

Another problem, the cell that contains time returns 0.43888 in both cell.Value and cell.Value2 the only way I can see something that can be used is with: cell.Value(XlRangeValueDataType.xlRangeValueMSPersistXML) or cell.Value(XlRangeValueDataType.xlRangeValueXMLSpreadsheet); which are time-consuming even before parsing them. Since sometime I need to read hundreds of thousands of values I cant use the XML approach.

Related links: Get Formatted Cell Values efficiently How do I check type of dynamic datatype at runtime? https://social.msdn.microsoft.com/Forums/office/en-US/32163853-7e27-4af7-b95b-f4b53118a594/c-excel-date-issue?forum=vsto

My Question: What is the fastest and most robust way to get the presented values of excel cells into C# objects - without any change to the formatting?

yossico
  • 3,421
  • 5
  • 41
  • 76
  • In VBA, Excel Range objects have a .Text property which is intended to return the displayed cell's value as a string. Maybe you can access this via C#? – DS_London Oct 18 '20 at 14:50
  • @DS_London thanks! This does solve the formatting issue but it is very slow in the performance aspect, I used to get up to the full column of values via the ".Value" / ".Value2" fields (as array) but the ".Text" doesn't work on more than one cell and I need to get the column cell by cell and this is very slow. any suggestions? – yossico Oct 19 '20 at 11:39
  • I'm afraid not. The reason that Value works so fast is that (I think) it directly accesses the underlying array that Excel uses to store the variant data. Excel doesn't store the actual text that is displayed ... just the means of formatting it, and because each cell in the range can have different formatting you have to step through cell-by-cell. Also the text presented to one user may differ to that presented to another (eg dates in different languages): but the underlying value doesn't change. Excel saves filespace by not storing formats for every single cell, but only the value. – DS_London Oct 19 '20 at 11:53
  • I suppose if the data on the sheet had any kind of repeatable structure (eg Name, Address, Date Of Birth, Phone number) then you could get the number formats of each data item, then apply as a template. That might save some time. But if the values really are free-form then I think you have to go cell-by-cell. You would probably need some kind of Excel->Sprintf() format string converter still. – DS_London Oct 19 '20 at 12:02
  • @DS_London Thanks, sound reasonable, you can put it as an answer so I can accept it – yossico Oct 19 '20 at 13:15

2 Answers2

1

Excel stores the data (values) of cells separately from their display formats. While the value of a cell (typically a variant) won't change, it may be displayed differently to different users (eg a Date with "Short Date" formatting will appear differently to a user in the UK than to one in the US, even though the underlying value (a Double) hasn't changed; similarly different locales use different thousands separators). You can extract the displayed item using the Range.Text property, and it returns a string.

Whereas you can retrieve a 2-D array of a Range's value using Range.Value, you can't do the same for the displayed values, as Excel doesn't store them: instead it uses the NumberFormat to determine what to display. Using .Value is usually very efficient as (I think) you are accessing the underlying data directly: there is not a similar access method for the formats. Even if you could access the NumberFormat, as you have found it doesn't match sprintf() or other formatting schemas. Also, there may be some optimization to save on filesize: eg if every cell in a column has the same format, Excel may only save that format once for the whole column (I am open to correction on this).

For a completely free-form Excel sheet with unknown combinations of values and formats, I think you are forced to go through the range cell-by-cell, which is painfully slow (especially if you are crossing the process boundary and using late-binding).

If you could identify a pattern to the data (eg a record card with Name, DOB, Address, Phone etc) then you could create a format template from the first record, and then apply it to all the subsequent records. This might speed things up.

DS_London
  • 3,644
  • 1
  • 7
  • 24
1

You can give it try to OpenXml Sdk for Excel. Since it read data from xml, it is much faster than Com-API.

YUVRAJ
  • 109
  • 3