9

Array operations are way faster than range operations in VSTO, so currently I'm using

object[,] RangeValues = wsh.get_Range("A1:" + lastCell.Address).Value2;

with pretty good effect. Sadly, I have some inconsistent data. Sometimes there is 0.45, and sometimes 0.45%, and of course later i see it as 0.0045 in code. Sadly, from "business" point of view, both values means 0.45. I cannot force consistency, files come from various sources I don't have any authority over. It's something I need to deal with.

The way, of course, would be to look at format, or at display text, and see if there is a % sign in it. If there is, I just need to multiply value by 100. Sadly, if I try:

object[,] RangeValues = wsh.get_Range("A1:" + lastCell.Address).Text;

I get message that cannot convert DBNull to object[,]. So is there any way that would allow me to load texts or formats all at once, without going thorough code <-> worksheet border at each step of mu loop?

Jeremy Thompson
  • 61,933
  • 36
  • 195
  • 321
Mołot
  • 699
  • 12
  • 36
  • Rather than attempt to convert the numeric value of the 0.45% to 45%, perhaps do some form of text-replace of all instances of % with blank? – Grade 'Eh' Bacon Aug 24 '15 at 13:15
  • @Grade'Eh'Bacon Yes, considered it. But that will require fast access to *texts* - exactly what my question is about. Rest is just a rationale behind my need. – Mołot Aug 24 '15 at 13:16
  • 1
    I'm not familiar with C#, but perhaps this answer helps? http://stackoverflow.com/a/6529669/5090027 – Grade 'Eh' Bacon Aug 24 '15 at 13:21
  • @Grade'Eh'Bacon not really - I'm only able to open sheets readonly. Or rather - I'm not able to be sure I can write. I can replace text all right - but only when I already have it in my memory. Offloading work to Excel would be pretty neat, but not on worksheets I'm only supposed to read. Also, whilst it can help with %, it's not universal solution, and I have a couple of similar, but different needs. – Mołot Aug 24 '15 at 13:26
  • Are the cells with the % values actually formatted as % in excel? If not, how are they formatted? Did the end-user actually type in "%" to the cell ? – Tim Williams Aug 24 '15 at 17:55
  • Why don't you loop through the array you've got and use something like `If(rv[i] < 0.01 && rv[i] > 0) {rv[i] = rv[i] * 100}` – SierraOscar Aug 26 '15 at 22:35
  • @MacroMan because 0.005 is pretty likely value and does not necessarily is meant to mean 0.5 - it represents point worth of items, and some items are there not for their value, but to fulfill other requirements. We call them 0-value items, but their real value is almost never a true zero. By the way, I tried similar approach, but ultimately it failed, too many false positives or negatives, depending on where I put the border. – Mołot Aug 27 '15 at 07:26
  • What about `String.Replace` and `Double.TryParse` ? --- `String temp = RangeValues[i].Value2.ToString(); temp.Replace("%", ""); RangeValues[i].Value2 = Double.TryParse(temp);` – SierraOscar Aug 27 '15 at 08:35
  • @MacroMan But now I'm getting doubles, not strings. That's the very point of this question. – Mołot Aug 27 '15 at 08:39
  • Coming a little late in the game, but what about you verifying the formatting of only these "false positives" you mentioned when filtering values below 0.005? That should already cut down a lot of the Interop calls. – Ama Jan 02 '20 at 19:14

3 Answers3

6

Detecting Excel Cell Formats

To find the format of cells use Excel's Cell("format",A1) function rather that interrogating the datatypes which would be much slower, harder and prone to problems, eg: 0.45% != 45%.

enter image description here

private void button1_Click(object sender, EventArgs e) 
{
    // evaluate the Format of Cells A1 thru to A7
    using (var rnEvaluate = xlApp.Range["C1:C1"].WithComCleanup())
    {
        for (int i = 1; i < 8; i++)
        {
            rnEvaluate.Resource.Value2 = "=CELL(\"format\",A" + i.ToString() + ")";
            string cellFormat = GetExcelCellFormat(rnEvaluate.Resource.Value2);
            System.Diagnostics.Debug.Write(cellFormat);
        }
    } 
}

private string GetExcelCellFormat(string cellFormat = "G") 
{
    switch (cellFormat.Substring(0, 1))
    {
        case "F" :
            return "Number";
            break;
        case "P" :
            return "Percentage";
            break;
        case "C":
            return "Currency";
            break;
        case "D":
            return "Date";
            break;
        default :
            return "General";
            break;
    } 
}

The .WithComCleanup() is because I am using VSTO Contrib.


Detecting All Excel Cell Formats at once

Is there any way that would allow me to load texts or formats all at once?

Simply use the above method to detect all the cell formats (using AutoFill) and add them to a objectArray. Say for example I wanted to know the cell formats for columns A & B:

enter image description here

Using this VBA code I could to get all cell formats (at once without iterating over cells):

Range("C1").Select
ActiveCell.Value2 = "=CELL(""format"",A1)"
'Fill Down
Range("C1").Select
Selection.AutoFill Destination:=Range("C1:C6"), Type:=xlFillDefault
'Fill Across
Range("C1:C6").Select
Selection.AutoFill Destination:=Range("C1:D6"), Type:=xlFillDefault

Here is the above VBA code converted to C# and storing the formats in an object array:

var filepath = @"C:\temp\test\book2.xlsx";
var xlApp = new Microsoft.Office.Interop.Excel.Application();
//Optional but recommended if the user shouldn't see Excel.
xlApp.Visible = false;
xlApp.ScreenUpdating = false;
//AddToMru parameter is optional, but recommended in automation scenarios.
var workbook = xlApp.Workbooks.Open(filepath, AddToMru: false);

//This operation may take a little bit of time but no where near 15 minutes!!!
var cell = xlApp.Range["C1:C1"];
cell.Value2 = "=CELL(\"format\",A1)";
//Fill Down
cell.AutoFill(xlApp.Range["C1:C6"], Microsoft.Office.Interop.Excel.XlAutoFillType.xlFillDefault);
//Fill Across
cell = xlApp.Range["C1:C6"];
cell.AutoFill(xlApp.Range["C1:D6"], Microsoft.Office.Interop.Excel.XlAutoFillType.xlFillDefault);
//Get cell formats into object array
object[,] rangeFormats = xlApp.get_Range("C1:D6").Value2;

The Excel Percentage Conversion Trick

I have some inconsistent data. Sometimes there is 0.45, and sometimes 0.45%

If the only data inconsistencies you're having is with % values then here is a trick.

Presumably the percentage values will be in a column, to convert them, copy the column of values (in column A):

enter image description here

Make sure you set a column with values of 100 (as shown in column B)

Right click the cell in the column of 100's and choose Paste Special:

enter image description here

Choose Values and Multiply:

enter image description here

Excel will convert them to real numbers:

enter image description here

Obviously you can do this programmatically. Simply record the operation as a Macro and convert the VBA to C#.

and of course later i see it as 0.0045 in code.

Note: The code is right, 0.45% is not 45%, 0.45% is less than half a percent! If a particular customer is sending you files expecting you to break the laws of mathematics and treat 0.45% = 45% then there is a good chance they may suddenly start getting 100x more or 100x less. I'd politely point out they need to change it. Don't try and program around this. If this is why you want to look at Cell Formats then all you're doing is troubleshooting a symptom rather than fixing the root cause which will exacerbate the issue and hide a much bigger problem. Just politely point it out to the sources you have no control over there could be some massive problems by x100 fold and insist it needs to be corrected. Otherwise I expect to see a hilarious story about it in DailyWTF featuring this code:

var val = rangeValues[1,1].ToString();
var cellFormat = rangeFormat[1,1].ToString();

if (val.EndsWith("%") && val.Replace("%","") < 1 && cellFormat == "G")  {
   dailyWTFval = val.Replace("%","") * 100;
}
else
   dailyWTFval = val;    
}
Jeremy Thompson
  • 61,933
  • 36
  • 195
  • 321
  • The point is, % is just a symbol of unit that in this particular industry does not mean mathematical percents. It's unfortunate, but writing unit name in header is just as proper as writing it's symbol after values. And to be honest, when everyone and their dog knows what unit is it, omitting both name and symbol also happens and nobody is going to care. Now, I can't go to company X (one of like, 3 biggest in the country in their business) and tell them to adapt a way of their competitor (other company from that 3). Not my level. If you can, congratulations, for me it is not going to happen. – Mołot Aug 29 '15 at 10:40
  • % is a symbol in a particular industry that doesn't mean math percentages, ok that changes everything. Ignore my comments about the % trick:) – Jeremy Thompson Aug 29 '15 at 10:45
  • to be honest I'm quite irritated at them about that. But there is simply nothing I can do about it. And unit is derived from certain percentages, it was just redefined so much that treating 50 units as 0.5 no longer have any sense. Except for Excel it does when someone writes 50%. – Mołot Aug 29 '15 at 10:49
  • Did you need any further help? I did put in quite a lot of time and effort. If my answer helped you and you prefer this way over the XML parsing method (*which you'd have to write to code to perform*) I would feel honored to receive the bounty. – Jeremy Thompson Aug 31 '15 at 12:22
  • Thanks, nothing else needed. It took some time to actually try all this stuff, that's all :) And yes, this was most direct approach from all answers. – Mołot Aug 31 '15 at 12:33
3

I think the easiest way to read large amounts of non harmonized data from excel will be following

  1. In C# save excel file in XML Spreadsheet 2003(*xml). That will produce xml file with all data and styling. C# method for saving - Workbook.SaveAs with FileFormat value = XlFileFormat.xlXMLSpreadsheet

  2. Parse xml file and extract data with formats

  3. Delete temporary file

Examle: Excel Excel

XML

<?xml version="1.0"?>
<?mso-application progid="Excel.Sheet"?>
<Workbook xmlns="urn:schemas-microsoft-com:office:spreadsheet"
 xmlns:o="urn:schemas-microsoft-com:office:office"
 xmlns:x="urn:schemas-microsoft-com:office:excel"
 xmlns:ss="urn:schemas-microsoft-com:office:spreadsheet"
 xmlns:html="http://www.w3.org/TR/REC-html40">
  <Styles>
    <Style ss:ID="Default" ss:Name="Normal">
      <Alignment ss:Vertical="Bottom"/>
      <Borders/>
      <Font ss:FontName="Calibri" x:CharSet="204" x:Family="Swiss" ss:Size="11"
       ss:Color="#000000"/>
      <Interior/>
      <NumberFormat/>
      <Protection/>
    </Style>
    <Style ss:ID="s62">
      <NumberFormat ss:Format="0%"/>
    </Style>
  </Styles>
  <Worksheet ss:Name="Sheet1">
    <Table ss:ExpandedColumnCount="1" ss:ExpandedRowCount="5" x:FullColumns="1"
     x:FullRows="1" ss:DefaultRowHeight="15">
      <Row ss:AutoFitHeight="0">
        <Cell>
          <Data ss:Type="String">Data</Data>
        </Cell>
      </Row>
      <Row ss:AutoFitHeight="0">
        <Cell>
          <Data ss:Type="Number">45</Data>
        </Cell>
      </Row>
      <Row ss:AutoFitHeight="0">
        <Cell ss:StyleID="s62">
          <Data ss:Type="Number">0.45</Data>
        </Cell>
      </Row>
      <Row ss:AutoFitHeight="0">
        <Cell>
          <Data ss:Type="String">String</Data>
        </Cell>
      </Row>
      <Row ss:AutoFitHeight="0">
        <Cell>
          <Data ss:Type="Number">45.5</Data>
        </Cell>
      </Row>
    </Table>
  </Worksheet>

I removed some nodes for simplicity. Following elements are required to be analyzed to correctly extract data

  1. Workbook\Worksheet\Table\Row\Cell\Data - contains data formated to invariant culture
  2. Workbook\Worksheet\Table\Row\Cell\Data, attribute ss:Type - contains data type of Data element contents
  3. Workbook\Worksheet\Table\Row\Cell, attribute ss:StyleID - reference to style, for your case it is only required to correctly identify cells where number is formated as percentage (multiplication to 100 required)
  4. Workbook\Styles\Style, attribute ss:ID - id of style used to reference style from cells
  5. Workbook\Styles\Style\NumberFormat, attrubute ss:Format - if ends with % and data type Number -> it is percentage

Parser logic:

  1. String type in cell -> convert as is
  2. Number type in cell. If format ends with '%' -> multiply by 100, otherwise use as is.

If data volumes are not so big(less then 200-300), it is possible to do that analysis cell by cell from C# without saving file in xml format.

Timur Mannapov
  • 217
  • 2
  • 9
-1

This may be a poor way to do it, but consider rewriting your excel file to a new excel file, converting to Text format in the process. I suggest this only because you may not have write-privileges in the original file, per your comments to your question. See an example of how you could convert the entire file [excluding the new file creation] here: https://stackoverflow.com/a/25162945/5090027

Alternatively, MSDN discusses a method of exporting the entire file as a text file here, using Spire [Disclosure - I have not used this myself]: https://code.msdn.microsoft.com/windowsdesktop/Export-Excel-Data-to-Text-015bc013

Community
  • 1
  • 1
Grade 'Eh' Bacon
  • 3,773
  • 4
  • 24
  • 46