4

I'm automating an Excel report batch-creation creation process. Excel files are used as report templates, and placeholders like ^SOME_DATA^ are replaced via a Python script which unzips the .XLSX file, performs a basic find and replace inside sharedStrings.xml and sheet1.xml files, and then re-zips finished file. This part works (finally).

When opening the new file in Excel, however, the numbers that replaced the placeholders all are flagged as "Number Stored As Text" by Excel (which causes both formulas and formatting in the worksheet to fail, of course). Most of the placeholders were stored by Excel in sharedStrings.xml, and given the "strings" part of that I'm not surprised they're still treated as strings.

So far...

  • I tried casting with =VALUE(A1) (etc.) to convert the text-numbers to actual numbers... this works interactively, but fails when the file is created programmatically: The formula cell shows as #VALUE! (Popup: "Error in Value") unless you edit it and hit enter.

  • I prefer Python as it fits with the rest of the automation, but could redo the find/replace/save code in VBA to "get it out the door" in a less-automated state. I've not had much luck with VBA solutions being reliable/robust, personally, however.

  • openpyxl works (reasonably) well for reading OR writing Excel, but doesn't appear to support making edits to an existing formatted file.

Karl U
  • 41
  • 1
  • 3
  • I've been working on an openpyxl branch to read in and preserve formatting (pretty much for the same purpose as you're doing) - final testing isn't done, but if you'd like to test with openpyxl preserving formatting, you can download it from http://bitbucket.org/amorris/openpyxl/overview – Adam Morris Apr 29 '12 at 03:10

3 Answers3

1

I am no expert in Open XML but I believe you can solve this problem by writing directly the numerical value to the sheet's xml file (sheet1.xml). You will first have to search the sheet's xml file to find cells that contain strings. Those cells have a t="s" attribute. The value of the v node is the index of the referenced string in the sharedStrings.xml file.

    <row r="8" spans="1:6" x14ac:dyDescent="0.25">
        <c r="F8" t="s">
            <v>2</v>
        </c>
    </row>

You can check if the cell references the placeholder string (the index of ^SOME_DATA^ in sharedStrings.xml). If it does, then you can replace the string index with a numerical value and remove the t attribute:

        <c r="F8" t="s">
            <v>2</v>
        </c>

becomes

        <c r="F8">
            <v>812397568</v>
        </c>

I hope this helps.

bouvierr
  • 3,563
  • 3
  • 27
  • 32
1

The proper construct is probably this:

<c ... t="inlineStr"><is>Inline text</is>

This …

<c ... t="str">Inline text</c>

… is for formulas.

See "What's the difference between <c t="str"> and <c><is> in Office Open XML?".

Community
  • 1
  • 1
Laurent CAPRANI
  • 139
  • 2
  • 10
0

An alternative approach to templating the sharedStrings table is to template the strings/numbers in the cells themselves.

For example here is a shared string entry created by Excel:

<!-- Value 0 refers to a string in sharedStrings table -->
<c r="A1" t="s"> 
    <v>0</v> 
</c>

Replace this with an inlined string:

<!-- Replace the type with str for an inlined string -->
<c r="A1" t="str"> 
    <v>Some new text here</v> 
</c>

Or a number:

<!-- Drop the t type for numbers -->
<c r="A1"> 
    <v>12345</v> 
</c>

For more detail on this see Write Simpler SpreadsheetML when Generating Spreadsheets at OpenXML Developer.

Note, strictly speaking this isn't an inlined string since that has a specific meaning in the standard, a different t type and a slighty more complex structure. This string t ="str" is actually the string part of a formula. However, it is a handy workaround.

jmcnamara
  • 38,196
  • 6
  • 90
  • 108