3

My goal is to use enter this formula into Excel using Openpyxl.

=CORREL(IF(A2:A7;A2:A7);B2:B7)

The reason for the IF formula is to ignore the cells that have 0 inside, which are not ignored by the CORREL function by default.

When inserting this formula into excel the output is actually:

=CORREL(IF(@A2:A7;A2:A7);B2:B7)

This formula still works if the formula is entered in a cell which is on the same row of the range that is given (row 2-7 here). But gives #VALUE! when put on another row. Removing the "@" manually resolves the issue, but is not feasible because I want to use this for automating analyses.

I found out that the "@" is added for arrays in some cases by excel and replaces to "[#This Row]" from pre-2010 versions of Excel, which explains why the formula works when entered on the same row. The "@" symbol is also added to a formula by openpyxl when the formula is not recognized. However, both of these links do not explain why "@" is added in my example.

Is this a side-effect of using ranges in an IF function? Is this a bug(/feature) from parsing a range into an IF function?

Here is the code I used to replicate the error. My Openpyxl version is 3.0.5, I use Excel from Microsoft 365.

from openpyxl import Workbook
wb = Workbook()
ws = wb.active
rows = [
    ['Size', 'Batch 1', 'Batch 2'],
    [2, 40, 30],
    [0, 40, 25],
    [0, 50, 30],
    [5, 30, 25],
    [6, 25, 35],
    [7, 20, 40],
]

for row in rows:
    ws.append(row)

formula = '=CORREL(IF(A2:A7,A2:A7),B2:B7)'
ws.cell(row=5, column=5, value=formula) # Formula in row 5 works
ws.cell(row=9, column=5, value=formula) # Formula in row 9 returns #VALUE!
formula = '=CORREL(A2:A7,B2:B7)'
ws.cell(row=6, column=5, value=formula) # CORREL function with ranges works fine
formula = '=IF(A2:A3,A2:A3)'
ws.cell(row=7, column=5, value=formula) # enters "=@IF(@A2:A3;A2:A3)"
formula = '=IF(A2,B2)'
ws.cell(row=8, column=5, value=formula) # enters "=IF(A2;B2)"

wb.save("test.xlsx")

In 'test.xlsx' this is my output (cell E5-E9)

-0.9528
-0.9528
#VALUE!
40
#VALUE!

And the formulae:

=CORREL(IF(@A2:A7;A2:A7);B2:B7)
=CORREL(A2:A7;B2:B7)
=@IF(@A2:A3;A2:A3)
=IF(A2;B2)
=CORREL(IF(@A2:A7;A2:A7);B2:B7)
Snery
  • 113
  • 7
  • It's not a bug, it's actually a feature and indeed a side-effect of using dynamic arrays. I think it's safe to say that the reason you are getting the `"@"` is related to you using Microsoft365 as described [here](https://stackoverflow.com/q/61138029/9758194). Though since Python is not my forte I would have no clue for you on how to solve this through OpenPyxl library. You'd somehow need to enter array formulae. – JvdV Feb 02 '21 at 11:12

3 Answers3

4

Answering my own question here. Thanks to JvdV for putting me on the right path.

In this answer I found what solved my question. I added this line before saving the excel file in my example code:

ws.formula_attributes['E9'] = {'t': 'array', 'ref': "E9:E9"}

This essentially sets the formula in cell 'E9' to be read as an array formula, which is shown in Excel by having the formula between curly braces ({}). In older versions of Excel this is done by, pressing ctrl-shift-enter when entering the formula. Note that writing these curly braces in the formula string does not work.

If anyone knows how the 'ref' part works, feel free add this in the comments.

Snery
  • 113
  • 7
  • 1
    Note the API for this will change in 3.1 with ArrayFormulae becoming objects you assign directly to cells. – Charlie Clark Feb 02 '21 at 14:33
  • I get an @ added when not using formula_attributes, and get curly brackets added when using formula_attributes. Is there any way to just get the "clean" formula? – Henri Jul 07 '21 at 06:39
  • I am not an expert on Excel, but if I understand correctly, the curly brackets are just a way to visualize that the cell is interpreted as an array formula. If I click into the cell in Excel the brackets disappear and they do not return afterwards. So perhaps you can try to write out each cell that you created using the 'formula_attributes' into a new sheet, just copy paste all content of each individual cell so to say. Otherwise you'll have to be more specific on why you need a clean formula. Do the curly brackets appear somewhere where you don't want them? – Snery Jul 08 '21 at 08:08
0

Through a lot of trial and error, I've found a way to enter a simple formula into a cell using openpyxl. I did this by examining the XML of a spreadsheet I saved from Excel for Mac. A .xlsx document is just a zip file. I found these files in mine:

docProps/app.xml
docProps/core.xml
xl/theme/theme1.xml
xl/worksheets/sheet1.xml
xl/styles.xml
_rels/.rels
xl/workbook.xml
xl/_rels/workbook.xml.rels
[Content_Types].xml

I opened the "xl/worksheets/sheet1.xml" file and looked at the raw XML for a formula that was displaying correctly before I saved. I found this in the relevant cell:

<f ca="1">_xlfn.DAYS(C2,TODAY())</f>

The use of "_xlfn." is mentioned in the openpyxl docs.

So with openpyxl I used this code to correctly enter my desired formula in a newly created workbook:

#!/usr/bin/env python3

import openpyxl
import csv
import os

fileCSV  = os.path.expanduser('~/doc/SSL_Cert_Expiry.csv')
fileXLSX = os.path.expanduser('~/doc/SSL_Cert_Expiry.xlsx')

print(f'Using\n\tCSV {fileCSV}\n\tXLSX {fileXLSX}')

wb = openpyxl.Workbook()
ws = wb.active

with open(fileCSV, newline='') as fIn:
  sslReader = csv.reader(fIn, delimiter=',')
  for idx, row in enumerate(sslReader):
    row.append(f'=_xlfn.DAYS(C{idx + 1},TODAY())')
    ws.append(row)

wb.save(fileXLSX)

I opened this in Excel for Mac and my formulae displayed correctly, not the dreaded !#NAME.

Footnote: I was expecting to have to set the 'ca' attribute for my formula cells, but it wasn't necessary in the end. It could be done with eg.

ws.formula_attributes['D2'] = {'ca': '1'}

The 'ref' attribute is not needed for cells that are not displaying an array formula. This can be gleaned from this page and its neighbours.

Dicky G
  • 41
  • 3
0

Additional answer for anyone using the WriteOnlyWorksheet in optimized mode. You need to explicitly create an instance of ArrayFormula and set that as the cell value:

b2 = WriteOnlyCell(ws, ArrayFormula("B2:B2", "=SUMPRODUCT(...)"))

You need to supply a range as first argument (of which only the first cell is used) so just pass the cell which you are writing to (which is a bit tricky to get in WriteOnlyMode, so you need to use your own tracking to determine that).

andyhasit
  • 14,137
  • 7
  • 49
  • 51