How to use openpyxl to add a new column to an existing excel table

Question

I have a template spreadsheet with eixisting tables and formula referring to data in the tables.

As a template, the tables start with one row of empty data.

I wish to add data (rows and columns) to the tables but retain working formula.

There is an excellent post here describing how to add rows to en existing table. This is working in my code.

A second post here attempts to add columns to a table by replacing the table with a new one. Unfortunately this breaks the formula.

Here's a snipit of my code:

     sSheetName = 'test'
     workbook = openpyxl.load_workbook(excelDumpFilename)
     sheet=workbook[sSheetName]
     dfCSVData = pd.read_csv(os.path.join(sCSVLocation,sCSVFile))
     sheetRow = 2
     # loop around each row in the CSV data
     for CSVrow in dataframe_to_rows(dfCSVData, header=False, index=False):
         sheetCol = 2
         sheet.insert_rows(sheetRow, 1)
         # loop around each value of the data row
         for CSVvalue in CSVrow:
             sheet.cell(row=sheetRow, column=sheetCol).value=CSVvalue
             sheetCol+=1
         sheetRow+=1
     table    = sheet.tables[sSheetName]
     currentTitleCount = len(table.column_names) 
     tableRows = dfCSVData.shape[0]
     tableCols = dfCSVData.shape[1]
     # Update table titles for non-generic fields
     for index in range(currentTitleCount, tableCols):
         sheet.cell(row=2, column=tableColStart+index).value=dfCSVData.columns[index]
         table.column_names.append( dfCSVData.columns[index] )
     newRange=openpyxl.worksheet.cell_range.CellRange(
         min_row=tableRowStart,
         min_col=tableColStart,
         max_row=tableRowStart + tableRows,
         max_col=tableColStart + tableCols - 1
         #max_col=tableColStart + currentTitleCount - 1
     )
     tTable.ref=newRange.coord
     tTable.autoFilter.ref=newRange.coord
 workbook.save(excelDumpFilename)

dfCSVData contains my new table data but the dimensions are larger than the table in the template sheet.

You can see that I update the template sheet with extended column names.

I also attempt to alter the table column_names directly: table.column_names.append( dfCSVData.columns[index] ) but this fails to make any changes (examined under debug).

Using tableRows extends the length of my table. This is working.

Using tableCols (a larger number) corrupts the Excel file and Excel removes the table entirely the next time you open the sheet.

Help? Thanks

It would be easier if you could provide a more complete example, eg. a sample template and how you want to extend it. — Charlie Clark, Aug 09 '23 at 08:10
Actually, you can edit the table, you just need to keep it in sync with the cells. So, along with the ref, the tableColumns need updating. — Charlie Clark, Aug 09 '23 at 16:10
Thanks @CharlieClark. If you see the following code, for some reason the append statement simply refuses to add a column name to the list: table = sheet.tables[sSheetName] ... for index in range(currentTitleCount, tableCols): sheet.cell(row=2, column=tableColStart+index).value = dfCSVData.columns[index] table.column_names.append( dfCSVData.columns[index] ) ... ) table.ref=newRange.coord table.autoFilter.ref=newRange.coord — Richard, Aug 10 '23 at 10:19
Please update your question to include the code and, if possible, make the openpyxl code as clear and independent of anything else. — Charlie Clark, Aug 10 '23 at 11:20
Thanks for helping with this @CharlieClark. I updated the question with a little more code. Difficult to provide more as the data is proprietary. — Richard, Aug 11 '23 at 14:40

score 0 · Accepted Answer · answered Aug 11 '23 at 17:06

0

Like I said in my comment, you should really simplify the example to, say, increase a table from A1:D4 to A1:F6 and provide some sample data. This makes it a lot easier for people to help.

table.column_names is a property that just returns a list of column names. That's why changing this has no effect. You need to update the table.TableColumns. As you are updating the worksheet you have to choices. Firstly, use the private method table._initialise_columns(), or do this manually by editing or extending table.TableColumns. It's worth looking at the openpyxl source to see how this works.

Unfortunately, it's essential that table.TableColumns exactly matches the worksheet. This makes it trickier in openpyxl to edit the table because the table isn't really a worksheet object (this is how it's implemented in OOXML).

answered Aug 11 '23 at 17:06

Charlie Clark

18,477
4
49
55

Thank you so much for the guidance in using StackOverflow and also for the fix to this problem. the underlying issue was partly knowing of this other variable in the Table and also examining the Excel XML files to uncover the precise nature of the problem. I also found that the column IDs in the template Excel spreadsheet were in an unexpected order (1, 2, 6, 3, 4). I added my new columns from ID 6 and so created a duplicate in the spreadsheet. – Richard Aug 15 '23 at 08:30
To help others debug Excel, know that an Excel file is simply a ZIP file. You can open the XLSX file in a ZIP application and examine the underlying XML files that make up the data. Excellent to debug, but dangerous to modify! – Richard Aug 15 '23 at 08:33

How to use openpyxl to add a new column to an existing excel table

1 Answers1