8

I'm currently honing my python/excel skills, and have run into an issue with openpyxl.

I am trying to open a workbook, replace rows in an existing table, and save the workbook again.

Ideally, I'd like to also first be able delete all rows from the table (though retaining the table structure).

My initial workbook contains a sheet named "inputData". In this I have a table named "Data" with columns A, B, C, and 2 rows of data.

I also have a csv file named "input.csv" containing the same columns but 4 rows of data.

When I run my code, the data is written into the worksheet, but the table structure is not expanded to encompass the two new rows of data.

Any ideas of how to change the data source of a named table structure using openpyxl?

import csv
from openpyxl import load_workbook
from openpyxl.worksheet.table import Table, TableStyleInfo

wb = load_workbook(filename = 'workbook.xlsx')
ws = wb["inputData"]

with open('input.csv', newline='', encoding='utf-8-sig') as f:
    reader = csv.reader(f, delimiter=';')
    for i, row in enumerate(reader):
        if not i == 0:
            for j, cell in enumerate(row): 
                ws.cell(row=i+1, column=j+1).value = cell

wb.save('output.xlsx')
Alex Waygood
  • 6,304
  • 3
  • 24
  • 46
Henrik Poulsen
  • 935
  • 2
  • 13
  • 32
  • Can you explain better what you're trying to do? What do you mean by table structure/data source? is it `ws`? Are you trying to delete a row? add a row? The code looks like it's just changing cells - is that what you want? – kabanus Feb 07 '18 at 07:11
  • In excel you are able to enter data directly into the worksheet, or create a table structure https://support.office.com/en-us/article/overview-of-excel-tables-7ab0bb7d-3a9e-4b56-a3c9-6c94334e492c, I wish to use python to manipulate an existing table, not just add rows to a flat worksheet – Henrik Poulsen Feb 07 '18 at 07:40
  • You will need to change the structure of the table yourself. This should be possible but it is undocumented. – Charlie Clark Feb 07 '18 at 09:00
  • So you want to add rows to `ws` (for example), and then save this changed table to the original document? – kabanus Feb 07 '18 at 09:42

4 Answers4

12

I figured out the answer to my question.

I am able to access the table from openpyxl, change the ref (range) and then save it back again.

This enables me to enter more data into the same table, and have my formulas on my other worksheet take the new data into account.

This will be a very helpful feature, when I need to push a lot of data into an existing excel sheet without breaking references in the future.

import csv
from openpyxl import load_workbook
from openpyxl.worksheet.table import Table, TableStyleInfo
tableName = 'Data'

style = TableStyleInfo(name="TableStyleMedium9", showFirstColumn=False,
                       showLastColumn=False, showRowStripes=True, showColumnStripes=False)

def colnum_string(n):
    string = ""
    while n > 0:
        n, remainder = divmod(n - 1, 26)
        string = chr(65 + remainder) + string
    return string

wb = load_workbook(filename = 'workbook.xlsx')
ws = wb["inputData"]

with open('input.csv', newline='', encoding='utf-8-sig') as f:
    reader = csv.reader(f, delimiter=';')
    for i, row in enumerate(reader):
        for j, cell in enumerate(row): 
            if not i == 0:
                ws.cell(row=i+1, column=j+1).value = float(cell)
            else:
                ws.cell(row=i+1, column=j+1).value = cell

            maxRef = [i,j]

for i, table in enumerate(ws._tables):
    if table.name == tableName:
        tableRef = i

resTable = Table(displayName="Data", ref="A1:{}{}".format(colnum_string(maxRef[0]), maxRef[1]))
resTable.tableStyleInfo = style

ws._tables[tableRef] = resTable

wb.save('output.xlsx')
Henrik Poulsen
  • 935
  • 2
  • 13
  • 32
  • 3
    Thanks Henrik Poulsen. Great code! Just sharing my own experience for the benefit of others. Make sure that you're using > v.2.4.4 of the openpyxl library. Using [ws._tables] to access the "tables collection" did not work for me until I upgraded. – tabish89 Mar 12 '19 at 16:12
  • 2
    Cheers, literally the same problem I had, and nice solution. – Dean Aug 18 '20 at 15:21
5

Stumbled across this problem 2020 with openpyxl==3.0.5 Hope it is okay to share my solution as well for others who might be in the same situation.

Goal: to read in new data from data.csv and add to the existing file.xlsx, so the formulas still work. Column names stay the same.

Input:

  1. XLSX template file with formula on one sheet and data on other sheet (file.xlsx)
  2. data.csv with new data

Output: XLSX file with new data and updated table range used in formula

"""python imports"""
import openpyxl
import pandas

"""Load input workbook"""
wb = openpyxl.load_workbook(filename='file.xlsx')

"""Activate worksheet named 'data'."""
ws = wb['data']

"""Read in new data from data1.csv to pandas.dataframe"""
new_dataframe = pandas.read_csv("data1.csv")

"""Iterate over dataframe rows and write values to worksheet"""
for i, row in new_dataframe.iterrows():
    # ws.append leaves first line empty
    # to get data written from first row,  need to use writing to cell
    if i == 0:
        for c, value in enumerate(row, start=1):
            ws.cell(row=2, column=c).value = value
    else:
        current_row = [row.col1, row.col2, row.col3]
        ws.append(current_row)

"""Change table range"""
ws.tables['MyTable'].ref = "A1:E5"

"""Save workbook"""
wb.save(filename='file.xlsx')

Answer to Ethan problem: how to just change range:

# Find right table
my_table = ws.tables['Template']
# Change range
my_table.ref = ref="A7:{}{}".format(column_string(maxRef[1], maxRef[0]))
# change style
my_table.tableStyleInfo = my_style
wb.save('WorkbookName')
Varje
  • 396
  • 4
  • 8
2

First of all, thank you for this thread. I try to extend an existing table within an existing excel file (copy of a template). I simply fail to extend the table to the range i actually put data in (Remark: Some table elements contain formulas which i need to preserve)

What i do is, open the excel file, copy and paste data to the correct worksheet and the correct cells. That works as intended. What does not work is extending the range of the table which initially covers only the first row (apart from the header).

Using above code, i am able to identify the table and i tried copying the style:

for i, table in enumerate(ws._tables):
        if table.name == 'Template':
            tableRef = i
            mystyle = table.tableStyleInfo
resTable = Table(displayName="Template", ref="A7:{}{}".format(colnum_string(maxRef[1]), maxRef[0]))
resTable.tableStyleInfo = mystyle
ws._tables[tableRef] = resTable

I might overlook something here since it does not work. The table does not extend. Any help is greatly appreciated.

For better understanding of the problem:
Table header is A7:BA7
First (empty) row, some element with formulas and formatting: A8:BA8
Final data range after copying data (as example, calculation is correct): A8:BA255

Ethan
  • 51
  • 2
  • 7
0

Here a small function I wrote to quickly add data to named tables and single cell named cells.

It expects the tables to be empty and can/should be used when an Excel template report is created, saved, and later filled with data using Python

from typing import Any, Optional, Union
import pandas
import openpyxl

def find_worksheet_and_table(woorkbook, table_name):
    
    for worksheet in workbook.worksheets:
        for table in worksheet.tables.values():
            if table.name == table_name:
                return worksheet, table
    
    raise KeyError(f'Unable to find table with name "{table_name}" in workbook!')
            
def populate_table(workbook, table_name, dataframe):
    
    """Modifies table and inserts values. Formulas are kepts as is."""
    
    worksheet, table = find_worksheet_and_table(workbook, table_name)
    
    table_start, table_end = table.ref.split(':')
    table_start_row = int(''.join(x for x in table_start if x.isdigit()))
    table_start_column = column_index_from_string(''.join(x for x in table_start if not x.isdigit()))
    table_end_column_letter = ''.join(x for x in table_end if not x.isdigit())
    new_table_ref = table_start + ':' + table_end_column_letter + str(table_start_row + len(dataframe))
    table_first_cell = worksheet.cell(row=table_start_row, column=table_start_column)

    table.ref = new_table_ref

    for col_ix, column in enumerate(table.tableColumns):
        if column.name in dataframe.columns:
            for row_ix, row_value in enumerate(dataframe[column.name]):
                cell_to_fill = table_first_cell.offset(column=col_ix, row=row_ix+1)
                cell_to_fill.value = row_value
        elif column.calculatedColumnFormula:
            for ix in range(len(dataframe)):
                cell_to_fill = table_first_cell.offset(column=col_ix, row=ix+1)
                cell_to_fill.value = '='+table.tableColumns[col_ix].calculatedColumnFormula.attr_text


def populate_excel(
    input_path: Union[str, pathlib.Path],
    output_path: Union[str, pathlib.Path],
    named_cell_using_single_values: Optional[dict[str, Any]],
    table_using_dataframe : Optional[dict[str, pandas.DataFrame]],
):
    
    """
    input_path: Excel file to read.
    output_path: Save location for output Excel file.
    named_cell_using_single_values: Dict of named cells with value to populate them with.
    table_using_dataframe: Dict of named table with dataframes to populate them with.
    
    Function to insert data into Excel using named cells and named tables.
    
    """
    
    workbook = openpyxl.load_workbook(input_path)
    
    if table_using_dataframe:
        for table_name, dataframe in table_using_dataframe.items():
            populate_table(workbook, table_name, dataframe)
            
    if named_cell_using_single_values:
        
        for named_cell, value in named_cell_using_single_values.items():
            if not named_cell in workbook.defined_names:
                raise KeyError(f"Unable to find named cell {named_cell} in workbook!")
        
            worksheet_to_update, coordinate_to_update = next(workbook.defined_names[named_cell].destinations)
            cell_to_update = workbook[worksheet_to_update][coordinate_to_update]
            cell_to_update.value = value
        
    workbook.save(output_path)
    workbook.close()
        
    
        

dataframe_to_write = pandas.DataFrame(data = {'Column1': range(500), 'Column2': range(500), 'Column3': range(500)})

populate_excel(
    'input.xlsx', 
    'output.xlsx', 
    {'my_named_cell': 'value for named cell!', 'my_named_cell2': 20},
    {'MyTable': dataframe_to_write}
)
hirolau
  • 13,451
  • 8
  • 35
  • 47