In python removing rows from a excel file using xlrd, xlwt, and xlutils

Question

Hello everyone and thank you in advance.

I have a python script where I am opening a template excel file, adding data (while preserving the style) and saving again. I would like to be able to remove rows that I did not edit before saving out the new xls file. My template xls file has a footer so I want to delete the extra rows before the footer.

Here is how I am loading the xls template:

self.inBook = xlrd.open_workbook(file_path, formatting_info=True)
self.outBook = xlutils.copy.copy(self.inBook)
self.outBookCopy = xlutils.copy.copy(self.inBook)

I then write the info to outBook while grabbing the style from outBookCopy and applying it to each row that I modify in outbook.

so how do I delete rows from outBook before writing it? Thanks everyone!

score 3 · Answer 1 · answered Jan 06 '16 at 05:53

I achieved using Pandas package....

import pandas as pd

#Read from Excel
xl= pd.ExcelFile("test.xls")

#Parsing Excel Sheet to DataFrame
dfs = xl.parse(xl.sheet_names[0])

#Update DataFrame as per requirement
#(Here Removing the row from DataFrame having blank value in "Name" column)

dfs = dfs[dfs['Name'] != '']

#Updating the excel sheet with the updated DataFrame

dfs.to_excel("test.xls",sheet_name='Sheet1',index=False)

Peter DeGlopper · Answer 2 · 2021-12-30T00:35:21.273

xlwt does not provide a simple interface for doing this, but I've had success with a somewhat similar problem (inserting multiple copies of a row into a copied workbook) by directly changing the worksheet's rows attribute and the row numbers on the row and cell objects.

The rows attribute is a dict, indexed on row number, so iterating a row range takes a little care and you can't slice it.

Given the number of rows you want to delete and the initial row number of the first row you want to keep, something like this might work:

rows_indices_to_move = range(first_kept_row, worksheet.last_used_row + 1)
max_used_row = 0
for row_index in rows_indices_to_move:
    new_row_number = row_index - number_to_delete
    if row_index in worksheet.rows():
        row = worksheet.rows[row_index]
        row._Row__idx = new_row_number
        for cell in row._Row__cells.values():
            if cell:
                cell.rowx = new_row_number
        worksheet.rows[new_row_number] = row
        max_used_row = new_row_number
    else:
        # There's no row in the block we're trying to slide up at this index, but there might be a row already present to clear out.
        if new_row_number in worksheet.rows():
            del worksheet.rows[new_row_number]
# now delete any remaining rows
del worksheet.rows[new_row_number + 1:]
# and update the internal marker for the last remaining row
if max_used_row:
    worksheet.last_used_row = max_used_row

I would believe that there are bugs in that code, it's untested and relies on direct manipulation of the underlying data structures, but it should show the general idea. Modify the row and cell objects and adjust the rows dictionary so that the indices are correct.

Do you have merged ranges in the rows you want to delete, or below them? If so you'll also need to run through the worksheet's merged_ranges attribute and update the rows for them. Also, if you have multiple groups of rows to delete you'll need to adjust this answer - this is specific to the case of having a block of rows to delete and shifting everything below up.

As a side note - I was able to write text to my worksheet and preserve the predefined style thus:

def write_with_style(ws, row, col, value):
    if ws.rows[row]._Row__cells[col]:
        old_xf_idx = ws.rows[row]._Row__cells[col].xf_idx
        ws.write(row, col, value)
        ws.rows[row]._Row__cells[col].xf_idx = old_xf_idx
    else:
        ws.write(row, col, value)

That might let you skip having two copies of your spreadsheet open at once.

The first line of your code throws: "TypeError: unhashable type: 'slice'" where: w_sheet.rows[1] — Rivered, Dec 28 '21 at 16:14
@Rivered - fair point! The rows structure is a dict rather than an array, just indexing on integer row numbers. I'll update, though I don't have a convenient test harness around to ensure that it's correct. This kind of unsupported API access is always risky. — Peter DeGlopper, Dec 30 '21 at 00:01

score 0 · Answer 3 · answered Dec 01 '20 at 17:58

For those of us still stuck with xlrd/xlwt/xlutils, here's a filter you could use:

from xlutils.filter import BaseFilter

class RowFilter(BaseFilter):
    rows_to_exclude: "Iterable[int]"
    _next_output_row: int

    def __init__(
            self,
            rows_to_exclude: "Iterable[int]",
    ):
        self.rows_to_exclude = rows_to_exclude
        self._next_output_row = -1

    def _should_include_row(self, rdrowx):
        return rdrowx not in self.rows_to_exclude

    def row(self, rdrowx, wtrowx):
        if self._should_include_row(rdrowx):
            # Proceed with writing out the row to the output file
            self._next_output_row += 1
            self.next.row(
                rdrowx, self._next_output_row,
            )

    # After `row()` has been called, `cell()` is called for each cell of the row
    def cell(self, rdrowx, rdcolx, wtrowx, wtcolx):
        if self._should_include_row(rdrowx):
            self.next.cell(
                rdrowx, rdcolx, self._next_output_row, wtcolx,
            )

Then put it to use with e.g.:

from xlrd import open_workbook
from xlutils.filter import DirectoryWriter, XLRDReader

xlutils.filter.process(
    XLRDReader(open_workbook("input_filename.xls", "output_filename.xls")),
    RowFilter([3, 4, 5]),
    DirectoryWriter("output_dir"),
)

In python removing rows from a excel file using xlrd, xlwt, and xlutils

3 Answers3

Linked