8

After spending the last few hours trying to find a way to do this, I've decided to just ask.

I've gone through the openpyxl docs more than a few times, as well as going through the questions asked here and here, and even the chapter from this online book, and none of it really answers what I'm attempting to do.

Here's the code I have right now:

for row in ws.iter_rows():
    i = 1
    if row[i].internal_value() == None:
        ws.Rows(i).Delete()
    else:
        i + 1

I've tried many different things with this, and right now I'm getting an error:

TypeError: 'NoneType' object is not callable

What am I doing wrong, and how can I fix it so that I'm iterating over all of the rows and deleting any that either are completely empty, or (if it's easier to implement) have an empty first cell?

Thanks

Community
  • 1
  • 1
DJGrandpaJ
  • 571
  • 3
  • 7
  • 20

9 Answers9

9

May be for someone next code will be useful:

index_row = []

# loop each row in column A
for i in range(1, ws.max_row):
    # define emptiness of cell
    if ws.cell(i, 1).value is None:
        # collect indexes of rows
        index_row.append(i)

# loop each index value
for row_del in range(len(index_row)):
    ws.delete_rows(idx=index_row[row_del], amount=1)
    # exclude offset of rows through each iteration
    index_row = list(map(lambda k: k - 1, index_row))
7

As far as I know openpyxl provides no way to delete rows. You may use COM instead, e.g.:

import win32com.client

filename = 'c:/my_file.xlsx'
sheetname = 'Sheet1'
xl = win32com.client.DispatchEx('Excel.Application')
wb = xl.Workbooks.Open(Filename=filename) 
ws = wb.Sheets(sheetname)

begrow = 1
endrow = ws.UsedRange.Rows.Count
for row in range(begrow,endrow+1): # just an example
  if ws.Range('A{}'.format(row)).Value is None:
    ws.Range('A{}'.format(row)).EntireRow.Delete(Shift=-4162) # shift up

wb.Save()
wb.Close()
xl.Quit()
mechanical_meat
  • 163,903
  • 24
  • 228
  • 223
  • Is there a way to use COM with a variable number of rows? I've never used it (clearly) – DJGrandpaJ Feb 12 '16 at 18:35
  • 1
    Sure thing. Edited answer. – mechanical_meat Feb 12 '16 at 18:36
  • Thanks. I'm not sure yet if I'll be able to use this with the project I'm currently working on, but I'll definitely check into it! – DJGrandpaJ Feb 12 '16 at 18:39
  • You're welcome. It's a Windows-only solution and it should not be used on servers due to memory leaks. – mechanical_meat Feb 12 '16 at 18:39
  • That's what I plan to check with the team about; the script that this one is meant to run in conjunction with is used on Windows and Linux systems, but the team that will be using this particular script is very small so it's a possibility. That being said, do you happen to know of any alternatives, like if openpyxl supports shifting the data? I've read about people copying the relevant rows to a new worksheet and deleting the old one, then renaming the new sheet with the old sheet's name. Seems horrible to have to do if there's a better solution. – DJGrandpaJ Feb 12 '16 at 18:48
  • Well, good luck to you and your team. I really don't use openpyxl enough to know of a better way to do that. I mostly use xlsxwriter, fwiw. – mechanical_meat Feb 12 '16 at 18:53
  • Hey @mechanical_meat Its working without **Shift=-4162** also. Thanks. – Nandini Chaurasiya Sep 24 '22 at 08:37
5

2018 update: I was searching how to delete a row today and found that the functionality is added in openpyxl 2.5.0-b2. Just tried and it worked perfectly. Here's the link where I found the answer: https://bitbucket.org/openpyxl/openpyxl/issues/964/delete_rows-does-not-work-on-deleting

And here's the syntax to delete one row:

ws.delete_rows(index, 1)

where: 'ws' is the worksheet, 'index' is the row number, and '1' is the number of rows to delete.

There's also the ability to delete columns, but I haven't tried that.

jhughs
  • 361
  • 3
  • 5
1

One of the reasons I found is you are initializing the value of i to 1 every time the loop is running make it as follows:

i=1
for row in ws.iter_rows():
    if row[i].internal_value() == None:
        ws.Rows(i).Delete()
    else:
        i + 1

Rest can answer after observing the full code.

1

The same logic can be applied to delete empty columns.

from openpyxl import *
import numpy as np
import os


path = "filepath"

workbooks = os.listdir(path)
workbooks = [_ for _ in workbooks if not _.startswith('~')]



for workbook in workbooks:
    wb2 = load_workbook(os.path.join(path, workbook))
    for sheet in wb2.worksheets:
        max_row_in_sheet = sheet.max_row
        max_col_in_sheet = sheet.max_column
        array_3 = np.array([])
        array_4 = np.array([])
        r = 1
        c = 1
        for r in range(1, max_row_in_sheet+1):
            array_1 = np.array([])
            array_2 = np.array([])
            for c in range (1, max_col_in_sheet+1):
                if sheet.cell(row = r, column = c).value == None:
                    array_1 = np.append(array_2, c)
                    array_2 = array_1
            if len(array_1) == max_col_in_sheet:
                array_3 = np.append(array_4, r)
                array_4 = array_3
                array_3 = array_3.astype(int)
        if len(array_3) != 0:
            index_of_last_array_element = len(array_3) - 1
            while index_of_last_array_element != -1:
                sheet.delete_rows(array_3[index_of_last_array_element], 1)
                index_of_last_array_element = index_of_last_array_element - 1
    wb2.save(workbook)
1

this works for both rows and cols:

import openpyxl
from openpyxl import *
import numpy as np

wb2 = openpyxl.load_workbook('/content/Drafts .xlsx')

for sheet in wb2.worksheets: 
    print ('Your currently in ', sheet)  
    max_row_in_sheet = sheet.max_row  
    max_col_in_sheet = sheet.max_column 
    print (max_row_in_sheet, max_col_in_sheet)

    array_3 = np.array([]) 
    array_4 = np.array([]) 

    r = 1  # initially declaring row as 1
    c = 1  # initially declaring column as 1
    for r in range(1, max_row_in_sheet + 1):  # 31 row
        array_1 = np.array([])
        array_2 = np.array([])
        for c in range(1, max_col_in_sheet + 1):  # 9 cols
            if sheet.cell(row=r, column=c).value == None:  # (9,1)
                array_1 = np.append(array_2, c)
                array_2 = array_1  # 1,2,3,4,5,6,7,8,9
        if len(array_1) == max_col_in_sheet:  # ( 9 == 9 )
            array_3 = np.append(array_4, r)  # 9
            array_4 = array_3
            array_3 = array_3.astype(int)
    if len(array_3) != 0:  # 11len
        index_of_last_array_element = len(array_3) - 1
        while index_of_last_array_element != -1:
            sheet.delete_rows(array_3[index_of_last_array_element], 1)
            index_of_last_array_element = index_of_last_array_element \
                - 1

    max_row_in_sheet = sheet.max_row  # maximum enterd row
    max_col_in_sheet = sheet.max_column  # maximum entered column

    print 'Maximum Rows and Cols after Removing'
    print (max_row_in_sheet, max_col_in_sheet)
    print '======================================'
    col_arr = []
    for x in range(1, sheet.max_column + 1):
        col_arr.append(0)

    for r in range(1, max_row_in_sheet + 1):
        array_1 = np.array([])
        array_2 = np.array([])
        for c in range(1, max_col_in_sheet + 1):
            if sheet.cell(row=r, column=c).value == None:
                array_1 = np.append(array_2, c)
                array_2 = array_1
                col_arr[c - 1] += 1
    print col_arr

    array_2 = [int(x) for x in array_2]
    print len(array_2)
    print array_2
    if len(array_2) != 0:
        index = len(array_2) - 1
        print index
        while index != -1:
            temp = array_2[index]

            # print(temp)

            sheet.delete_cols(temp, 1)
            index = index - 1

wb2.save('/content/outputs.xlsx')
joanis
  • 10,635
  • 14
  • 30
  • 40
BOMBRAM
  • 11
  • 1
0

There are lots of reasons why openpyxl doesn't provide this possibility but you might be able to work something out based on this snippet: https://bitbucket.org/snippets/openpyxl/qyzKn

Otherwise have a look at xlwings for remote controlling Excel without having to mess with COM.

Charlie Clark
  • 18,477
  • 4
  • 49
  • 55
0
openpyxl.worksheet.worksheet.Worksheet.insert_rows()
openpyxl.worksheet.worksheet.Worksheet.insert_cols()
openpyxl.worksheet.worksheet.Worksheet.delete_rows()
openpyxl.worksheet.worksheet.Worksheet.delete_cols()

Specific row:

ws.insert_rows(7)

Column range (same for row):

ws.delete_cols(6, 3)

From the OpenPyXL documentation

(This is a 2018 feature, so remember to upgrade: python3 -m pip install openpyxl --upgrade)

Punnerud
  • 7,195
  • 2
  • 54
  • 44
0

This script iterates through all worksheets in the workbook and delete rows in list "rows_to_delete". Be sure to remove any table formatting before running the script. In other words, you want to convert your table to a normal range first.

import openpyxl
rows_to_delete = [None, '', ' ']
for i in wb.sheetnames:
    print(f'Now in sheet: {i}')
    ws = wb[i]
    # loop each row in column B
    column_b = range(1, ws.max_row)
    for i in reversed(column_b):
        if ws.cell(i, 2).value in rows_to_delete:
            print(f'Deleting Row: {ws.cell(i,2).row}')
            ws.delete_rows(ws.cell(i,2).row)
Tony S
  • 604
  • 6
  • 8