46

I'm using openpyxl to add data validation to all rows that have "Default" in them. To do that, I need to know how many rows there are.

I know there is a way to do that in the iterable workbook mode, but I also add a new sheet to the workbook, which is not possible in that mode.

pnuts
human
  • Possible duplicate of [Is it possible to get an Excel document's row count without loading the entire document into memory?](http://stackoverflow.com/questions/13377793/is-it-possible-to-get-an-excel-documents-row-count-without-loading-the-entire-d) – Torxed Nov 05 '15 at 10:27

6 Answers

85

ws.max_row will give you the number of rows in a worksheet.

Since openpyxl 2.4 you can also access individual rows and columns and use their length to answer the question.

len(ws['A'])

Though it's worth noting that for data validation for a single column Excel uses 1:1048576.
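
As a rough sketch of how the row count feeds into the original task, the helper below scans rows of plain values and returns the 1-based numbers of the rows that contain "Default". The name `rows_containing` and the sample data are made up for illustration; with openpyxl, the rows could presumably come from `ws.iter_rows(values_only=True)`, which stops at `ws.max_row`.

```python
def rows_containing(rows, needle="Default"):
    """Return 1-based row numbers of rows holding `needle` in any cell."""
    return [
        number
        for number, row in enumerate(rows, start=1)
        if needle in row
    ]


# Hypothetical sample standing in for ws.iter_rows(values_only=True)
sample_rows = [
    ("Name", "Mode"),
    ("alpha", "Default"),
    ("beta", "Custom"),
    ("gamma", "Default"),
]

default_rows = rows_containing(sample_rows)
print(default_rows)
```

The returned row numbers can then be used to build the cell coordinates that the validation should cover, so the total row count never has to be hard-coded.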

Charlie Clark
  • 21
    However, `ws.max_row` will not check whether the last rows are empty. If cell _content_ at the end of the worksheet is deleted using the _Del_ key or by removing duplicates, the remaining empty rows at the end of your data will still count as used rows. If you do not want to keep these empty rows, you will have to delete the **entire** rows by selecting the row numbers on the left of your spreadsheet and deleting them (right-click on the selected row number(s) -> Delete) – V. Brunelle Apr 05 '17 at 16:50
  • That isn't the question, though is it? – Charlie Clark Apr 06 '17 at 10:40
  • 2
    `max_row` and `max_column` didn't work for `sheet = wb.active`. I am using `openpyxl==2.4.8` – Hussain Apr 09 '18 at 10:18
  • @Hussain : What didn't work for you, any errors or just an unexpected value? Does it work for you if you explicitly set the sheet by e.g. `wb.worksheets[0]`? Did you end up finding a different solution? – antiplex Aug 29 '18 at 15:13
  • @V.Brunelle Is there any workaround possible without deleting the entire row? My issue is that I have some data in that row in another column. If I delete the entire row, the data in that other column also gets deleted. Kindly suggest. – Chandra Shekhar Nov 27 '19 at 08:21
  • @ChandraShekhar If you have data in another column of the same row, then what I explained does not apply to you, since it is not effectively the last row of the sheet. – V. Brunelle Nov 27 '19 at 12:08
  • 4
    These two methods give me the same result. Am I missing something? `ws.max_row` gives me the last filled row in the entire worksheet. And `len(ws['A'])` should give me the length of that specific column. If longest row is 10 but col 'A' has 3 cells filled, the latter method should give me 3, not 10. – Bn.F76 May 06 '20 at 16:47
  • 2
    Why is this the accepted answer? As others have commented, it is obviously not correct - `len(ws['A'])` returns the same value as `ws.max_row`, the length of the longest column in the entire spreadsheet - and not the number of rows in individual column `A`. – ssc Feb 05 '22 at 08:05
  • This is the correct answer because worksheets should always be considered to be 16384, 1048576 in size. It's much easier to work with an evenly shaped worksheet initially. You can work down from that, including defining what counts as an empty cell. – Charlie Clark Aug 11 '23 at 17:11
6

This works well for me. It gives the number of non-empty rows in each column, assuming there are no empty rows in between.

from openpyxl import load_workbook as lw
from openpyxl.utils import get_column_letter

wb = lw(your_xlsx_file)
ws = wb[sheet_name]

for col in range(1, ws.max_column + 1):
    col_letter = get_column_letter(col)
    # Count the cells with a truthy value (None, 0 and "" are all skipped)
    max_col_row = len([cell for cell in ws[col_letter] if cell.value])
    print("Column: {}, Row numbers: {}".format(col_letter, max_col_row))
DannyG
  • You should probably use an explicit check for `None`. And this will be faster if you work from the end of the worksheet. Also, aliasing `load_workbook`? For a single call? Really? – Charlie Clark Aug 11 '23 at 17:13
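
A minimal sketch of the two suggestions from that comment, assuming the column has already been reduced to a plain list of values (for example with `[cell.value for cell in ws["A"]]`). The name `last_filled_row` is hypothetical. It uses an explicit `None` check and walks backwards from the end, so gaps in the middle of the column do not affect the result.

```python
def last_filled_row(values):
    """Return the 1-based position of the last non-None value, or 0 if none."""
    # Walk backwards so trailing empty cells are skipped quickly
    for index in range(len(values) - 1, -1, -1):
        if values[index] is not None:
            return index + 1
    return 0


# Hypothetical column with a gap and two trailing empty cells
column_a = ["10R46", None, "10M95", None, None]
print(last_filled_row(column_a))
```

Unlike a truthiness test, the `is not None` check treats `0` and empty strings as filled cells; whether that is the right definition of "empty" depends on the data.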
1

Here is another solution that might be helpful. Since openpyxl's max_row and max_column also take into account empty cells with styles applied, I think pandas is the better choice in that case:

import pandas as pd

def get_max_row_column(df, sheet_name):
    # df is the dict of DataFrames returned by read_excel(sheet_name=None)
    max_row = 1
    max_col = 1
    for sh_name, sh_content in df.items():
        if sh_name == sheet_name:
            # read_excel uses the first row as the header by default, so add 1
            max_row = len(sh_content) + 1
            max_col = len(sh_content.columns)
            break
    coordinates = {'max_row': max_row, 'max_col': max_col}
    return coordinates

df = pd.read_excel('xls_path', sheet_name=None)
max_row = get_max_row_column(df, 'Test_sheet')['max_row']
max_col = get_max_row_column(df, 'Test_sheet')['max_col']

By passing sheet_name=None I get a dictionary of all worksheets, where each key is a sheet name and each value is the sheet content (a pandas DataFrame).
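
Because the result is an ordinary dictionary, the lookup can also be written without a loop. The sketch below is one possible variant, not the answer's own code: `sheet_extent` is a hypothetical name, and the dictionary is built in memory to stand in for the output of `pd.read_excel(..., sheet_name=None)`. Note that it raises `KeyError` for a missing sheet instead of falling back to 1.

```python
import pandas as pd


def sheet_extent(sheets, sheet_name):
    """Return max_row/max_col for one sheet in a dict of DataFrames."""
    frame = sheets[sheet_name]  # KeyError if the sheet does not exist
    n_rows, n_cols = frame.shape
    # + 1 because the header row is not counted among the DataFrame rows
    return {"max_row": n_rows + 1, "max_col": n_cols}


# In-memory stand-in for pd.read_excel('xls_path', sheet_name=None)
sheets = {"Test_sheet": pd.DataFrame({"A": [1, 2, 3], "B": [4, 5, 6]})}

extent = sheet_extent(sheets, "Test_sheet")
print(extent)
```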

1

Here's a function that returns the last filled row of a selected column; it does not take the length of other columns into consideration.

Input parameters:

  • ws - worksheet
  • column - selected column
  • !!! The range can be modified based on the needs
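def return_lr(ws, column):
    selected_column = ws[column]  # tuple of cells; index 0 is row 1
    # Scan up to 100 rows (adjust as needed), stopping at the column's end
    limit = min(100, len(selected_column))
    for x in range(1, limit):
        if selected_column[x].value is None:
            # Index x is row x + 1, so row x is the last filled row
            return x
    # No empty cell found within the scanned range
    return limit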
Damy Pui
0

Find the length of a column and the length of a row.

Column:

column = sheet['A']
# tuple of cells: (A1, A2, A3, ..., An)

len(column)
# e.g. 18

Row length:

for row in sheet.iter_rows(max_row=1):
    print(len(row))

This prints the length of the header row, where the feature names are. To print the length of every row, pass max_row=len(column) instead.

marc_s
0

NOTE: THIS METHOD ASSUMES THAT THE COLUMN YOU ARE USING DOES NOT HAVE BLANK CELLS BETWEEN VALUES

| A       | B      | C     |
|:--------|:-------|:------|
| 10R46   | 1005   | 8017  |
| 10R46   | 10335  | 5019  |
| 100R91  | 1005   | 8017  | 
| 10R91   | 243    | 8870  | 
| 10M95   | 4918   | 8305  |
| 10M95   | 9017   | 8305  |
|         | 9470   | 8221  |

Load it into a pandas DataFrame and count the number of non-null values.

import pandas as pd

df_split_file = pd.read_excel('testfile.xlsx', sheet_name='sheet1')
last_row = df_split_file['A'].count() + 1


print(last_row)

Result of last_row (6 non-null values in column A, plus 1 for the header row):

7
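
If the column can contain blanks between values, counting non-null cells undercounts. One possible alternative, shown here as a sketch on a small in-memory DataFrame rather than the original file, is `Series.last_valid_index()`. It assumes the default 0-based index and a single header row, so the Excel row number is the index plus 2.

```python
import pandas as pd

# Hypothetical data with blanks between values in column A
df = pd.DataFrame({
    "A": ["10R46", None, "10M95", None],
    "B": [1005, 10335, 243, 9470],
})

last_index = df["A"].last_valid_index()  # None if the column is entirely empty
# Index 0 is Excel row 2 (row 1 is the header), hence the + 2
last_row = 1 if last_index is None else last_index + 2

count_based = df["A"].count() + 1  # the count-based estimate, for comparison
print(last_row, count_based)
```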
jun