5

I am trying to sort columns from least to greatest using openpyxl, and I am open to using other libraries to accomplish this. Here is the code I have right now; however, nothing is being sorted.

from openpyxl import load_workbook

wb=load_workbook('NotSorted.xlsx')
ws1=wb.get_sheet_by_name('Mean')

ws1.auto_filter.add_sort_condition('J2:J21')

wb.save('Sorted.xlsx')

Any help is greatly appreciated!

Briana Holton

5 Answers

7

You can sort using win32com.client (install it with pip install pypiwin32). This drives Excel itself, so it requires Windows with Excel installed.

Example workbook named MyWorkbook.xlsx with contents (before and after):

[Images: worksheet before sorting and after sorting]

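import os
import win32com.client

excel = win32com.client.Dispatch("Excel.Application")

# Excel resolves relative paths against its own working directory,
# so pass an absolute path
wb = excel.Workbooks.Open(os.path.abspath('MyWorkbook.xlsx'))
ws = wb.Worksheets('Sheet1')

# Order1=1 is xlAscending (use 2, xlDescending, for the reverse order)
ws.Range('A2:A9').Sort(Key1=ws.Range('A1'), Order1=1, Orientation=1)

wb.Save()
excel.Application.Quit()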

If you don't want to alter the original workbook, use SaveAs() or create another workbook and copy data like so: ws_from.Range("A1:AF100").Copy(ws_to.Range("A1:AF100")) (with appropriate range).

See these documentation links for more information about Sort() and its parameters:

Aralox
6

The openpyxl documentation clearly states:

This will add the relevant instructions to the file but will neither actually filter nor sort.

So you would need to compute the new sequence of rows and move the data explicitly (i.e. assigning the cells to their new positions).
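
As a rough illustration of "moving the data explicitly", here is a minimal sketch that reads a block of rows, sorts the values in memory and writes them back. The helper names sorted_rows and sort_sheet_rows are made up for this example, and the sketch assumes you only care about cell values: styles, merged cells and formula references are not adjusted.

```python
def sorted_rows(rows, key_index):
    """Return the row tuples sorted by the value at key_index (0-based).

    Empty cells (None) cannot be compared with numbers, so they sort last.
    """
    return sorted(rows, key=lambda r: (r[key_index] is None, r[key_index]))


def sort_sheet_rows(ws, first_row, last_row, key_col):
    """Sort rows first_row..last_row of an openpyxl worksheet by key_col (1-based)."""
    # Read the whole block into memory before overwriting anything.
    rows = list(ws.iter_rows(min_row=first_row, max_row=last_row, values_only=True))
    for r_off, values in enumerate(sorted_rows(rows, key_col - 1)):
        for c_off, value in enumerate(values, start=1):
            # Assign .value directly so that empty values overwrite old content.
            ws.cell(row=first_row + r_off, column=c_off).value = value
```

For the workbook in the question this could be called as sort_sheet_rows(ws1, 2, 21, 10) before wb.save('Sorted.xlsx'), since column J is column 10.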

Scott Hunter
2

There seems to be no built-in function to sort within openpyxl but the function below will sort rows given some criteria:

from openpyxl.utils import get_column_letter


def sheet_sort_rows(ws, row_start, row_end=0, cols=None, sorter=None, reverse=False):
    """ Sorts given rows of the sheet
        row_start   First row to be sorted
        row_end     Last row to be sorted (default last row)
        cols        Columns to be considered in sort
        sorter      Function that accepts a tuple of values and
                    returns a sortable key
        reverse     Reverse the sort order
    """

    bottom = ws.max_row
    if row_end == 0:
        row_end = ws.max_row
    right = get_column_letter(ws.max_column)
    if cols is None:
        cols = range(1, ws.max_column+1)

    # Group row numbers by key. The key must be a tuple: a list key
    # raises "TypeError: unhashable type: 'list'" when used in a dict.
    array = {}
    for row in range(row_start, row_end+1):
        key = tuple(ws.cell(row, col).value for col in cols)
        array.setdefault(key, []).append(row)

    order = sorted(array, key=sorter, reverse=reverse)

    # Park the block below the used area, then move each row back
    # to its sorted position
    ws.move_range(f"A{row_start}:{right}{row_end}", bottom)
    dest = row_start
    for src_key in order:
        for row in array[src_key]:
            src = row + bottom
            dist = dest - src
            ws.move_range(f"A{src}:{right}{src}", dist)
            dest += 1

Call it with the worksheet and start row to be sorted as a minimum. By default it'll sort on all columns A...max in that order but this can be changed by passing a 'cols' list. E.g. [4, 2] will sort first on D then on B.

Sort order can be reversed using 'reverse' as with 'sorted()'.

If you need more complex sorting, provide a 'sorter' function. This receives a tuple of values (being those from the 'cols' columns) and should return a sortable key.

It works by ascertaining the desired final destination of each row, moving them all down below the current worksheet, then moving them back to the required destination.
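
To make the offset arithmetic concrete, here is a small sketch of just the planning step, with no worksheet involved. The helper plan_moves is hypothetical and only mirrors the src/dist calculation in the function above; it takes the original row numbers in their sorted order.

```python
def plan_moves(rows_in_sorted_order, row_start, bottom):
    """Return (parked_row, offset) pairs for rows that were moved down by `bottom`.

    Each row sits at row + bottom after the initial move; the offset is the
    (negative) number of rows it must travel to reach its sorted position.
    """
    moves = []
    dest = row_start
    for row in rows_in_sorted_order:
        src = row + bottom
        moves.append((src, dest - src))
        dest += 1
    return moves


# Rows 5-7 in a sheet whose last used row is 10; sorted order is 7, 5, 6.
moves = plan_moves([7, 5, 6], row_start=5, bottom=10)
```

Here row 7 is parked at row 17 and travels -12 rows to land on row 5, and so on for the others.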

I wanted all columns in each row, but modifying to move a smaller area can be accomplished by changing the two calls to ws.move_range().

Examples:

sheet_sort_rows(ws, 5, 10)  # Sort rows 5-10 using key: A, B, C, ...
sheet_sort_rows(ws, 5, 10, [2, 1])  # Sort rows using B, A
sheet_sort_rows(ws, 5, 10, [2, 1], reverse=True)  # As above in reverse


def sorter(t):
    return t[1] + " " + t[0][::-1]

sheet_sort_rows(ws, 5, 10, sorter=sorter)

This last example sorts by column B, then by the text of column A reversed.

Thickycat
0

I tried to use the sheet_sort_rows function above with a list as the key, but got an exception on array[key] = array.get(key, set()).union({row}): TypeError: unhashable type: 'list' (using Python 3.10 and openpyxl 3.0.9).

As a workaround I'm using a string as the key (a concatenation of all column values converted to str). It is not as good as the original idea, and I can't use a sorter function to sort some columns in default order and others in reverse order.

It would be nice to have the original working, but I don't have a clue how (I'm new to Python...).
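
For what it's worth, one possible way around both problems is to build the key as a tuple, which is hashable, and to negate numeric values in the sorter to get a mixed order. This is only a sketch on made-up data: mixed_order_key and the rows dict are hypothetical, and the negation trick only works for numeric columns.

```python
def mixed_order_key(values):
    """Key for a (text, number) tuple: text ascending, number descending."""
    text, number = values
    return (text, -number)


# Hypothetical sheet content: row number -> values of the sort columns.
rows = {5: ("pear", 2), 6: ("apple", 1), 7: ("pear", 9), 8: ("apple", 1)}

# Group row numbers by key; tuples are hashable, so they can be dict keys.
groups = {}
for row, values in rows.items():
    groups.setdefault(tuple(values), []).append(row)

order = sorted(groups, key=mixed_order_key)
sorted_row_numbers = [row for key in order for row in groups[key]]
```

A function like mixed_order_key could then be passed as the sorter argument once the key is a tuple.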

Ali
fredh
  • I got the same error, but I don't know how to implement your workaround. – David Pesetsky Oct 07 '22 at 12:56
  • Hi David. The original function was not working with lists. I modified it to create a key as a string instead of the key.append. Not nice and does not fit all types, etc but worked for me. Here is my modified code : – fredh Oct 08 '22 at 16:46
  • for row in range(row_start, row_end+1):
        #key = []
        key = ''
        for col in cols:
            #key.append(ws.cell(row, col).value)
            cell_obj = ws.cell(row, col)
            # if no value we say it goes at the end (zzzz...)
            if cell_obj.value == None:
                key += 'zzzzz'
            else:
                if cell_obj.data_type == 'n':
                    format_val = "{:5.3f}"
                    key += format_val.format(ws.cell(row, col).value)
                else:
                    key += str(cell_obj.value)
    – fredh Oct 08 '22 at 16:48
0

pandas works well as an intermediary for this:

# pip install pandas openpyxl

import pandas as pd

df = pd.read_excel('input_file.xlsx')
df_sorted = df.sort_values('column_name_to_be_sorted')
# index=False keeps the DataFrame index from being written as an extra column
df_sorted.to_excel('output_file.xlsx', index=False)
PMull34