1

Using python I need to be able to do the following operations to a workbook for excel 2007:

  1. delete rows
  2. sorting a worksheet
  3. getting distinct values from a column

I am looking into openpyxl; however, it seems to have limited capabilities.

Can anyone please recommend a library that can do the above tasks?

Alex Gordon
  • 57,446
  • 287
  • 670
  • 1,062

1 Answers1

7

I want to preface this with letting you know this is only a windows based solution. But if you are using Windows I would recommend using Win32Com which can be found here. This module gives Python programmatic access to any Microsoft Office Application (including Excel) and uses many of the same methods used in VBA. Usually what you will do is record a macro (or recall from memory) how to do something in VBA and then use the same functions in Python

To start we want to connect to Excel and get access to the first sheet as an example

#First we need to access the module that lets us connect to Excel
import win32com.client 

# Next we want to create a variable that represents Excel
app = win32com.client.Dispatch("Excel.Application")   

# Lastly we will assume that the workbook is active and get the first sheet
wbk = app.ActiveWorkbook
sheet = wbk.Sheets(1)

At this point we have a variable named sheet that represents the excel work sheet we will be working with. Of course there are multiple ways to access the sheet, this is usually the way I demo how to use win32com with excel because it is very intuitive.

Now assume I have the following values on the first sheet and I will go over one by one how to answer what you were asking:

     A    
1   "d"
2   "c"
3   "b"
4   "a"
5   "c"

Delete Rows: Lets assume that you want to delete the first row in your active sheet.

sheet.Rows(1).Delete()

This creates:

    A
1   "c"
2   "b"
3   "a"
4   "c"

Next Lets sort the cells in ascending order (although I would recommend extracting the values to python and doing the sorting within a list and sending the values back)

rang = sheet.Range("A1","A4")
sheet.Sort.SetRange(rang)
sheet.Sort.Apply()

This creates:

    A
1   "a"
2   "b"
3   "c"
4   "c"

And now we will get distinct values from the column. The main thing to take away here is how to extract values from a cells. You can either select a lot of cells at once and with sheet.Range("A1","A4") or you can access the values by iterating over cell by cell with sheet.Cells(row,col). Range is orders of magnitude faster, but Cells is slightly easier for debugging.

#Get a list of all Values using Range
valLstRange = [val[0] for val in sheet.Range("A1","A4").Value]

#Get a list of all Values using Cells
valLstCells = [sheet.Cells(row,1).Value for row in range(1,4)]

#valLstCells and valLstRange both = ["a","b","c","c"]

Now lastly you wanted to save the workbook and you can do this with the following:

wbk.SaveAs("C:/savedWorkbook.xlsx")

And you are done!

INFO About COM

If you have worked with VBA, .NET, VBscript or any other language to work with Excel many of these Excel methods will look the same. That is because they are all using the same library provided by Microsoft. This library uses COM, which is Microsoft's way of providing API's to programmers that are language agnostic. COM itself is an older technology and can be tricky to debug. If you want more information on Python and COM I highly recommend Python Programming on Win32 by Mark Hammond. He is the guy that gets a shoutout after you install Python on Windows in the official .msi installer.

ALTERNATIVES TO WIN32COM

I also need to point out there are several fantastic open source alternatives that can be faster than COM in most situations and work on any OS (Mac, Linux, Windows, etc.). These tools all parse the zipped files that comprise a .xlsx. If you did not know that a .xlsx file is a .zip, just change the extension to .zip and you can then explore the contents (kind of interesting to do at least once in your career). Of these I recommend Openpyxl which I have used for parsing and creating Excel files on a server where performance was critical. Never use win32com for server activities as it opens an out-of-process instance of excel.exe for each instance that can be leaky

RECOMMENDATION

I would recommend win32com for users who are working intimately with individual data sets (analysts, financial services, researchers, accountants, business operations, etc.) that are performing data discovery activities as it works great with open workbooks. However, developers or users that need to perform very large tasks with a small footprint or extremely large manipulations or processing in parallel must use a package such as openpyxl.

Michael David Watson
  • 3,028
  • 22
  • 36
  • this is wonderful thank you so much. what is the benefit to using this over using a third party library? – Alex Gordon Sep 07 '12 at 20:59
  • 1
    For starters you will have access to every function that is available in VBA. Another advantage that I really like is being able to write a script on an Excel workbook that is open on my screen so I can visually see my code executing. You can also use Win32COM to script any MS Program that exposes itself to COM not just Excel. – Michael David Watson Sep 10 '12 at 00:36