19

I can open a password-protected Excel file with this:

import sys
import win32com.client
xlApp = win32com.client.Dispatch("Excel.Application")
print "Excel library version:", xlApp.Version
filename, password = sys.argv[1:3]
xlwb = xlApp.Workbooks.Open(filename, Password=password)
# xlwb = xlApp.Workbooks.Open(filename)
xlws = xlwb.Sheets(1) # counts from 1, not from 0
print xlws.Name
print xlws.Cells(1, 1) # that's A1

I'm not sure though how to transfer the information to a pandas dataframe. Do I need to read cells one by one and all, or is there a convenient method for this to happen?

martineau
  • 119,623
  • 25
  • 170
  • 301
dmvianna
  • 15,088
  • 18
  • 77
  • 106
  • Does `xlws` have a `Row` and `RowCount` (or whatever they're called) - if so, then loop over the number of rows and build a list of lists... Then use `pandas.DataFrame` on that... (sorry - don't use windows - so can't give this a go myself) – Jon Clements Mar 16 '13 at 13:05

6 Answers6

13

Simple solution

import io
import pandas as pd
import msoffcrypto

passwd = 'xyz'

decrypted_workbook = io.BytesIO()
with open(path_to_your_file, 'rb') as file:
    office_file = msoffcrypto.OfficeFile(file)
    office_file.load_key(password=passwd)
    office_file.decrypt(decrypted_workbook)

df = pd.read_excel(decrypted_workbook, sheet_name='abc')

pip install --user msoffcrypto-tool

Exporting all sheets of each excel from directories and sub-directories to seperate csv files

from glob import glob
PATH = "Active Cons data"

# Scaning all the excel files from directories and sub-directories
excel_files = [y for x in os.walk(PATH) for y in glob(os.path.join(x[0], '*.xlsx'))] 

for i in excel_files:
    print(str(i))
    decrypted_workbook = io.BytesIO()
    with open(i, 'rb') as file:
        office_file = msoffcrypto.OfficeFile(file)
        office_file.load_key(password=passwd)
        office_file.decrypt(decrypted_workbook)

    df = pd.read_excel(decrypted_workbook, sheet_name=None)
    sheets_count = len(df.keys())
    sheet_l = list(df.keys())  # list of sheet names
    print(sheet_l)
    for i in range(sheets_count):
        sheet = sheet_l[i]
        df = pd.read_excel(decrypted_workbook, sheet_name=sheet)
        new_file = f"D:\\all_csv\\{sheet}.csv"
        df.to_csv(new_file, index=False)
tnfru
  • 296
  • 1
  • 10
Suhas_Pote
  • 3,620
  • 1
  • 23
  • 38
  • 1
    This is very helpful, though I'd recommend changing the variable `i` in the simple solution to something that indicates that it's the path to the file, `file_path` for example, as `i` is commonly used as an iterator. Took me a minute to figure out... – John Conor Oct 06 '22 at 12:58
7

from David Hamann's site (all credits go to him) https://davidhamann.de/2018/02/21/read-password-protected-excel-files-into-pandas-dataframe/

Use xlwings, opening the file will first launch the Excel application so you can enter the password.

import pandas as pd
import xlwings as xw

PATH = '/Users/me/Desktop/xlwings_sample.xlsx'
wb = xw.Book(PATH)
sheet = wb.sheets['sample']

df = sheet['A1:C4'].options(pd.DataFrame, index=False, header=True).value
df
Maurice
  • 71
  • 1
  • 1
  • Looks like there's a `password` argument in `xw.Book()` (maybe this was added to the API since this answer). https://docs.xlwings.org/en/stable/api.html?highlight=password#book – Leo Feb 17 '21 at 10:15
6

Assuming the starting cell is given as (StartRow, StartCol) and the ending cell is given as (EndRow, EndCol), I found the following worked for me:

# Get the content in the rectangular selection region
# content is a tuple of tuples
content = xlws.Range(xlws.Cells(StartRow, StartCol), xlws.Cells(EndRow, EndCol)).Value 

# Transfer content to pandas dataframe
dataframe = pandas.DataFrame(list(content))

Note: Excel Cell B5 is given as row 5, col 2 in win32com. Also, we need list(...) to convert from tuple of tuples to list of tuples, since there is no pandas.DataFrame constructor for a tuple of tuples.

ikeoddy
  • 76
  • 1
  • 3
3

Based on the suggestion provided by @ikeoddy, this should put the pieces together:

How to open a password protected excel file using python?

# Import modules
import pandas as pd
import win32com.client
import os
import getpass

# Name file variables
file_path = r'your_file_path'
file_name = r'your_file_name.extension'

full_name = os.path.join(file_path, file_name)
# print(full_name)

Getting command-line password input in Python

# You are prompted to provide the password to open the file
xl_app = win32com.client.Dispatch('Excel.Application')
pwd = getpass.getpass('Enter file password: ')

Workbooks.Open Method (Excel)

xl_wb = xl_app.Workbooks.Open(full_name, False, True, None, pwd)
xl_app.Visible = False
xl_sh = xl_wb.Worksheets('your_sheet_name')

# Get last_row
row_num = 0
cell_val = ''
while cell_val != None:
    row_num += 1
    cell_val = xl_sh.Cells(row_num, 1).Value
    # print(row_num, '|', cell_val, type(cell_val))
last_row = row_num - 1
# print(last_row)

# Get last_column
col_num = 0
cell_val = ''
while cell_val != None:
    col_num += 1
    cell_val = xl_sh.Cells(1, col_num).Value
    # print(col_num, '|', cell_val, type(cell_val))
last_col = col_num - 1
# print(last_col)

ikeoddy's answer:

content = xl_sh.Range(xl_sh.Cells(1, 1), xl_sh.Cells(last_row, last_col)).Value
# list(content)
df = pd.DataFrame(list(content[1:]), columns=content[0])
df.head()

python win32 COM closing excel workbook

xl_wb.Close(False)
datalifenyc
  • 2,028
  • 1
  • 20
  • 18
  • Instead of manually counting the number of rows and columns, simple use `xl_sh..UsedRange.Rows.Count` and `xl_sh.UsedRange.Columns.Count` – mhc Sep 16 '20 at 01:38
2

Assuming that you can save the encrypted file back to disk using the win32com API (which I realize might defeat the purpose) you could then immediately call the top-level pandas function read_excel. You'll need to install some combination of xlrd (for Excel 2003), xlwt (also for 2003), and openpyxl (for Excel 2007) first though. Here is the documentation for reading in Excel files. Currently pandas does not provide support for using the win32com API to read Excel files. You're welcome to open up a GitHub issue if you'd like.

Phillip Cloud
  • 24,919
  • 11
  • 68
  • 88
  • I can't test it anymore, as I currently don't work in an environment that enables me to do so. I will though mark this or any other answer as accepted if you give the example code and assure me that it works. :7) – dmvianna Jun 21 '13 at 13:51
  • I don't know how to use the win32com API so you'd have to figure that out yourself, but if you look at the documentation that I gave a link to, it gives instructions for doing exactly what you want. There's no need to replicate the example code here since you can just read it there. – Phillip Cloud Jun 21 '13 at 18:27
2

Adding to @Maurice answer to get all the cells in the sheet without having to specify the range

wb = xw.Book(PATH, password='somestring')
sheet = wb.sheets[0] #get first sheet

#sheet.used_range.address returns string of used range
df = sheet[sheet.used_range.address].options(pd.DataFrame, index=False, header=True).value
LouF
  • 21
  • 4