24

I am using python xlwings to read a column of data in Excel 2013. Column A is populated with numbers. To import this column into a python list py_list, I have the following code;

import xlwings as xw

wb = xw.Book('BookName.xlsm')
sht = xw.Book('SheetName')
py_list = sht.range('A2:A40').value

The above code works if the column data is populated at A2:A40. However, the column data can keep growing. Data can grow and stretch to A2:A46 or A2:A80. The last row is empty. It is not known at compile time how many rows of data is in this column.

How can I modify the code to detect the empty cell at the last row so that the range of data can be read by py_list?

I am open to using other python libraries to read the Excel data besides xlwings. I am using python v3.6

user3848207
  • 3,737
  • 17
  • 59
  • 104

5 Answers5

43

I say this a lot about reading files in from csv or excel, but I would use pandas.

import pandas as pd

df = pd.read_excel('filename.xlsm', sheetname=0) # can also index sheet by name or fetch all sheets
mylist = df['column name'].tolist()

an alternative would be to use a dynamic formula using soemthing like OFFSET in excel instead of 'A2:A40', or perhaps a named range?

Stael
  • 2,619
  • 15
  • 19
  • If the sheetname is `SheetName`, how can I modify your code? Is this correct? `df = pd.read_excel('filename.xlsm', sheetname='SheetName') ` – user3848207 Aug 16 '17 at 09:08
  • 2
    that looks right to me, details on the function are here: https://pandas.pydata.org/pandas-docs/stable/generated/pandas.read_excel.html – Stael Aug 16 '17 at 09:24
  • 1
    Your answer will work on excel files which are not password-protected. Unfortunately, my excel file is password-protected. Nevertheless, I upvoted your answer. – user3848207 Aug 17 '17 at 08:52
  • @stael If i wanted to read multiple excel files in a loop and put a the same column in each file in a list. How would i append to the list in each pass? – D Chase Aug 04 '22 at 20:30
12

I know this is an old question, but you can also use openpyxl

from openpyxl import load_workbook
wb = load_workbook("BookName.xlsx")  # Work Book
ws = wb.get_sheet_by_name('SheetName')  # Work Sheet
column = ws['A']  # Column
column_list = [column[x].value for x in range(len(column))]

Notes:

  • Pandas is an awesome library, but installing it just to read an excel column into a list is an overkill IMHO.

  • xlrd is not maintained anymore. From the xlrd github page

    PLEASE NOTE: This library currently has no active maintainers. You are advised to use OpenPyXL instead.

Bitto
  • 7,937
  • 1
  • 16
  • 38
  • 4
    get_sheet_by_name has a deprecation warning. Use ws = wb[wb.sheetnames[x]], where x is the element of a list representing the name of the sheet you want. You must use wb[wb.sheetnames[x]] since this returns an object (a Worksheet) that you can then iterate through using column = ws['A'], you cannot use ws = wb.sheetnames[0], which returns a str object – lobi Oct 15 '19 at 18:06
4

I found this as the easiest way to create lists from the entire columns in excel and it only takes the populated excel cells. import pandas as pd import numpy as np

#Insert complete path to the excel file and index of the worksheet
df = pd.read_excel("PATH.xlsx", sheet_name=0)
# insert the name of the column as a string in brackets
list1 = list(df['Column Header 1']) 
list2 = list(df['Column Header 2'])

print(list1)
print(list2)
3

I went through xlwings documentation to look for something, didn't find something like this, but you can always try and go around this:

temp = [x for x in xw.Range('A2:A200').value if x != None] #A200 just put a big number..

or I don't know try this:

from itertools import takewhile
temp =[takewhile(lambda x: x != None, xw.Range('A2:A70').value)]
while True:
    try:
         next(temp)
    except StopIteration:
         break

at line 2, at first I tried doing something like this:

temp =[lambda x: x for x in xw.Range('D:D').values if x != None else exit()] #or to replace this with quit() but there is no option to break lambdas as far as I know

another option:

temp = iter(xw.Range('A:A').value)
list = []
a = next(temp)               #depending your first cell starts at row 1
while a != None:             #might want zeros or '' etc
    list.append(a)
    a = next(temp)
eladgl
  • 69
  • 8
1

The key to this question is finding out the number of rows in column A.

The number of rows can be found with this single line using xlwings below;

rownum = sht.range('A1').end('down').last_cell.row

One needs to read the API documentation carefully to get the answer.

http://docs.xlwings.org/en/stable/api.html#xlwings.Range

Once the number of rows is found, it is easy to figure out the rest.

user3848207
  • 3,737
  • 17
  • 59
  • 104