4

How can I use xlwings to read a "table" in excel, into a pandas DataFrame, where the table "headers" become the DataFrame column names?

Every way I have tried to read the table, the header row is always excluded from the read!

Here is what I've tried, where "b" is my xlwings workbook object:

b.sheets['Sheet1'].range('Table1').options(pd.DataFrame)
b.sheets['Sheet1'].range('Table1').options(pd.DataFrame, headers=False)
b.sheets['Sheet1'].range('Table1').options(pd.DataFrame, headers=True)
CrashLandon
  • 161
  • 2
  • 11
  • You can use square bracket access to work with Excel tables, see this answer here and the link there: https://stackoverflow.com/a/36396797/918626 – Felix Zumstein Jun 28 '17 at 06:31
  • why not just use `pandas.read_excel`? – Maarten Fabré Jun 28 '17 at 08:27
  • can you not save the file as a csv and then pandas.read_csv? I prefer read csv then read excel – Tank Jun 28 '17 at 10:23
  • @FelixZumstein looks like that let's me pick specific columns to read, but I want to read all columns, with the headers pulled into the DataFrame – CrashLandon Jun 28 '17 at 12:42
  • @MaartenFabré I would be happy to do that, but my table is not the only data on the sheet, so I want to call it by name. I don't think the pandas function can do that, but correct me if I'm wrong! – CrashLandon Jun 28 '17 at 12:50
  • @Tanc27 for the sake of efficiency, I'd like to avoid an extra export step. Thanks! – CrashLandon Jun 28 '17 at 12:53
  • 3
    wrong, the brackets allow you to access columns and headers/totals rows. Use: Use: `b.sheets['Sheet1'].range('Table1[[#All]]').options(pd.DataFrame)` – Felix Zumstein Jun 28 '17 at 13:20

3 Answers3

2

Hoping this is not the best answer, but I did find I could reference the named range, then .offset(-1).expand('vertical')

nick
  • 1,090
  • 1
  • 11
  • 24
CrashLandon
  • 161
  • 2
  • 11
1

Another option is to use the api and Excel's ListObject

import xlwings as xw

wb = xw.books.active
ws = wb.sheets('MySheet')
tbl = ws.api.ListObjects('MyTable') # or .ListObjects(1)
rng = ws.range(tbl.range.address) # get range from table address

df = rng.options(pd.DataFrame, header=True).value # load range to dataframe
Jayme Gordon
  • 507
  • 1
  • 6
  • 11
1

Let me add the comment by Felix Zumstein as explicit answer, because in my opinion it is the best solution.

b.sheets['Sheet1'].range('Table1[[#All]]').options(pd.DataFrame)

Using the square bracket notation expands the selection from the table body to the header row as well. Subsequently the conversion to a pandas DataFrame, can use it as header.

To read more, check this answer.

Robert
  • 1,357
  • 15
  • 26