7

There are multiple ways to read excel data into python. Pandas provides aslo an API for writing and reading

import pandas as pd
from pandas import ExcelWriter
from pandas import ExcelFile

df = pd.read_excel('File.xlsx', sheetname='Sheet1')

That works fine.

BUT: What is the way to access the tables of every sheet directly into a pandas dataframe??

enter image description here

The above picture shows a sheet including a table SEPARATED THAN CELL (1,1).

Moreover the sheet might include several tables (listobjects in VBA).

I can not find anywhere the way to read them into pandas.

Note1: It is not possible to modify the workbook to bring all the tables towards cell(1,1). Note2: I would like to use just pandas (if it is possible) and minimize the need to import other libraries. But it there is no other way I am ready to use other lybray. In any case I could not manage with xlwings for instance.

here it looks like its possible to parse the excel file, but no soilution is provided for tables, just for complete sheets.

The documentation of pandas does not seem to offer that possibility.

Thanks.

Pᴇʜ
  • 56,719
  • 10
  • 49
  • 73
JFerro
  • 3,203
  • 7
  • 35
  • 88
  • 1
    If the behavior is the same per every table you can use skip rows in your `pd.read_excel(file, skiprows=n)` now to access all the sheets in a single statement `df = pd.concat(pd.read_excel(file, sheet_name=None, skiprows=n), ignore_index=True)` or use `pd.ExcelWriter(file)` – Umar.H Oct 15 '19 at 16:01
  • As I answered to @Florian the idea is that the excel file is not mine, I can ot open it and change python code for every sheet. there should be a way. A Listobject is a very well and powerfull object in excel. in the same way I access the sheets colection by id the idea is to access the list collection of Tables in every sheet by id. – JFerro Oct 15 '19 at 16:04
  • Also really weird that there is skillrows and not skipcols in read_excel(file, skiprows=n) – JFerro Oct 15 '19 at 16:08
  • 1
    Youll have to write a custom function to iterate each sheet or each file to find where the headers start then pass those params into pandas – Umar.H Oct 15 '19 at 16:17
  • Any workaround to skip columns? – JFerro Oct 15 '19 at 16:36

3 Answers3

6

You can use xlwings, great package for working with excel files in python.

This is for a single table, but it is pretty trivial to use xlwings collections (App>books>sheets>tables) to iterate over all tables. Tables are ofcourse listobjects.

import xlwings
import pandas

with xlwings.App() as App:
    _ = App.books.open('my.xlsx')
    rng = App.books['my.xlsx'].sheets['mysheet'].tables['mytablename'].range
    df: pandas.DataFrame = rng.expand().options(pandas.DataFrame).value
Graham Monkman
  • 404
  • 6
  • 9
3

Here is a way to parse one table, howver it's need you to know some informations on the seet parsed.

df = pd.read_excel("file.xlsx", usecols="B:I", index_col=3)
print(df)

Not elegant and work only if one table is present inside the sheet, but that a first step:

import pandas as pd
import string

letter = list(string.ascii_uppercase)

df1 = pd.read_excel("file.xlsx")

def get_start_column(df):
    for i, column in enumerate(df.columns):
        if df[column].first_valid_index():
            return letter[i]

def get_last_column(df):
    columns = df.columns
    len_column = len(columns)
    for i, column in enumerate(columns):
        if df[column].first_valid_index():
            return letter[len_column - i]

def get_first_row(df):
    for index, row in df.iterrows():
        if not row.isnull().values.all():
            return index + 1

def usecols(df):
    start = get_start_column(df)
    end = get_last_column(df)
    return f"{start}:{end}"

df = pd.read_excel("file.xlsx",  usecols=usecols(df1), header=get_first_row(df1)) 
print(df)
Florian Bernard
  • 2,561
  • 1
  • 9
  • 22
  • Ahhh Florian, so nice, we met in the other question. Thanks. Assume you dont know where is the Listobject possitioned. (Actually I dont want to open the excel file to check it). I would like to look over the tables (listobjects). – JFerro Oct 15 '19 at 16:02
  • 1
    Nice one, if the excel files are small I guess it won't matter but I would only read the first 50 rows into function for performance – Umar.H Oct 16 '19 at 07:02
3

I understand that this question has been marked solved already, but I found an article that provides a much more robust solution: Full Post

I suppose a newer version of this library supports better visibility of the workbook structure. Here is a summary:

  1. Load the workbook using the load_workbook function from openpyxl
  2. Then, you are able to access the sheets within, which contains collection of List-Objects (Tables) in excel.
  3. Once you gain access to the tables, you are able to get to the range addresses of those tables.
  4. Finally they loop through the ranges and create a pandas data-frame from it.

This is a nicer solution as it gives us the ability to loop through all the sheets and tables in a workbook.

Dharman
  • 30,962
  • 25
  • 85
  • 135
Ejaz Ahmed
  • 598
  • 7
  • 13