
I would like to import the Excel tables (made with the Excel 2007 and above table feature) in a workbook into separate dataframes. Apologies if this has been asked before, but my searches didn't turn up what I wanted. I know you can easily read sheets using the read_excel function; however, that requires specifying a sheet name or returns a dict of dataframes, one per sheet.

Instead of specifying a sheet name, I was wondering whether there is a way of specifying a table name or, better yet, returning a dict of dataframes, one for each table in the workbook.

I know this can be done by combining xlwings with pandas, but I was wondering whether it is already built into any of the pandas functions (maybe ExcelFile).

Something like this:

import pandas as pd

xls = pd.ExcelFile('excel_file_path.xls')
# to read all tables into a map keyed by table name
# (table_names is the attribute I wish existed; ExcelFile only exposes sheet_names)
tables_to_df_map = {}
for table_name in xls.table_names:
    tables_to_df_map[table_name] = xls.parse(table_name)
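
For reference, the xlwings route I mentioned would look roughly like this. It's only an untested sketch: it assumes a fairly recent xlwings (one that exposes a sheet-level tables collection), that Excel itself is installed, and 'excel_file_path.xlsx' is just a placeholder path:

import pandas as pd
import xlwings as xw

# xlwings drives a real Excel instance, so this opens the workbook in Excel
wb = xw.Book('excel_file_path.xlsx')
tables_to_df_map = {}
for sheet in wb.sheets:
    for table in sheet.tables:
        # table.range covers the header row plus the data, so let the
        # DataFrame converter treat the first row as the header
        tables_to_df_map[table.name] = (
            table.range.options(pd.DataFrame, header=1, index=False).value
        )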
vcromy
  • Maybe this answers your question? https://stackoverflow.com/questions/26521266/using-pandas-to-pd-read-excel-for-multiple-worksheets-of-the-same-workbook – Manrique Jan 10 '19 at 17:14
  • @Manrique Thanks for the suggestion. I found that post yesterday, but it's relevant only to sheet loading, which I'm not interested in. I'm interested in loading Excel tables (created in Excel using the Format as Table tool). – vcromy Jan 11 '19 at 07:10

2 Answers


Although it's not exactly what I was after, I have found a way to get the tables, with the caveat that it works one sheet at a time (you still need the sheet name).

Here's an excerpt from the code that I'm currently using:

import pandas as pd
import openpyxl as op

wb = op.load_workbook(file_location)
# Connecting to the specified worksheet
ws = wb[sheetname]
# Initialising an empty list that the Excel tables will be imported into
var_tables = []
# Importing table details from Excel: table name and sheet range
# (ws._tables is a private attribute; newer openpyxl versions have a public ws.tables collection)
for table in ws._tables:
    sht_range = ws[table.ref]
    data_rows = []
    i = 0  # cell counter within the current row
    j = 0  # row counter within the current table
    for row in sht_range:
        j += 1
        data_cols = []
        for cell in row:
            i += 1
            data_cols.append(cell.value)
            # Add an extra column carrying the table name:
            # the header 'Table_Name' on the first row, the name itself afterwards
            if i == len(row) and j == 1:
                data_cols.append('Table_Name')
            elif i == len(row):
                data_cols.append(table.name)
        data_rows.append(data_cols)
        i = 0
    var_tables.append(data_rows)

# Creating an empty list where all the dfs will be appended
# into
var_df = []
# Appending each table extracted from Excel into the list
for tb in var_tables:
    df = pd.DataFrame(tb[1:], columns=tb[0])
    var_df.append(df)
# Merging all in one big df (axis=1 concatenates the tables side by side, on columns)
df = pd.concat(var_df, axis=1)
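
If you would rather end up with the dict of dataframes keyed by table name that the question asks for, instead of one merged df, a condensed variant of the same idea might look like the sketch below. It assumes openpyxl 3.0+, where ws.tables is the public, dict-like table collection, and read_tables is just a helper name I made up:

import openpyxl as op
import pandas as pd

def read_tables(file_location):
    # Return a {table_name: DataFrame} dict for every Excel table in the workbook
    wb = op.load_workbook(file_location, data_only=True)  # data_only: cached values instead of formulas
    frames = {}
    for ws in wb.worksheets:
        # ws.tables is the public table collection in openpyxl >= 3.0 (older versions: ws._tables)
        for table in ws.tables.values():
            rows = [[cell.value for cell in row] for row in ws[table.ref]]
            frames[table.name] = pd.DataFrame(rows[1:], columns=rows[0])
    return frames

tables_to_df_map = read_tables(file_location)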
vcromy
  • It's probably not very pythonic and can be optimized but I didn't spend much time refining it. Feel free to make suggestions or improvements :) – vcromy Feb 15 '19 at 13:47

The pandas function pd.read_excel(filename, sheet_name, skiprows, usecols) reads an Excel range into a dataframe, given the number of rows to skip and the columns the range spans.
To find the range of a named table (knowing its name and sheet), import load_workbook from openpyxl:

from openpyxl import load_workbook

wb = load_workbook(filename)
sheet = wb['Sheet1']
tblrange = sheet.tables['Mytable'].ref

tblrange is returned as a string, like 'B2:C6', and needs to be parsed to get the column range for usecols and the rows to skip for skiprows above, so I created a function (import re is needed for the regex) that returns a tuple, e.g. ('B:C', 1):

import re

def findoffset(r):
    # strip the digits to get the column span, e.g. 'B2:C6' -> 'B:C'
    cols = re.sub(r'\d', '', r)
    # the first row of the range holds the table header, so skip the rows above it
    rows = [int(x) for x in re.findall(r'\d+', r)]
    nskiprows = rows[0] - 1
    return cols, nskiprows

Finally the table can be read into a dataframe:

import pandas as pd

c, r = findoffset(tblrange)  # c: the table columns, like 'B:C'; r: the number of rows to skip, if any
df = pd.read_excel('myexcel.xlsx', sheet_name='Sheet1', skiprows=r, usecols=c)
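
As an aside, the regex can be avoided by letting openpyxl do the parsing; a sketch along these lines, using openpyxl's range_boundaries and get_column_letter helpers (which convert between range strings and numeric coordinates), should give the same result:

import pandas as pd
from openpyxl.utils import get_column_letter, range_boundaries

min_col, min_row, max_col, max_row = range_boundaries(tblrange)  # 'B2:C6' -> (2, 2, 3, 6)
df = pd.read_excel(
    'myexcel.xlsx',
    sheet_name='Sheet1',
    skiprows=min_row - 1,  # rows above the table's header row
    usecols=f'{get_column_letter(min_col)}:{get_column_letter(max_col)}',  # e.g. 'B:C'
    nrows=max_row - min_row,  # number of data rows below the header
)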
Stefano Verugi