Reading a dataframe from an odc file created through excel using pandas

Question

My company has done some work through SQL and allows us to get access to some dataframe through requesting SQL source from Excel. So I have requested the spreadsheet from data>other source> and created this .odc file (a sort of a data connection) which allows me to read to that particular dataframe So I would like to read the dataframe using pandas but not sure how to do it

I thought about using the sqlalchemy import to create engine but the thing is, the IT colleague is the one that has done all the SQL work, I just need to read the file so I'm not sure if this is the solution.

I thought about using the following:

from sqlalchemy import create_engine
engine = create_engine('postgresql://user@localhost:5432/mydb')

Is there a way to read directly to the .odc file and get the dataframe I wanted from the connection file?

any suggestions are welcome

score 1 · Answer 1 · answered Jul 08 '19 at 15:11

Only MS Office applications can directly work with .odc files which are simply meta data info files and not actual connection objects or APIs. However, you can extract the underlying info, specifically connection string, to use outside of MS Office apps.

Because Office Data Connection (.odc) files are HTML/XML markup files, open them in any text editor and find the <odc:ConnectionString> tag. Then, use this value directly in Python script. However, likely, the connection will utilize an OLEDB provider or ODBC driver which is not integrated with PostgreSQL's SQLAlchemy but you can extract needed params: host, user, database, etc. for your needs. Otherwise look into pyodbc (without SQLAlchemy) using Postgres' ODBC driver.

Below shows an example of .odc file to an MS Access database and same connection string used in Python (without SQLAlchemy).

.odc

<html xmlns:o="urn:schemas-microsoft-com:office:office" xmlns="http://www.w3.org/TR/REC-html40">

<head>
    <meta http-equiv=Content-Type content="text/x-ms-odc; charset=utf-8">
    <meta name=ProgId content=ODC.Table>
    <meta name=SourceType content=ODBC>
    <title>My Access DB</title>
    <xml id=docprops>
        <o:DocumentProperties xmlns:o="urn:schemas-microsoft-com:office:office" xmlns="http://www.w3.org/TR/REC-html40">
            <o:Description>My MS Access DB</o:Description>
            <o:Name>My MS Access DB</o:Name>
        </o:DocumentProperties>
    </xml>
    <xml id=msodc>
        <odc:OfficeDataConnection xmlns:odc="urn:schemas-microsoft-com:office:odc" xmlns="http://www.w3.org/TR/REC-html40">
            <odc:Connection odc:Type="ODBC">
                <odc:ConnectionString>DSN=MS Access Database;DBQ=C:\Path\To\MyDatabase\DB.accdb;DefaultDir=C:\Path\To\MyDatabase;DriverId=25;FIL=MS Access;MaxBufferSize=2048;PageTimeout=5;</odc:ConnectionString>
                <odc:CommandText>SELECT * FROM `C:\Path\To\MyDatabase\DB.accdb`.`MyTable`</odc:CommandText>
            </odc:Connection>
        </odc:OfficeDataConnection>
    </xml>
    <style>
        <!-- .ODCDataSource {
            behavior: url(dataconn.htc);
        }

        -->
    </style>

</head>    
...

Python (notice exact connection string above is used)

import pyodbc
import pandas as pd

conn = pyodbc.connect('DSN=MS Access Database;DBQ=C:\Path\To\MyDatabase\DB.accdb;'
                      'DefaultDir=C:\Path\To\MyDatabase;DriverId=25;FIL=MS Access;'
                      'MaxBufferSize=2048;PageTimeout=5;')

df = pd.read_sql("SELECT * FROM myTable", conn)

...
conn.close()

Reading a dataframe from an odc file created through excel using pandas

1 Answers1

Linked