3

I have an access database name DB_IMPORT_2020.accdb. It contains only one table named DB_IMPORT_2020_PM. I've been struggling a lot trying to import that table to Pandas. What I've been doing so far is:

# define components of our connection string
driver = '{Microsoft Access Driver (*.mdb, *.accdb)}'
filepath = r"C:\Users\corra\Desktop\DB_IMPORT_2020.accdb"

# create a connection to the database
cnxn = pyodbc.connect(driver = driver, dbq = filepath, autocommit = True)

crsr = cnxn.cursor()

# define the components of a query
table_name = 'DB_IMPORT_2020_PM'

# define query

query = "SELECT * FROM {}".format(table_name)

# execute the query

crsr.execute(query)

data = crsr.fetchall()

df = pd.DataFrame(data)

Then I come to the situation where I have a pandas dataframe with a single column and a list in each row.

0
________________________________________________________
0   [86232, 2019-09-12, INTERNET, ... , N ]
1   [86233, 2019-09-12, INTERNET, ... , M ]
2   [86234, 2019-09-12, MEZZO LIBERO, ...  , Q ]
3   ...

I feel like this is not the right way to do it and it is overly complicated. Does anyone know a simpler way to read data in a table of Access with Pandas?

This is the list i get with data = crsr.fetchall()

[(86232, datetime.datetime(2019, 9, 12, 0, 0), 'INTERNET', 'A.M Web', 'Brand_SMX', 0.0, 'gen', '20_FCST', 'OnLine', 'dipendente s', 'Low Rev.', 'STX', 'A.M', 'INTERNET', 'Brand_SMX', 'dipendente s', 'STORICI', 'TIER 1', 1.0, 'TIER 1', 'ALIMENTARI', '04_SRF', 'SMX', 'ALTRI', 'STC', 'Reservation', 'Off + On', 'Online_Res', 'TIER 1', None, None, None, None),
 (86233, datetime.datetime(2019, 9, 12, 0, 0), 'INTERNET', 'A.M Web', 'Brand_SMX', 0.0, 'feb', '20_FCST', 'OnLine', 'dipendente s', 'Low Rev.', 'STC', 'A. M', 'INTERNET', 'Brand_SMX', 'dipendente s', 'STORICI', 'TIER 1', 1.0, 'TIER 1', 'ALIMENTARI', '04_SRF', 'SMX', 'ALTRI', 'STX', 'Reservation', 'Off + On', 'Online_Res', 'TIER 1', None, None, None, None),
 (86234, datetime.datetime(2019, 9, 12, 0, 0), 'MEZZO LIBERO', 'S ITALIA SRL', 'S ELECTRONICS', 0.0, 'gen', '20_FCST', 'OffLine', 'BO / CI', 'Low Rev.', 'STX', 'S Italia Srl', 'MEZZO LIBERO', 'S', 'BEN BOT', 'STORICI', 'INTERCx', 1.0, 'INTERCx', 'INFORMATICA/FOTOGRAFIA', '04_SRF', 'SMX', 'ALTRI', 'STC', 'Reservation', 'Off + On', 'Offline_Res', 'INTX', None, None, None, None),...]
coelidonum
  • 523
  • 5
  • 17

2 Answers2

4

The easiest way to work with an Access database and pandas is to use the sqlalchemy-access dialect (which I maintain).

Does anyone know a simpler way to read data in a table of Access with Pandas?

Just use pandas' read_sql_table method:

import pandas as pd
import sqlalchemy as sa

table_name = 'DB_IMPORT_2020_PM'

engine = sa.create_engine("access+pyodbc://@my_accdb_dsn")
df = pd.read_sql_table(table_name, engine)
Gord Thompson
  • 116,920
  • 32
  • 215
  • 418
  • 3
    You might want to disclose affiliation in this answer and add some more content (a quick example of how to use it), as per [How to not be a spammer](https://stackoverflow.com/help/promotion). Still a good answer though. – Erik A Aug 25 '20 at 11:53
0

Your data is a list of tuples, you need to add the columns when creating the dataframe as described here:

df = pd.DataFrame(data,columns = ["col1","col2",...,"coln"])
Let's try
  • 1,044
  • 9
  • 20
  • I get this error ValueError: Shape of passed values is (1914390, 1), indices imply (1914390, 33) . I have 33 columns and added df = pd.DataFrame(data,columns = ["col1","col2","col3","col4","col5","col6","col7","col8","col9","col10","col11","col12","col13","col14","col15","col16","col17","col18","col19","col20","col21","col22","col23","col24","col25","col26","col27","col28","col29","col30","col31","col32","col33"]) – coelidonum Aug 25 '20 at 10:26
  • If i change to df = pd.DataFrame([data],columns = ["col1","col2","col3","col4","col5","col6","col7","col8","col9","col10","col11","col12","col13","col14","col15","col16","col17","col18","col19","col20","col21","col22","col23","col24","col25","col26","col27","col28","col29","col30","col31","col32","col33"]) i get this error: ValueError: 33 columns passed, passed data had 1914390 columns – coelidonum Aug 25 '20 at 10:29
  • Actually, I have tried to do `pd.DataFrame(data)` and it works with the sample data you provided. What error do you obtain with `df = pd.DataFrame(data[0])`? Do you have any field with `'` or `"` and commas inside or something like that? Otherwise don't really know what's happening. – Let's try Aug 25 '20 at 10:36