Hello there,
Edited 19.09.2018 14:38 to clarify my problem
first of all, thank you for reading my question! I hope that my problem is even solvable, as I couldn't find any answers in my research on this specific task.
Context:
There is a measurement system that stores measurement data in MS Access .mdb files. The typical size of the files is about 100 to 200 MB and the longest table, which is about 99 % of the space required, has around 400k to 600k datasets. As there are many files, they are packed as .zip archives to reduce the required space (e.g. Compressed: 17131406 bytes, Uncompressed: 245739520 bytes).
Approach:
If I want to evaluate the measurement data, I need to extract the .zip file first:
import zipfile
from zipfile import ZipFile
"""Extract example .zip file into this directory"""
path = "C:/20180730_0931.zip"
archive = zipfile.ZipFile(path, 'r')
archive.extractall(path=None, members=None, pwd=None)
Afterwards I can access the extracted database:
import pyodbc
"""Important: 32bit MS driver - 32bit python!"""
cnxn = pyodbc.connect('DRIVER={Microsoft Access Driver (*.mdb, *.accdb)};DBQ=C:/20180730_0931.mdb;')
cursor = cnxn.cursor()
cursor.execute("select * from mytable")
for row in cursor.fetchall():
print(row)
csr = cnxn.cursor()
csr.close()
del csr
cnxn.close()
So, whats the problem?
As there are multiple users I think it is a bad solution and can cause problems to just extract the zip file and delete the mdb file afterwards. So my idea was, that the zip file is extracted in a temporary file on the system of the user. As @Goyo pointed out, there are different ways of storing a temporary file (hard drive, a flash drive, an in-memory filesystem, ..). At first I just want to get a working solution, but of course I prefere the fastest possible way.
The workflow of the program could look like this:
- Selelect the zip file and extract it into a temporary file on the user system
- Connect to the resulting mdb file with pyodbc and fetch all data
- Close the connection, delete the temporary file and evaluate the measurement data
I tryed to use different approaches like StringIO
, zipfile.read
or tempfile
, but couldn't get it to work. From the possible duplicate @stovfl linked, I tryed out the combination of zipfile.read
and tempfile
approach, but the rest didn't seem to fit to my problem:
fenxzip = zipfile.ZipFile("C:/20180730_0931.zip", 'r')
fenfile=tempfile.SpooledTemporaryFile(max_size=10000000000,mode='w+b')
fenfile.write(fenxzip.read(fenxzip.namelist()[0]))
#Yay a temp file... or is it?
fenfile.close()
fenxzip.close()
Update 1 - Edited 20.09.2018 10:54
As a SpooledTemporaryFile
has no name I did the following:
import pyodbc
import tempfile
from zipfile import ZipFile
File = "C:/20180730_0931.zip"
mdbzip = zipfile.ZipFile(File, 'r')
mdbtemp=tempfile.TemporaryFile(mode='w+b')
mdbtemp.write(mdbzip.read(mdbzip.namelist()[0]))
MDB = mdbtemp.name; DRV = '{Microsoft Access Driver (*.mdb)}'; PWD = ''
# connect to db
con = pyodbc.connect('DRIVER={};DBQ={};PWD={}'.format(DRV,MDB,PWD))
cur = con.cursor()
# run a query and get the results
SQL = 'SELECT * FROM mytable;'
rows = cur.execute(SQL).fetchall()
for row in rows:
print(row)
cur.close()
con.close()
mdbtemp.close()
mdbzip.close()
In this case I have a named temporary file but I get an Error that the file is already used:
pyodbc.Error: ('HY000', "[HY000] [Microsoft][ODBC Microsoft Access Driver] '(unbekannt)' konnte nicht verwendet werden; Datei wird bereits verwendet. (-1024) (SQLDriverConnect); [HY000] [Microsoft][ODBC Microsoft Access Driver]Allgemeine Warnung Registrierungsschlüssel 'Temporary (volatile) Jet DSN for process 0x30dc Thread 0x2f0 DBC 0x5a10064 Jet' kann nicht geöffnet werden. (1); [HY000] [Microsoft][ODBC Microsoft Access Driver]Allgemeine Warnung Registrierungsschlüssel 'Temporary (volatile) Jet DSN for process 0x30dc Thread 0x2f0 DBC 0x5a10064 Jet' kann nicht geöffnet werden. (1); [HY000] [Microsoft][ODBC Microsoft Access Driver] '(unbekannt)' konnte nicht verwendet werden; Datei wird bereits verwendet. (-1024)")
So, could it be, that my approach is not possible at all?