0

Hello there,

Edited 19.09.2018 14:38 to clarify my problem

first of all, thank you for reading my question! I hope that my problem is even solvable, as I couldn't find any answers in my research on this specific task.

Context:

There is a measurement system that stores measurement data in MS Access .mdb files. The typical size of the files is about 100 to 200 MB and the longest table, which is about 99 % of the space required, has around 400k to 600k datasets. As there are many files, they are packed as .zip archives to reduce the required space (e.g. Compressed: 17131406 bytes, Uncompressed: 245739520 bytes).

Approach:

If I want to evaluate the measurement data, I need to extract the .zip file first:

import zipfile
from zipfile import ZipFile
"""Extract example .zip file into this directory"""

path = "C:/20180730_0931.zip"
archive = zipfile.ZipFile(path, 'r')
archive.extractall(path=None, members=None, pwd=None)

Afterwards I can access the extracted database:

import pyodbc 
"""Important: 32bit MS driver - 32bit python!"""

cnxn = pyodbc.connect('DRIVER={Microsoft Access Driver (*.mdb, *.accdb)};DBQ=C:/20180730_0931.mdb;')
cursor = cnxn.cursor()

cursor.execute("select * from mytable")
for row in cursor.fetchall():
    print(row)

csr = cnxn.cursor()  
csr.close()
del csr
cnxn.close() 

So, whats the problem?

As there are multiple users I think it is a bad solution and can cause problems to just extract the zip file and delete the mdb file afterwards. So my idea was, that the zip file is extracted in a temporary file on the system of the user. As @Goyo pointed out, there are different ways of storing a temporary file (hard drive, a flash drive, an in-memory filesystem, ..). At first I just want to get a working solution, but of course I prefere the fastest possible way.

The workflow of the program could look like this:

  1. Selelect the zip file and extract it into a temporary file on the user system
  2. Connect to the resulting mdb file with pyodbc and fetch all data
  3. Close the connection, delete the temporary file and evaluate the measurement data

I tryed to use different approaches like StringIO, zipfile.read or tempfile, but couldn't get it to work. From the possible duplicate @stovfl linked, I tryed out the combination of zipfile.read and tempfile approach, but the rest didn't seem to fit to my problem:

fenxzip = zipfile.ZipFile("C:/20180730_0931.zip", 'r')

fenfile=tempfile.SpooledTemporaryFile(max_size=10000000000,mode='w+b') 
fenfile.write(fenxzip.read(fenxzip.namelist()[0]))

#Yay a temp file... or is it?

fenfile.close()
fenxzip.close()  

Update 1 - Edited 20.09.2018 10:54

As a SpooledTemporaryFile has no name I did the following:

import pyodbc
import tempfile
from zipfile import ZipFile

File = "C:/20180730_0931.zip"

mdbzip = zipfile.ZipFile(File, 'r')

mdbtemp=tempfile.TemporaryFile(mode='w+b')
mdbtemp.write(mdbzip.read(mdbzip.namelist()[0]))

MDB = mdbtemp.name; DRV = '{Microsoft Access Driver (*.mdb)}'; PWD = ''

# connect to db
con = pyodbc.connect('DRIVER={};DBQ={};PWD={}'.format(DRV,MDB,PWD))
cur = con.cursor()

# run a query and get the results 
SQL = 'SELECT * FROM mytable;'
rows = cur.execute(SQL).fetchall()
for row in rows:
    print(row)
cur.close()
con.close()

mdbtemp.close()
mdbzip.close()    

In this case I have a named temporary file but I get an Error that the file is already used:

pyodbc.Error: ('HY000', "[HY000] [Microsoft][ODBC Microsoft Access Driver] '(unbekannt)' konnte nicht verwendet werden; Datei wird bereits verwendet. (-1024) (SQLDriverConnect); [HY000] [Microsoft][ODBC Microsoft Access Driver]Allgemeine Warnung Registrierungsschlüssel 'Temporary (volatile) Jet DSN for process 0x30dc Thread 0x2f0 DBC 0x5a10064 Jet' kann nicht geöffnet werden. (1); [HY000] [Microsoft][ODBC Microsoft Access Driver]Allgemeine Warnung Registrierungsschlüssel 'Temporary (volatile) Jet DSN for process 0x30dc Thread 0x2f0 DBC 0x5a10064 Jet' kann nicht geöffnet werden. (1); [HY000] [Microsoft][ODBC Microsoft Access Driver] '(unbekannt)' konnte nicht verwendet werden; Datei wird bereits verwendet. (-1024)")

So, could it be, that my approach is not possible at all?

L. Pauls
  • 1
  • 1
  • I don't understand this "Is there a way to load the database from the zip archive into a temporary file or into memory, and then access and process the data in python without extracting it to the harddrive?" Extracting what? The zip file? The database? – Joe Sep 19 '18 at 06:39
  • Looks like the ODBC driver needs a file name. – Stop harming Monica Sep 19 '18 at 06:59
  • Possible duplicate of [Python: unziping special files into memory and getting them into a DataFrame](https://stackoverflow.com/questions/43258102/python-unziping-special-files-into-memory-and-getting-them-into-a-dataframe) – stovfl Sep 19 '18 at 07:19
  • @Joe Sorry if my question is not clear: I want to extract the zip file in memory/temporary file to access the measurements that are stored in the database. I do not want to extract the zip file to my harddrive as I have shown in my code example above. – L. Pauls Sep 19 '18 at 10:04
  • @Goyo Do u suggest that I need the database extracted from the zipfile on my harddrive in order to access the measuremts via pyodbc? – L. Pauls Sep 19 '18 at 10:05
  • @stovfl Thank you for the possible dublicate, I will investigate if it solves my problem – L. Pauls Sep 19 '18 at 10:05
  • But the temporary file may be on your hard drive? – Joe Sep 19 '18 at 10:47
  • @L.Pauls **If** the driver requires you to specify the database as a file name then you need to have it in a file that you can name. It can be in a hard drive, a flash drive, a cloud filesystem, an in-memory filesystem, maybe even a filesystem that wraps a zip file, in case such a thing exists. And of course it can be temporary in any case. But I don't understand why you care, the procedure you outlined is simple and it shoud work, what problem are you trying to solve? – Stop harming Monica Sep 19 '18 at 11:34
  • Updated the question according to your comments. – L. Pauls Sep 19 '18 at 13:52
  • But a `SpooledTemporaryFile` does not have a file name so it won't work unless the driver allows for other way of specifying the database in the connection string. You can do the same with a normal `TemporaryFile` which **will** have a file name. – Stop harming Monica Sep 19 '18 at 14:03
  • @Goyo updated question with the `TemporaryFile` approach – L. Pauls Sep 20 '18 at 08:59
  • Well, if the driver needs exclusive access then you will need to manage the temporariness yourself. – Stop harming Monica Sep 20 '18 at 18:40

0 Answers0