2

I sometime extract plenty of data from a very old database (MS Access). The serialized output is stored as YAML files and these files are locally used by other scripts to speed up the process.

Sometimes we to an update of the local files by extracting the (possible) new data from the database. This extraction is quite long and I would like to avoid it if the content of the concerned tables is the same as the last extraction.

Is it possible to get a sort of signature of the state of a table, or part of the table ?

In other words this would help understanding my question:

signature = db.GetSignature('SELECT * FROM foo where bar = 1')
if local_foo.signature != signature:
    local_foo = db.Extract('SELECT * FROM foo where bar = 1')

What solutions could I use?

nowox
  • 25,978
  • 39
  • 143
  • 293

1 Answers1

1

Using triggers

If you have the luxury of controlling the insert/update/delete functionality of the original Access database, the best/safest solution would be to implement database triggers to enable tracking. That way you could easily at least store a "last modified" value or keep a table that is responsible for storing extensive tracking information.

Unfortunately Access doesn't support triggers (unless you're using 2010+, see below), but you could implement triggers using VBA in the database.

Access 2010 introduced data macros, but I don't think that's an option here!

Using the scripting language

If you can't use database triggers, perhaps you could use a workflow like this:

  1. Execute query and get entire result (single collection)
  2. Turn the query result/collection into a JSON string (e.g. json.dumps() in Python)
  3. Get a hash of the JSON string (e.g. hashlib.sha1() & hashObject.hexdigest() in Python)
  4. Compare hash against the last stored hash of the query result.

Using VBA

To keep things database-side (to avoid transferring data), it could be useful to try generating the hash using VBA in the Access database.

You could use the hashing algorithm code mentioned in this SO post: https://stackoverflow.com/a/14749855

Example:

Using this SHA1 code: https://gist.github.com/anonymous/573a875dac68a4af560d

Dim d As DAO.Database
Dim r As DAO.Recordset
Dim s As String
Set d = CurrentDb()
Set r = d.OpenRecordset("SELECT foo, bar, baz FROM foobar")
s = ""
While Not r.EOF
    s = s & r!foo & "," & r!bar & "," & r!baz & ";"
    r.MoveNext
Wend
r.Close
Set r = Nothing
Set d = Nothing
s = SHA1TRUNC(s)
Community
  • 1
  • 1
Nerdwood
  • 3,947
  • 1
  • 21
  • 20
  • Thanks for your updated answer. This is exactly what I currently do apart that I use YAML data to compute my SHA1. The problem is the extraction time. Fetching the data from the database is quite slow so it might be easier to ask the database to get this information for me. – nowox Oct 19 '15 at 11:48
  • Yeah, it's definitely a big workaround to use the scripting language for this task. Much more efficient to have a "last modified" field in the long run. Not sure if you can get VBA to calculate a hash more efficiently. – Nerdwood Oct 19 '15 at 12:01
  • @nowox Added a VBA example that could be helpful for you... if you're wanting to go in that direction. – Nerdwood Oct 19 '15 at 12:29