0

I am working on a use case in Databricks - GCP, where I am trying to find out the delta tables in a schema/database in Databricks which were updated in last 1 day. I used DESCRIBE DETAIL <tableName> and ran this command in loop for all the table names and put a check on the 'lastModified' column, but this takes a lot of time.

Is there any way to get the lastModified time of all tables at once.

Alifiya Ali
  • 77
  • 1
  • 7

1 Answers1

0

I don't think that there is a native way to do it, but if you know the path of all your table you can use a little trick: check the _last_checkpoint file in the _delta_log folder of your tables. If you get the time of this file (for example using the Python function os.path.getmtime, don't forget to add /dbfs at the start of your path to use it) you will basically get the last modified date.

It's probably a lot quicker to do it like that, and you can even parallelize the work if you need, but the downside is that you need to have the path of all your tables (or at least beeing able to find it with a walk on your folders, a folder with a _delta_log is a delta table so you should be able to scan your disks).

EDIT (in case someone later don't look the comment):

As you said the _last_checkpoint file should be modified only every 10 commits, so it's not good. As a workaround, you can look at the date of all the files in _delta_log and takes the last modification (example here), that should work and I suspect it will still be a lot faster than desc detail.

robin loche
  • 276
  • 2
  • 7
  • That's a great work around, Robin. One caveat though, the `_last_checkpoint` is updated after 10 commits. So I would not identify the new data until the 10th commit. Source : https://databricks.com/blog/2019/08/21/diving-into-delta-lake-unpacking-the-transaction-log.html – Alifiya Ali Mar 04 '22 at 22:57
  • Yes you're right, I didn't knew that... As a workaround, you can just look at all the files in the `_delta_log` folder and take the latest one, it will be probably a little slower but still be faster that `desc detail` I think. – robin loche Mar 05 '22 at 12:29