I am running a PySpark script in a notebook in Microsoft Fabric (preview).

The script gets the last modification time of test.csv, which is located in a lakehouse in the same workspace.

The problem is that once the notebook session starts, the lakehouse data never refreshes for the script. Even if you replace test.csv, or delete it entirely, the script only sees the test.csv that existed when the session started.

Is there a way to refresh the data during the session, or get access to the real file?

To get the last modification time, I am using the following code:

import os

last_modification_time = os.stat(file_path).st_mtime

I also tried:

last_modification_time = os.path.getmtime(file_path)
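
For completeness, here is a minimal repro sketch of what I am seeing. The mount path is illustrative (in my notebook the default lakehouse is mounted under /lakehouse/default); adjust it to your setup:

import os
import time

# Illustrative path; adjust to where your lakehouse files are mounted
file_path = "/lakehouse/default/Files/test.csv"

# Modification time as seen at session start
print(time.ctime(os.stat(file_path).st_mtime))

# ... replace or overwrite test.csv in the lakehouse here ...

# Still reports the old value for the rest of the session
print(time.ctime(os.stat(file_path).st_mtime))
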
  • Try using MSSparkUtils to get the file modification time: https://learn.microsoft.com/en-us/azure/synapse-analytics/spark/microsoft-spark-utilities?pivots=programming-language-python#file-system-utilities. Something like files = mssparkutils.fs.ls('Files/TestData/nyctaxi_l1000.csv'), then for file in files: print(file.name, file.isDir, file.isFile, file.path, file.size, file.modifyTime) (expanded in the sketch below). – Jon Jul 24 '23 at 07:44
  • After some testing, it doesn't seem to update the date at all. Raise the issue with an MS support ticket. – Jon Jul 24 '23 at 11:46
  • Using mssparkutils, it also sometimes returns no modify time in the result it returns, so this looks like a bug. – Jon Jul 24 '23 at 11:57
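
For reference, here is the first comment's one-liner expanded into a runnable form. The path is the comment's own example, and whether modifyTime is populated or refreshed mid-session is exactly what the comments above dispute:

from notebookutils import mssparkutils  # preloaded in Fabric notebooks; explicit import shown for clarity

# List the file (or its folder) to get FileInfo objects with metadata
files = mssparkutils.fs.ls('Files/TestData/nyctaxi_l1000.csv')
for file in files:
    # Per the comments above, modifyTime was sometimes missing or stale in testing
    print(file.name, file.isDir, file.isFile, file.path, file.size, file.modifyTime)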

0 Answers