I'm able to save a Great_Expectations suite to the tmp folder on my Databricks Community Edition as follows:

ge_partdf.save_expectation_suite('/tmp/myexpectation_suite.json',discard_failed_expectations=False)

The problem is that when I restart the cluster, the JSON file is no longer in the tmp folder. I guess this is because files in the tmp folder are temporary. However, if I try to save it to a folder that I know exists on Databricks, e.g. /FileStore/tables, I get the error message:

FileNotFoundError: [Errno 2] No such file or directory: '/FileStore/tables/myexpectation_suite.json'

Can someone let me know how to save the file locally on Databricks, please?

Alex Ott
Patterson

1 Answer

The save_expectation_suite function uses the local Python file API and stores the data on the local disk of the driver node, not on DBFS. That's why the file disappeared after the cluster restart.

If you use full Databricks (on AWS or Azure), you just need to prepend /dbfs to your path, and the file will be stored on DBFS via the so-called DBFS FUSE mount (see docs).
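The path rewrite can be sketched as follows, assuming a workspace where the /dbfs FUSE mount is available (so not Community Edition). The helper name `to_fuse_path` is hypothetical and is not part of Databricks or Great Expectations; it only manipulates the string and does not touch the file system.

```python
def to_fuse_path(dbfs_path: str) -> str:
    """Map a DBFS path to the local path exposed by the /dbfs FUSE mount.

    Hypothetical helper for illustration; it only rewrites the string.
    """
    # Accept both 'dbfs:/FileStore/...' and '/FileStore/...' spellings.
    if dbfs_path.startswith("dbfs:"):
        dbfs_path = dbfs_path[len("dbfs:"):]
    if dbfs_path.startswith("/dbfs/"):
        return dbfs_path  # already a FUSE path, leave it unchanged
    return "/dbfs/" + dbfs_path.lstrip("/")


suite_path = to_fuse_path("/FileStore/tables/myexpectation_suite.json")
# On a full workspace, suite_path can then be handed to the local Python API:
# ge_partdf.save_expectation_suite(suite_path, discard_failed_expectations=False)
```

The save call is left as a comment because it needs the `ge_partdf` object from the question and a cluster with the FUSE mount.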

On Community Edition you will need to continue to use the local disk and then use dbutils.fs.cp to copy the file from the local disk to DBFS.

Update for visibility, based on comments:

To refer to local files, you need to prepend file:// to the path. So we have two cases:

  1. Copy generated suite from local disk to DBFS:
dbutils.fs.cp('file:///tmp/myexpectation_suite.json', "/FileStore/tables")
  2. Copy suite from DBFS to local disk to load it:
dbutils.fs.cp("/FileStore/tables/myexpectation_suite.json", 
  'file:///tmp/myexpectation_suite.json')
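
The two copy directions above could be wrapped in small helpers, roughly like this. It is only a sketch: `local_uri`, `persist_suite` and `restore_suite` are hypothetical names, and `dbutils` is passed in explicitly because it is only predefined inside a Databricks notebook. The only Databricks call used is `dbutils.fs.cp(source, destination)`, as in the commands above.

```python
def local_uri(path: str) -> str:
    """Prefix an absolute driver-local path with the file:// scheme."""
    return path if path.startswith("file:") else "file://" + path


def persist_suite(dbutils, local_path: str, dbfs_path: str) -> None:
    """Copy a suite from the driver's local disk to DBFS (after saving it)."""
    dbutils.fs.cp(local_uri(local_path), dbfs_path)


def restore_suite(dbutils, dbfs_path: str, local_path: str) -> None:
    """Copy a suite from DBFS back to the local disk (before loading it)."""
    dbutils.fs.cp(dbfs_path, local_uri(local_path))


# In a notebook, where `dbutils` already exists:
# persist_suite(dbutils, "/tmp/myexpectation_suite.json",
#               "/FileStore/tables/myexpectation_suite.json")
# ...after a cluster restart:
# restore_suite(dbutils, "/FileStore/tables/myexpectation_suite.json",
#               "/tmp/myexpectation_suite.json")
```

After `restore_suite`, the file is back under /tmp, so the local Python API used by Great Expectations can read it again.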
Alex Ott
  • Hey Alex, gonna try your suggestion now with dbutils.fs.cp. Going to have to remind myself how to use dbutils.fs.cp .. unless you have a quick answer on how to copy with dbutils.fs.cp? :-) – Patterson Dec 17 '21 at 16:54
  • Hi Alex, I'm getting the following error when I try to copy with dbutils.fs.cp: ```java.io.FileNotFoundException: /tmp/myexpectation_suite.json``` My syntax is as follows: ```dbutils.fs.cp('/tmp/myexpectation_suite.json', "/FileStore/tables")``` – Patterson Dec 17 '21 at 17:02
  • you need to use `file:` for local files: `dbutils.fs.cp('file:///tmp/myexpectation_suite.json', "/FileStore/tables")` – Alex Ott Dec 17 '21 at 17:22
  • ok, using 'file' successfully copied over. Thanks. However, when I try to read the copied json using the syntax below I get an error: ```validation_results = ge_partdf.validate(expectation_suite='dbfs:/FileStore/tables/myexpectation_suite.json', only_return_failures=False)``` ```GreatExpectationsError: Unable to load expectation suite: IO error while reading dbfs:/FileStore/tables/myexpectation_suite.json``` – Patterson Dec 17 '21 at 17:36
  • Hi Alex, I think you know what my objective is here .. it's to be able to save the expectation_suite to dbfs so that I can read it back without having to save it every time. – Patterson Dec 17 '21 at 17:53
  • you also need to use either `/dbfs/FileStore`, or on community edition - use `dbutils.fs.cp` again, but in opposite direction – Alex Ott Dec 17 '21 at 17:58
  • Going to attempt this on a Azure Databricks, as I'm not entirely sure what you mean by using dbutils.fs.cp in the opposite direction :-) BTW, sorry for poor way I'm writing these descriptions. I'm still not sure how to do carriage returns on SO – Patterson Dec 17 '21 at 18:12
  • On Azure you just need to prepend `/dbfs` to the DBFS path... By opposite way I meant to swap arguments when you need to copy from DBFS to the local file system: `dbutils.fs.cp("/FileStore/tables/myexpectation_suite.json", 'file:///tmp/myexpectation_suite.json')` – Alex Ott Dec 17 '21 at 18:29
  • ok, I followed your instructions on Azure and it was successful. Thanks. On the Community Edition, I was successful in copying in the opposite direction as you suggested. However, I thought the objective was the following: ```dbutils.fs.cp('file:///tmp/myexpectation_suite.json', "/FileStore/tables/")``` – Patterson Dec 17 '21 at 18:48
  • Let us [continue this discussion in chat](https://chat.stackoverflow.com/rooms/240236/discussion-between-patterson-and-alex-ott). – Patterson Dec 17 '21 at 18:49
  • you are copying `file://` -> `/FileStore` when you first generate that file, and `/FileStore` -> `file://` when you need to use it – Alex Ott Dec 17 '21 at 18:54