We have two Azure resource groups, RG1 and RG2. RG1 hosts the data source, an Azure Databricks workspace (ADB_source). RG2 hosts the data sink: a Databricks workspace (ADB_sink) and an ADLS Gen2 storage account (ADLS_sink).
Use Case: We have a few delta tables in ADB_source (ACL enabled) on which a list of users has Read access. From the ADB_source workspace, we need to read the delta tables and write them to ADLS_sink as parquet for further processing at the sink.
What's Available: We have a high concurrency cluster in the ADB_source workspace, which:
- Allows only Python & SQL (dbutils.fs also restricted).
- Credential Passthrough is disabled.
- Has ACLs Enabled in spark config.
- Has a mount point to a container in ADLS_sink.
- Gives us no admin access.
Errors Observed:
We can read the delta tables as expected and run action commands as long as the data stays in the ADB_source workspace. However, when we write that data to ADLS_sink with .save(), we get the error below.
Py4JJavaError: An error occurred while calling o410.save. : java.lang.SecurityException: User does not have permission SELECT on any file. User does not have permission MODIFY on any file.
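For reference, the read and write described above look roughly like the sketch below. The helper name `export_table_as_parquet`, the table name `source_db.my_table`, and the mount path `/mnt/adls_sink` are placeholders for illustration, not our real names, and the exact writer options we use may differ.

```python
def export_table_as_parquet(spark, table_name, mount_path):
    """Read a Delta table and write it as Parquet under a mounted path.

    All names passed in are placeholders; `spark` is the notebook's SparkSession.
    """
    # Reading through the metastore works for us: the table ACL grants Read access.
    df = spark.read.table(table_name)

    # Build the target folder under the mount point, e.g. /mnt/adls_sink/my_table
    target = f"{mount_path.rstrip('/')}/{table_name.split('.')[-1]}"

    # Path-based write to the mount: this is the call that raises the
    # SecurityException quoted above.
    df.write.format("parquet").save(target)
    return target


# Usage in a notebook cell (placeholder names):
# export_table_as_parquet(spark, "source_db.my_table", "/mnt/adls_sink")
```

The read goes through a table name, while the write addresses a file path directly, which is where the "any file" permission in the error message comes into play.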
I would appreciate it if anyone could explain this and recommend additional security checks/accesses needed to implement the use case successfully.