I am trying to read a file stored on ADLS using the Python pandas library on Databricks, but I am getting the error below.

File "/databricks/python/lib/python3.7/site-packages/pandas/io/parquet.py", line 310, in read_parquet
return impl.read(path, columns=columns, **kwargs)
File "/databricks/python/lib/python3.7/site-packages/pandas/io/parquet.py", line 125, in read
path, columns=columns, **kwargs
File "/databricks/python/lib/python3.7/site-packages/pyarrow/parquet.py", line 1573, in read_table
ignore_prefixes=ignore_prefixes,
File "/databricks/python/lib/python3.7/site-packages/pyarrow/parquet.py", line 1434, in __init__
ignore_prefixes=ignore_prefixes)
File "/databricks/python/lib/python3.7/site-packages/pyarrow/dataset.py", line 667, in dataset
return _filesystem_dataset(source, **kwargs)
File "/databricks/python/lib/python3.7/site-packages/pyarrow/dataset.py", line 424, in _filesystem_dataset
fs, paths_or_selector = _ensure_single_source(source, filesystem)
File "/databricks/python/lib/python3.7/site-packages/pyarrow/dataset.py", line 371, in _ensure_single_source
filesystem, path = FileSystem.from_uri(path)
File "pyarrow/_fs.pyx", line 347, in pyarrow._fs.FileSystem.from_uri
File "pyarrow/error.pxi", line 122, in pyarrow.lib.pyarrow_internal_check_status
File "pyarrow/error.pxi", line 84, in pyarrow.lib.check_status
pyarrow.lib.ArrowInvalid: Unrecognized filesystem type in URI: abfss://data.parquet
user19930511
  • https://stackoverflow.com/questions/67024015/i-am-trying-to-connect-to-abfss-directlywithout-mounting-to-dbfs-and-trying-to/67024693#67024693 ? – Alex Ott Sep 11 '22 at 11:52
  • The question you linked is about a different issue (file not found). My issue is that pyarrow does not recognize the filesystem. Pyarrow supports the following filesystems: https://arrow.apache.org/docs/python/filesystems.html – user19930511 Sep 11 '22 at 14:54
  • I am currently testing it with Python 3.8 because adlfs requires python>=3.8, https://pypi.org/project/adlfs/ – user19930511 Sep 11 '22 at 15:01
  • Could you please add your source code? – Rakesh Govindula Sep 12 '22 at 09:09

1 Answer

The pandas library uses the local file API to read the ADLS location, which is not supported. To access the storage, you need to either mount it using a service principal or use credential passthrough.

https://learn.microsoft.com/en-us/azure/databricks/data/data-sources/azure/azure-storage
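
As a rough sketch of the service principal route: the helpers below build the `abfss://` source URI and the OAuth settings that Databricks documents for mounting ADLS Gen2. The function names, the container, storage account, mount point, and credential values are all placeholders of mine, not taken from the question. `dbutils` exists only inside a Databricks notebook, so the mount call and the pandas read are left as comments.

```python
def abfss_uri(container, account, path=""):
    """Build a full ADLS Gen2 URI: abfss://<container>@<account>.dfs.core.windows.net/<path>."""
    return f"abfss://{container}@{account}.dfs.core.windows.net/{path.lstrip('/')}"


def build_oauth_mount_configs(tenant_id, client_id, client_secret):
    """Return the ABFS OAuth settings for a service principal (client credentials flow)."""
    return {
        "fs.azure.account.auth.type": "OAuth",
        "fs.azure.account.oauth.provider.type":
            "org.apache.hadoop.fs.azurebfs.oauth2.ClientCredsTokenProvider",
        "fs.azure.account.oauth2.client.id": client_id,
        "fs.azure.account.oauth2.client.secret": client_secret,
        "fs.azure.account.oauth2.client.endpoint":
            f"https://login.microsoftonline.com/{tenant_id}/oauth2/token",
    }


# Inside a Databricks notebook (every <...> value is a placeholder):
# dbutils.fs.mount(
#     source=abfss_uri("<container>", "<storage-account>"),
#     mount_point="/mnt/<mount-name>",
#     extra_configs=build_oauth_mount_configs(
#         "<tenant-id>", "<client-id>", "<client-secret>"
#     ),
# )
#
# Once mounted, pandas can read through the local /dbfs path:
# import pandas as pd
# df = pd.read_parquet("/dbfs/mnt/<mount-name>/data.parquet")
```

Note that the source URI names both the container and the storage account; a bare `abfss://data.parquet` carries neither.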

Bobby
  • I am using the following command: data = pd.read_parquet("abfss://data.parquet", storage_options={"tenant_id": "", "client_id": "", "secret_id": ""}) – user19930511 Sep 12 '22 at 09:13
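
For comparison, here is a minimal sketch of how a direct pandas read is usually spelled when the `adlfs` package is installed and pandas is recent enough to accept `storage_options` (1.2 or later); both are assumptions, since the traceback shows pyarrow's own `FileSystem.from_uri` trying to resolve the URI. The helper names and all values are hypothetical. `abfs://<container>/<path>` is the URL form adlfs documents, and `account_name`, `tenant_id`, `client_id`, and `client_secret` are adlfs parameter names (there is no `secret_id`).

```python
def adls_read_args(container, path, account_name, tenant_id, client_id, client_secret):
    """Build the URL and storage_options for an fsspec/adlfs-backed pandas read."""
    url = f"abfs://{container}/{path.lstrip('/')}"
    storage_options = {
        "account_name": account_name,    # storage account holding the container
        "tenant_id": tenant_id,          # Azure AD tenant of the service principal
        "client_id": client_id,          # service principal application ID
        "client_secret": client_secret,  # service principal secret
    }
    return url, storage_options


def read_parquet_from_adls(container, path, **credentials):
    """Read a Parquet file from ADLS Gen2; needs pandas >= 1.2 and adlfs installed."""
    import pandas as pd  # imported lazily so the helper above works without pandas

    url, storage_options = adls_read_args(container, path, **credentials)
    return pd.read_parquet(url, storage_options=storage_options)


# Example call (placeholders, not real values):
# df = read_parquet_from_adls(
#     "<container>", "data.parquet",
#     account_name="<storage-account>", tenant_id="<tenant-id>",
#     client_id="<client-id>", client_secret="<client-secret>",
# )
```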