I tried to pass class paramiko.sftp_file.SFTPFile
instead of file URL for pandas.read_parquet
and it worked fine. But when I tried the same with Dask, it threw an error. Below is the code I tried to run and the error I get. How can I make this work?
import dask.dataframe as dd
import parmiko
ssh=paramiko.SSHClient()
sftp_client = ssh.open_sftp()
ssh.set_missing_host_key_policy(paramiko.AutoAddPolicy())
source_file=sftp_client.open(str(parquet_file),'rb')
full_df = dd.read_parquet(source_file,engine='pyarrow')
print(len(full_df))
Traceback (most recent call last):
File "C:\Users\rrrrr\Documents\jackets_dask.py", line 22, in <module>
full_df = dd.read_parquet(source_file,engine='pyarrow')
File "C:\Users\rrrrr\AppData\Local\Programs\Python\Python37\lib\site-packages\dask\dataframe\io\parquet.py", line 1173, in read_parquet
storage_options=storage_options
File "C:\Users\rrrrr\AppData\Local\Programs\Python\Python37\lib\site-packages\dask\bytes\core.py", line 368, in get_fs_token_paths
raise TypeError('url type not understood: %s' % urlpath)
TypeError: url type not understood: <paramiko.sftp_file.SFTPFile object at 0x0000007712D9A208>