3

Trying to write a dataframe to an excel workbook using Openpyxl on an HDFS using a zeppelin notebook.

My code below -

with pd.ExcelWriter('hdfs:///ip/docs/excel_files/ds.xlsx', engine="openpyxl", mode="w") as writer:
    df.to_excel(writer, sheet_name='Sheet3') 

The Error -

Fail to execute line 2:      df.to_excel(writer, sheet_name='Sheet3')
Traceback (most recent call last):
  File "/tmp/zeppelin_pyspark-8199079202247699001.py", line 380, in <module>
    exec(code, _zcUserQueryNameSpace)
  File "<stdin>", line 2, in <module>
  File "/opt/anaconda3/lib/python3.7/site-packages/pandas/io/excel.py", line 1191, in __exit__
    self.close()
  File "/opt/anaconda3/lib/python3.7/site-packages/pandas/io/excel.py", line 1195, in close
    return self.save()
  File "/opt/anaconda3/lib/python3.7/site-packages/pandas/io/excel.py", line 1228, in save
    return self.book.save(self.path)
  File "/opt/anaconda3/lib/python3.7/site-packages/openpyxl/workbook/workbook.py", line 396, in save
    save_workbook(self, filename)
  File "/opt/anaconda3/lib/python3.7/site-packages/openpyxl/writer/excel.py", line 292, in save_workbook
    archive = ZipFile(filename, 'w', ZIP_DEFLATED, allowZip64=True)
  File "/opt/anaconda3/lib/python3.7/zipfile.py", line 1204, in __init__
    self.fp = io.open(file, filemode)

FileNotFoundError: [Errno 2] No such file or directory: 'hdfs:///ip/docs/excel_files/ds.xlsx'
Jay chuks
  • 389
  • 1
  • 5
  • 18
  • Try if these help.https://stackoverflow.com/questions/37261624/read-write-files-on-hdfs-using-python https://stackoverflow.com/questions/12485718/python-read-file-as-stream-from-hdfs – yammanuruarun Feb 04 '20 at 18:38
  • You can try `Snakebite`, a HDFS client & wrapper by Spotify: https://github.com/spotify/snakebite – APhillips Feb 04 '20 at 19:16

1 Answers1

1

If you are not on Windows, try using pydoop like this:

import pydoop.hdfs as pydoop_hdfs

with pydoop_hdfs.open('hdfs://path_to_file/file.xlsx', mode='wb') as f:
    with pd.ExcelWriter(f, engine='xlsxwriter') as writer:
        df.to_excel(writer, sheet_name='Sheet')
        writer.save()
Juan Kania-Morales
  • 558
  • 1
  • 7
  • 13
  • For anyone wondering in the future, you can read an xlsx file with Pydoop like this: ```with hdfs.open(file_path, "r") as file: df = pd.read_excel(file, engine="openpyxl")``` (line break and indentation after 'as file:') – mh0w Mar 02 '23 at 15:15