
I am trying to write a DataFrame to a .csv file:

import datetime

now = datetime.datetime.now()
date = now.strftime("%Y-%m-%d")

enrichedDataDir = "/export/market_data/temp"
enrichedDataFile = enrichedDataDir + "/marketData_optam_" + date + ".csv"

dbutils.fs.ls(enrichedDataDir)
df.to_csv(enrichedDataFile, sep='; ')

This throws the following error:

IOError: [Errno 2] No such file or directory: '/export/market_data/temp/marketData_optam_2018-10-12.csv'

But when I run

dbutils.fs.ls(enrichedDataDir)

Out[72]: []

there is no error! And when I go one directory level up:

enrichedDataDir = "/export/market_data"
dbutils.fs.ls(enrichedDataDir)

Out[74]:
[FileInfo(path=u'dbfs:/export/market_data/temp/', name=u'temp/', size=0L),
 FileInfo(path=u'dbfs:/export/market_data/update/', name=u'update/', size=0L)]

This works too, which tells me that all the folders I want to access really do exist. But I don't know why the .to_csv call throws the error. I have also checked the permissions, which are fine!

STORM

4 Answers

3

The main problem was that I am using Microsoft Azure Data Lake Store to hold those .csv files, and df.to_csv cannot write to Azure Data Lake Store: pandas writes through the driver's local filesystem API, so a path like /export/market_data/temp exists only in DBFS, not on the driver's local disk, which is why open fails with the IOError above.

Because I was calling df.to_csv, I was working with a pandas DataFrame instead of a Spark DataFrame.
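As an aside, pandas can still write if you hand to_csv a path the driver's local filesystem can actually see. A minimal sketch, assuming the DBFS FUSE mount at /dbfs is enabled on your cluster and supports your mount point (older runtimes may not):

import pandas as pd

# /dbfs/... is the local FUSE view of dbfs:/..., so plain file I/O works there
localPath = "/dbfs" + enrichedDataFile
df.to_csv(localPath, sep=';')  # note: to_csv needs a single-character separator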

I changed to

from pyspark.sql import *

df = spark.createDataFrame(result, ['CustomerId', 'SalesAmount'])

and then write to CSV via the following line

df.coalesce(2).write.format("csv").option("header", True).mode("overwrite").save(enrichedDataFile)

And it works.
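Note that Spark saves enrichedDataFile as a directory of part files rather than a single .csv. A quick sketch to verify, reusing dbutils from the question:

# save() produced a directory holding part-*.csv files plus a _SUCCESS marker
for f in dbutils.fs.ls(enrichedDataFile):
    print(f.path)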

STORM
2

Here is a more general answer.

If you want to load a file from DBFS into a pandas DataFrame, you can use this trick.

  1. Copy the file from dbfs:/ to the local file:/ filesystem

    %fs cp dbfs:/FileStore/tables/data.csv file:/FileStore/tables/data.csv

  2. Read the data from the local file path

    import pandas as pd
    data = pd.read_csv('file:/FileStore/tables/data.csv')
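Both steps can also be done in a single Python cell, since dbutils.fs.cp is the programmatic equivalent of %fs cp. A small sketch using the same paths:

    import pandas as pd

    # copy from DBFS to the driver's local filesystem, then read the local copy
    dbutils.fs.cp("dbfs:/FileStore/tables/data.csv", "file:/FileStore/tables/data.csv")
    data = pd.read_csv("/FileStore/tables/data.csv")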

Thanks

maoyang
0

Have you tried opening the file first? (Replace the last line of your first example with the code below.)

import os

# make sure the target directory exists before opening the file
if not os.path.isdir(enrichedDataDir):
    os.makedirs(enrichedDataDir)

# to_csv needs a single-character separator, so ';' rather than '; '
with open(enrichedDataFile, 'w') as output_file:
    df.to_csv(output_file, sep=';')
  • This time `with open(enrichedDataFile, 'w') as output_file:` throws the same error as above. – STORM Oct 12 '18 at 07:49
  • Does the output directory exist? Sorry, it's not clear, since you have not posted verbatim outputs for each of your examples. – Oct 12 '18 at 07:53
  • No problem. The output directory exists! But of course the file does not; it should be created by the df.to_csv line. The curious thing is that I can access the folders and list directories with dbutils.fs.ls without any problems. – STORM Oct 12 '18 at 07:58
  • Edited my answer to also add makedirs, as all the solutions for open and to_csv include it: see https://stackoverflow.com/a/12201952/10417531 and https://stackoverflow.com/a/18758737/10417531 – Oct 12 '18 at 08:15
0

Check the permissions on the SAS token you used for the container when you mounted this path. If it starts with "sp=racwdlmeopi", then you have a SAS token with immutable storage; your token should start with "sp=racwdlmeop".
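If you want to check programmatically, the sp= field of the token lists its permission flags. A minimal sketch (the token below is a hypothetical placeholder):

# hypothetical SAS token; only the sp= field matters here
sas_token = "sv=2018-03-28&sp=racwdlmeopi&sig=..."

# pull the permission flags out of the sp= field
perms = dict(kv.split("=", 1) for kv in sas_token.split("&"))["sp"]
print("i" in perms)  # True means the token carries the immutability flag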