
I am trying to write a DataFrame to a .csv file:

import datetime

now = datetime.datetime.now()
date = now.strftime("%Y-%m-%d")

enrichedDataDir = "/export/market_data/temp"
enrichedDataFile = enrichedDataDir + "/marketData_optam_" + date + ".csv"

dbutils.fs.ls(enrichedDataDir)
df.to_csv(enrichedDataFile, sep='; ')

This throws the following error:

IOError: [Errno 2] No such file or directory: '/export/market_data/temp/marketData_optam_2018-10-12.csv'

But when I run

dbutils.fs.ls(enrichedDataDir)

Out[72]: []

there is no error! And when I go one directory level up:

enrichedDataDir = "/export/market_data"
dbutils.fs.ls(enrichedDataDir)

Out[74]:
[FileInfo(path=u'dbfs:/export/market_data/temp/', name=u'temp/', size=0L),
 FileInfo(path=u'dbfs:/export/market_data/update/', name=u'update/', size=0L)]

This works too, which tells me that all the folders I want to access really do exist. But I don't know why the .to_csv call throws the error. I have also checked the permissions, which are fine!

STORM

4 Answers

3

The main problem was that I am using Microsoft Azure Data Lake Store to hold those .csv files, and df.to_csv cannot write to Azure Data Lake Store: pandas writes through the driver's local filesystem API, so a path like /export/market_data/temp exists only in DBFS, not on the driver's local disk, which is why open fails with the IOError above.

Because I was calling df.to_csv, I was working with a pandas DataFrame instead of a Spark DataFrame.
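As an aside, pandas can still write if you hand to_csv a path the driver's local filesystem can actually see. A minimal sketch, assuming the DBFS FUSE mount at /dbfs is enabled on your cluster and supports your mount point (older runtimes may not):

import pandas as pd

# /dbfs/... is the local FUSE view of dbfs:/..., so plain file I/O works there
localPath = "/dbfs" + enrichedDataFile
df.to_csv(localPath, sep=';')  # note: to_csv needs a single-character separator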

I changed to

from pyspark.sql import *

df = spark.createDataFrame(result, ['CustomerId', 'SalesAmount'])

and then write to CSV via the following line

df.coalesce(2).write.format("csv").option("header", True).mode("overwrite").save(enrichedDataFile)

And it works.
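Note that Spark saves enrichedDataFile as a directory of part files rather than a single .csv. A quick sketch to verify, reusing dbutils from the question:

# save() produced a directory holding part-*.csv files plus a _SUCCESS marker
for f in dbutils.fs.ls(enrichedDataFile):
    print(f.path)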

STORM
2

Here is a more general answer.

If you want to load a file from DBFS into a pandas DataFrame, you can use this trick.

  1. Copy the file from dbfs:/ to the local file:/ filesystem

    %fs cp dbfs:/FileStore/tables/data.csv file:/FileStore/tables/data.csv

  2. Read the data from the local file path

    import pandas as pd
    data = pd.read_csv('file:/FileStore/tables/data.csv')
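Both steps can also be done in a single Python cell, since dbutils.fs.cp is the programmatic equivalent of %fs cp. A small sketch using the same paths:

    import pandas as pd

    # copy from DBFS to the driver's local filesystem, then read the local copy
    dbutils.fs.cp("dbfs:/FileStore/tables/data.csv", "file:/FileStore/tables/data.csv")
    data = pd.read_csv("/FileStore/tables/data.csv")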

Thanks

maoyang
0

Have you tried opening the file first? (Replace the last line of your first example with the code below.)

import os

# make sure the target directory exists before opening the file
if not os.path.isdir(enrichedDataDir):
    os.makedirs(enrichedDataDir)

# to_csv needs a single-character separator, so ';' rather than '; '
with open(enrichedDataFile, 'w') as output_file:
    df.to_csv(output_file, sep=';')
  • This time `with open(enrichedDataFile, 'w') as output_file:` throws the same error as above. – STORM Oct 12 '18 at 07:49
  • Does the output directory exist? Sorry, it's not clear, since you have not posted verbatim outputs for each of your examples. – Oct 12 '18 at 07:53
  • No problem. The output directory exists! But of course the file does not; it should be created by the df.to_csv line. The curious thing is that I can access the folders and list directories with dbutils.fs.ls without any problems. – STORM Oct 12 '18 at 07:58
  • Edited my answer to also add makedirs, as all the solutions for open and to_csv include it: see https://stackoverflow.com/a/12201952/10417531 and https://stackoverflow.com/a/18758737/10417531 – Oct 12 '18 at 08:15
0

Check the permissions on the SAS token you used for the container when you mounted this path. If it starts with "sp=racwdlmeopi", then you have a SAS token with immutable storage; your token should start with "sp=racwdlmeop".
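If you want to check programmatically, the sp= field of the token lists its permission flags. A minimal sketch (the token below is a hypothetical placeholder):

# hypothetical SAS token; only the sp= field matters here
sas_token = "sv=2018-03-28&sp=racwdlmeopi&sig=..."

# pull the permission flags out of the sp= field
perms = dict(kv.split("=", 1) for kv in sas_token.split("&"))["sp"]
print("i" in perms)  # True means the token carries the immutability flag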