Azure output to csv using python

Question

I'm pretty new to Azure and have been having a problem whilst trying to export to a csv. I want to rename the output file from the default part-0000-tid-12345 naming to something more recognisable. My problem is that when I export the file, it creates a subdirectory with the filename, and the file itself ends up inside that directory. Is there a way of getting rid of the directory that's created? That is, the path looks like the write path below, but with an extra directory added: ...outbound/cs_notes_.csv/filename.csv

%python
import datetime

readPath = "/mnt/publisheddatasmets1mig/metering/smets1mig/cs/system_data_build/notes/rg"
writePath = "/mnt/publisheddatasmets1mig/metering/smets1mig/cs/system_data_build/notes/outbound"

# Build a timestamped target file name
Current_Date = datetime.datetime.today().strftime('%Y-%m-%d-%H-%M-%S')
fname = "CS_Notes_" + Current_Date + ".csv"

# Copy the Spark part file to the new name, then remove the original.
# Use the entry being iterated (i.path / i.name) rather than a variable
# left over from an earlier loop, which would point at the last file listed.
for i in dbutils.fs.ls(readPath):
  if i.name.startswith("part-00000"):
    dbutils.fs.cp(i.path, writePath + "/" + fname)
    dbutils.fs.rm(i.path)

Any help would be appreciated

score 0 · Answer 1 · answered Nov 12 '19 at 08:46


It is not possible to change the output file name directly in Apache Spark.

Spark writes through the Hadoop file output format, which produces one file per partition inside the target directory; that's why you get part- files. You can rename the output file after processing, as described in a similar SO thread that addressed the same issue.
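As a rough sketch of the "rename after processing" step, assuming the DataFrame was written as a single part file to a temporary directory: the helper below moves that part file to the final path and then deletes the temporary directory, so no subdirectory is left behind. The function name promote_part_file and its parameters are hypothetical; fs is expected to behave like dbutils.fs (ls, mv, rm), and the DataFrame df in the usage comment is an assumption, not something shown in the question.

```python
def promote_part_file(fs, tmp_dir, final_path, prefix="part-"):
    """Move the single part file in tmp_dir to final_path, then delete tmp_dir.

    fs is any object exposing ls/mv/rm like dbutils.fs (assumption).
    """
    # Spark also writes marker files such as _SUCCESS; keep only part files.
    parts = [f for f in fs.ls(tmp_dir) if f.name.startswith(prefix)]
    if len(parts) != 1:
        raise ValueError(
            "expected exactly one part file in %s, found %d" % (tmp_dir, len(parts))
        )
    # Move the data file to its final name, one level above the temp directory.
    fs.mv(parts[0].path, final_path)
    # Remove the now unneeded temp directory and its marker files (recursive).
    fs.rm(tmp_dir, True)
    return final_path


# Possible use in a Databricks notebook (df, tmpPath are hypothetical names):
#   tmpPath = writePath + "/_tmp_cs_notes"
#   df.coalesce(1).write.mode("overwrite").option("header", "true").csv(tmpPath)
#   promote_part_file(dbutils.fs, tmpPath, writePath + "/" + fname)
```

Writing to a separate temporary directory, rather than to a path ending in the desired file name, is what avoids the extra directory named after the csv.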

Hope this helps.

CHEEKATLAPRADEEP

I've renamed the file, but it creates a subdirectory in the outbound folder containing my renamed file. I need to move the renamed file up one level so that the csv sits in the outbound folder, not within the newly created subdirectory. – RG0107 Nov 12 '19 at 09:15
