I'm pretty new to Azure and have been having a problem whilst trying to export to a CSV. I want to rename the output file from the default part-0000-tid-12345 naming to something more recognisable. My problem is that when I export the file, it creates a subdirectory with the filename, and the file ends up inside that directory. Is there a way of getting rid of the directory that's created? That is, the path looks like the write path below, but with an extra directory added: ...outbound/cs_notes_.csv/filename.csv

%python
import datetime

readPath = "/mnt/publisheddatasmets1mig/metering/smets1mig/cs/system_data_build/notes/rg"
writePath = "/mnt/publisheddatasmets1mig/metering/smets1mig/cs/system_data_build/notes/outbound"

# Build a timestamped target file name: CS_Notes_<YYYY-MM-DD-HH-MM-SS>.csv
Current_Date = datetime.datetime.today().strftime('%Y-%m-%d-%H-%M-%S')
fname = "CS_Notes_" + Current_Date + ".csv"

# Copy the Spark part file to the new name, then remove the original.
# Use the entry being iterated (i), not a name left over from an earlier loop.
for i in dbutils.fs.ls(readPath):
  if i.name.startswith("part-00000"):
    dbutils.fs.cp(i.path, writePath + "/" + fname)
    dbutils.fs.rm(i.path)

Any help would be appreciated

RG0107

1 Answer

It's not possible to change the output file name directly in Apache Spark.

Spark uses the Hadoop file format, which requires data to be partitioned. That's why you get part- files. You can easily rename the output file after processing, as described in a similar SO thread that addressed this issue.
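A minimal sketch of that rename-after-processing step, assuming a Databricks notebook. The helper name `export_single_csv` is hypothetical, and its `fs` argument is assumed to behave like `dbutils.fs` (`ls`, `cp`, `rm`). The key point is that the target is a full file path outside the temporary directory Spark wrote to, so no extra subdirectory is left behind.

```python
def export_single_csv(fs, tmp_dir, target_file):
    """Copy the single part- file in tmp_dir to target_file, then drop tmp_dir.

    fs is assumed to behave like dbutils.fs. target_file must be a full
    file path that does not live inside tmp_dir.
    """
    # Ignore marker files such as _SUCCESS; keep only the data file.
    parts = [f for f in fs.ls(tmp_dir) if f.name.startswith("part-")]
    if len(parts) != 1:
        raise ValueError("expected exactly one part- file, found %d" % len(parts))
    fs.cp(parts[0].path, target_file)  # copy under the final name
    fs.rm(tmp_dir, True)               # True = recurse, removes the temp directory
    return target_file
```

In a notebook this would typically follow a single-partition write such as `df.coalesce(1).write.csv(tmp_dir, header=True)`, and be called as `export_single_csv(dbutils.fs, tmp_dir, writePath + "/" + fname)`, where `tmp_dir` is a scratch location of your choosing.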

Hope this helps.

CHEEKATLAPRADEEP

I've renamed the file, but it creates a subdirectory in the outbound folder containing my renamed file. I need to move the renamed file up one level so the CSV sits in the outbound folder, not within the newly created subdirectory. – RG0107 Nov 12 '19 at 09:15
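
For the situation described in this comment, where a directory named like the CSV already exists with the file inside it, one possible cleanup is sketched below. This is an assumption-laden illustration, not a confirmed fix: `flatten_csv_dir` and the `.tmp` staging suffix are hypothetical, and `fs` is assumed to behave like `dbutils.fs` (`ls`, `mv`, `rm`).

```python
def flatten_csv_dir(fs, dir_path):
    """Replace a directory such as .../name.csv/ with the single CSV inside it.

    fs is assumed to behave like dbutils.fs (ls, mv, rm).
    """
    dir_path = dir_path.rstrip("/")
    files = [f for f in fs.ls(dir_path) if f.name.endswith(".csv")]
    if len(files) != 1:
        raise ValueError("expected exactly one .csv file in " + dir_path)
    staging = dir_path + ".tmp"    # temporary sibling path (hypothetical naming)
    fs.mv(files[0].path, staging)  # move the file out of the subdirectory
    fs.rm(dir_path, True)          # remove the leftover directory (True = recurse)
    fs.mv(staging, dir_path)       # the file now takes the directory's old path
    return dir_path
```

Called as `flatten_csv_dir(dbutils.fs, writePath + "/" + fname)`, this would leave the CSV sitting directly in the outbound folder, assuming the subdirectory holds exactly one `.csv` file.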