
I have a question: how should I download a .csv file from Azure Data Lake, make some calculations on it, and save the result as a .csv again? I know that for reading the .csv I can use:

data = pd.read_csv('example.csv')  # example

new_data = data // 2 + data  # calculation in a Databricks notebook

Now the question is how to save new_data in .csv format in Azure Data Lake under the name example_calculated.csv.

  • Question language is unclear. Please modify the language and code format of your question so others can better understand what problem you are facing. – Yaakov Bressler Aug 16 '22 at 16:32

1 Answer


To access files from ADLS in a Databricks notebook, you first need to mount an Azure Data Lake Storage Gen2 filesystem to DBFS.
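
A minimal sketch of such a mount, assuming authentication with a service principal (the application ID, secret scope, tenant ID, container, storage account, and mount name below are placeholders to replace with your own values):

configs = {
    "fs.azure.account.auth.type": "OAuth",
    "fs.azure.account.oauth.provider.type": "org.apache.hadoop.fs.azurebfs.oauth2.ClientCredsTokenProvider",
    "fs.azure.account.oauth2.client.id": "<application-id>",
    "fs.azure.account.oauth2.client.secret": dbutils.secrets.get(scope="<scope-name>", key="<secret-key>"),
    "fs.azure.account.oauth2.client.endpoint": "https://login.microsoftonline.com/<tenant-id>/oauth2/token",
}

# Mount the ADLS Gen2 container under /mnt/<mount-name> so it is reachable through DBFS paths
dbutils.fs.mount(
    source="abfss://<container-name>@<storage-account>.dfs.core.windows.net/",
    mount_point="/mnt/<mount-name>",
    extra_configs=configs,
)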

To read files from ADLS, use the code below.

file_location = "/mnt/<mount-name>/example.csv"  # placeholder path under the mount created above

df = (
    spark.read.format("csv")
    .option("inferSchema", "true")
    .option("header", "true")
    .option("delimiter", ",")
    .load(file_location)
)

After applying your transformations to the data, you can write it back out as a CSV file using the code below.
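
For example, the calculation from the question (new_data = data // 2 + data) could be expressed on the Spark DataFrame roughly as below, assuming all columns are numeric (df_calculated is just an illustrative name):

from pyspark.sql import functions as F

# Apply "value // 2 + value" to every column; floor(col / 2) mirrors pandas integer division
df_calculated = df.select(
    *[(F.floor(F.col(c) / 2) + F.col(c)).alias(c) for c in df.columns]
)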

target_folder_path = 'path_to_adls_folder'  # e.g. a folder under the /mnt/<mount-name> mount

# Write the result as CSV data
df.write.format("csv").option("header", "true").save(target_folder_path + "/example_calculated.csv")

Then you will have to rename the saved CSV file using dbutils.fs.mv, because Spark writes the output as a folder of part files rather than a single file with the name you chose.

Note that this actually copies the file and then deletes the original; there is no real rename function in Databricks.

dbutils.fs.mv(old_name, new_name)
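
Putting the write and rename together, a minimal sketch, assuming you want a single output file (the temporary folder name and the use of coalesce(1) are choices made for this example, not part of the original answer):

temp_folder = target_folder_path + "/_tmp_example_calculated"   # hypothetical temporary folder
final_path = target_folder_path + "/example_calculated.csv"     # desired final file name

# Write a single part file into the temporary folder
df.coalesce(1).write.format("csv").option("header", "true").mode("overwrite").save(temp_folder)

# Find the part file Spark produced and move (copy + delete) it to the final name
part_file = [f.path for f in dbutils.fs.ls(temp_folder) if f.name.startswith("part-")][0]
dbutils.fs.mv(part_file, final_path)

# Clean up the temporary folder
dbutils.fs.rm(temp_folder, recurse=True)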

To rename 15K files, you can refer to this similar issue answered by sri sivani charan.

Abhishek K