
I have a Databricks workspace with JSON files mounted from my ADLS Gen2 storage account. I am trying to convert a JSON file to Parquet and save it into my storage account, and I created a new directory in my Gen2 container where I want the Parquet file to land. When I run df.write.parquet("/mnt/youtubedataset/youtube/cleansed"), nothing seems to be saved into my Gen2 container.

what am I missing?

  • You haven't shown any troubleshooting steps, nor indicated whether there is any error. I would suggest at the very least running `dbutils.fs.ls('/mnt/youtubedataset/youtube/cleansed')` to see if it can list the directory and whether the file shows up there – Chris Aug 25 '23 at 13:44
  • It does show the file in dbutils.fs.ls. How would I save it into my Gen2 account? – user22428402 Aug 25 '23 at 14:00
  • Could you please share the error you got? – Bhavani Aug 25 '23 at 14:30
  • I don't have an error as such. I'm trying to save a Parquet file to my Azure storage account; it gets saved onto the Databricks file system but not into my storage account. I've run df.write.parquet("/mnt/youtubedataset/youtube/cleansed"), but when I go into my storage account it's empty, even when I refresh the page. – user22428402 Aug 25 '23 at 16:32
  • I tried adding this code: newdf.write.mode("overwrite").parquet("/mnt/container-name/directory-name/") – user22428402 Aug 25 '23 at 17:46
  • When I go and check the data lake, the files are empty. There is no error. – user22428402 Aug 25 '23 at 17:47
  • Have you checked whether /mnt/youtubedataset/youtube/cleansed is linked to the same ADLS Gen2 account you are referring to? You can do this using the display(dbutils.fs.mounts()) command. If you see the file with dbutils.fs.ls('/mnt/youtubedataset/youtube/cleansed'), the file has been written, but you need to validate which physical location that points to. – Anupam Chand Aug 26 '23 at 03:33
  • @AnupamChand The file has been written, so how would I validate the physical location it points to? This is my first time using Databricks. – user22428402 Aug 27 '23 at 09:30
  • Use the display(dbutils.fs.mounts()) command. See https://stackoverflow.com/questions/62215897/how-to-list-all-the-mount-points-in-azure-databricks – Anupam Chand Aug 27 '23 at 09:49
  • Was it being written into a different physical location? – Anupam Chand Aug 28 '23 at 13:44
  • @AnupamChand, it was. I found out how to resolve it. Thank you. – user22428402 Aug 29 '23 at 14:50

1 Answer


I tried to replicate the issue as follows:

I mounted my ADLS container using the below code:

# Mount the container via the Blob (wasbs) endpoint using the storage account access key
dbutils.fs.mount(
    source="wasbs://<containerName>@<storageaccountName>.blob.core.windows.net/",
    mount_point="/mnt/<mountName>",
    extra_configs={
        "fs.azure.account.key.<storageaccountName>.blob.core.windows.net": "<Access-Key>"
    }
)
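
Since the question is about ADLS Gen2, mounting via the abfss/dfs endpoint is an alternative to the wasbs Blob endpoint. A minimal sketch, assuming account-key auth is acceptable (an OAuth service principal is the more commonly documented approach) and using the same placeholder names as above:

# Sketch: mount an ADLS Gen2 container through the dfs endpoint (placeholders, not from the original post)
dbutils.fs.mount(
    source="abfss://<containerName>@<storageaccountName>.dfs.core.windows.net/",
    mount_point="/mnt/<mountName>",
    extra_configs={
        "fs.azure.account.key.<storageaccountName>.dfs.core.windows.net": "<Access-Key>"
    }
)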


Then I tried to write a dataframe in Parquet format to the ADLS account using the below code:

# Build a small sample dataframe
data = [("Alice", 28), ("Bob", 22), ("Charlie", 35)]
columns = ["Name", "Age"]
df = spark.createDataFrame(data, columns)

# Note the backslashes in the path -- this reproduces the problem
df.write.mode("overwrite").parquet("/mnt/youtubedataset\youtube\cleansed")


The write appeared to succeed, but when I checked the storage account, there were no files in it.


The path with backslashes created plain folders under dbfs:/mnt/ that are not part of the mounted volume, so the data went to DBFS rather than to the storage account. I then checked the location of my mount point using display(dbutils.fs.mounts()), which showed the mount pointing to my storage account.
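
A small sketch to print where a specific mount point resolves, assuming the mount name /mnt/youtubedataset from the question:

# Find the physical source backing the mount point
for m in dbutils.fs.mounts():
    if m.mountPoint == "/mnt/youtubedataset":
        print(m.mountPoint, "->", m.source)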


So I used forward slashes in the storage path instead of backslashes:

df.write.mode("overwrite").parquet("/mnt/youtubedataset/youtube/cleansed")


This time the file was written successfully into the ADLS storage account.
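
As an optional check, you can list the target directory through the mount to confirm the Parquet part files are there:

# List the written files through the mount point (path from the question)
display(dbutils.fs.ls("/mnt/youtubedataset/youtube/cleansed"))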


So, check that your mount point actually points to your storage account, and make sure the path uses forward slashes.
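
If the mount turns out to point at the wrong account or container, a hypothetical fix is to unmount and remount it with the correct source (the names below are placeholders, not from the original post):

# Remount /mnt/youtubedataset against the intended container
dbutils.fs.unmount("/mnt/youtubedataset")
dbutils.fs.mount(
    source="wasbs://<containerName>@<storageaccountName>.blob.core.windows.net/",
    mount_point="/mnt/youtubedataset",
    extra_configs={
        "fs.azure.account.key.<storageaccountName>.blob.core.windows.net": "<Access-Key>"
    }
)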

Bhavani