
I am reading a folder in ADLS from Azure Databricks; it has subfolders containing Parquet files.

path - base_folder/filename/

The filename folder has subfolders such as 2020 and 2021, and those folders in turn have subfolders for month and day.

So the path to an actual Parquet file looks like base_folder/filename/2020/12/01/part11111.parquet.

I am getting the below error if I give the base folder path.

I have tried the commands in the thread below as well, but it shows the same error: "Unable to infer schema for Parquet. It must be specified manually".


Please help me read all Parquet files in all subfolders into one DataFrame.

Sharyu Aadhatrao

1 Answer


Try with:

spark.read.format("parquet").load(landingFolder)

as specified here: Generic Load/Save Functions

vladsiv
  • Thanks Vlad. It worked in Python using *. Is there a way I can achieve this using Scala? – Sharyu Aadhatrao Nov 08 '21 at 09:58
  • @SharyuAadhatrao You're welcome. It should also work in Scala; have you tried it? Please find examples here: [Read all files in a nested folder in Spark](https://stackoverflow.com/questions/32233575/read-all-files-in-a-nested-folder-in-spark), [Select files using a pattern match](https://kb.databricks.com/scala/pattern-match-files-in-path.html), and also take a look at [Recursive File Lookup](https://spark.apache.org/docs/latest/sql-data-sources-generic-options.html#recursive-file-lookup) (a short Scala sketch based on these links follows after the comments). – vladsiv Nov 08 '21 at 13:02
  • @SharyuAadhatrao If this answers your question, mark it as answered please. – vladsiv Nov 08 '21 at 13:56
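Building on the links in the comments, here is a minimal Scala sketch of the two approaches: a glob over the year/month/day folders and Spark's recursiveFileLookup option (Spark 3.0+). The ADLS path and variable names are placeholders, not the asker's actual values:

// Placeholder ADLS path; replace with the real container and storage account.
val landingFolder = "abfss://container@storageaccount.dfs.core.windows.net/base_folder/filename"

// Option 1: glob the nested year/month/day folders explicitly.
val dfGlob = spark.read.format("parquet").load(s"$landingFolder/*/*/*/")

// Option 2: let Spark walk the whole directory tree (Spark 3.0+).
val dfRecursive = spark.read
  .format("parquet")
  .option("recursiveFileLookup", "true")
  .load(landingFolder)

dfGlob.printSchema()

Either DataFrame should pick up all Parquet files under the nested folders; the glob form also lets you restrict the read to a single year or month by fixing part of the pattern.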