
Spark newbie here. I've got a large set of data on ADLS that is collected and stored in folders named after the date the data occurred on (example: <2020-12-04>). I am trying to query the most recent data from the last 30 days. Currently I read from ADLS and switch out the date until I get a hit, but I'm unable to find a way to check whether the path provided is valid before reading, so a missing folder results in an error. Any pointers would be helpful.

while (!folderFound)
{
  string path = $"adls://<adlsaccount>/{listofdates[i]}/<file>";
  DataFrame df = spark.Read().Orc(path); // need to know if the path is valid so it doesn't error
  // ...
}
// do some work once we get a successful read
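With no path-probing API shown in the question, one workaround is to attempt the read and treat a failure as "folder not found." Below is a minimal sketch assuming .NET for Apache Spark (Microsoft.Spark); the `<adlsaccount>`/`<file>` placeholders are from the question, and `listOfDates` is a hypothetical stand-in for the question's list of candidate dates:

```csharp
using System;
using System.Collections.Generic;
using Microsoft.Spark.Sql;

class Program
{
    static void Main()
    {
        SparkSession spark = SparkSession.Builder().GetOrCreate();

        // Hypothetical candidate folder names, newest first.
        List<string> listOfDates = new List<string> { "2020-12-04", "2020-12-03" };

        DataFrame df = null;
        foreach (string date in listOfDates)
        {
            string path = $"adls://<adlsaccount>/{date}/<file>";
            try
            {
                // A missing folder surfaces here as an exception propagated
                // from the JVM (Spark's AnalysisException: "Path does not exist").
                df = spark.Read().Orc(path);
                break; // first readable path wins
            }
            catch (Exception)
            {
                continue; // try the next (older) date
            }
        }

        if (df != null)
        {
            // do some work once we get a successful read
        }
    }
}
```

This is a sketch, not a definitive answer: catching a broad `Exception` around the read is blunt, and if the storage account's file-system API is available to you (e.g. the Azure Data Lake SDK), checking for the folder's existence there before calling Spark would be cleaner.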
Bonaii
  • are you able to change the names of the folders? https://stackoverflow.com/questions/54930388/efficient-way-of-reading-parquet-files-between-a-date-range-in-azure-databricks – Ed Elliott Dec 11 '20 at 06:18
  • @EdElliott Unfortunately I can not. I can only read from them. – Bonaii Dec 11 '20 at 23:20

0 Answers