I'm a beginner learning Spark programming (PySpark) on Databricks.
What am I trying to do?
List all the files in a directory and save that list into a DataFrame so that I can filter, sort, etc. on it. Why? Because I am trying to find the biggest file in my directory.
Why doesn't the code below work? What am I missing?
from pyspark.sql.types import StringType
sklist = dbutils.fs.ls(sourceFile)
df = spark.createDataFrame(sklist, StringType())
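
A likely cause, as far as I can tell: dbutils.fs.ls returns a list of FileInfo entries rather than plain strings, so a StringType() schema does not match the data. Below is a minimal sketch of one possible approach, not a definitive fix. It assumes each entry exposes path, name and size attributes. The helper names file_rows, files_to_dataframe and biggest_file are hypothetical, and the FileInfo namedtuple is only a stand-in so the sketch can run outside Databricks.

```python
from collections import namedtuple

# Stand-in for the entries returned by dbutils.fs.ls (assumed to expose
# path, name and size); used only so this sketch runs outside Databricks.
FileInfo = namedtuple("FileInfo", ["path", "name", "size"])


def file_rows(entries):
    """Turn FileInfo-like entries into plain (path, name, size) tuples."""
    return [(e.path, e.name, int(e.size)) for e in entries]


def files_to_dataframe(spark, entries):
    """Build a DataFrame with one row per entry, using an explicit schema string."""
    return spark.createDataFrame(
        file_rows(entries), "path string, name string, size long"
    )


def biggest_file(entries):
    """Pure-Python fallback: return the (path, name, size) tuple with the largest size."""
    return max(file_rows(entries), key=lambda row: row[2])


# Made-up sample data for illustration only.
sample = [
    FileInfo("dbfs:/tmp/a.csv", "a.csv", 120),
    FileInfo("dbfs:/tmp/b.csv", "b.csv", 4500),
]

# On Databricks, where `spark` and `dbutils` are predefined, usage might be:
# df = files_to_dataframe(spark, dbutils.fs.ls(sourceFile))
# df.orderBy(df["size"].desc()).show(1, truncate=False)
```

Passing an explicit schema keeps the size column numeric, so sorting and filtering on it compare numbers rather than strings. Directories typically show up in the listing with a size of 0, so they sort to the bottom.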