I have two folders, A and B. Folder A contains file1.csv.gz and file2.csv.gz, and folder B contains file2.csv.gz and file3.csv.gz. I would like to read all of these files into a single DataFrame. This is what I am doing:
folders_to_read = ["A/*.csv.gz", "B/*.csv.gz"]

df = spark.read.format("csv") \
    .option("header", "true") \
    .option("inferSchema", "true") \
    .option("mode", "DROPMALFORMED") \
    .load(i for i in folders_to_read)
But I get the following error:
Py4JJavaError: An error occurred while calling o200.load.
: java.lang.ClassCastException: java.util.ArrayList cannot be cast to java.lang.String
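I suspect the problem is how I am passing the paths to load(). Would passing the list itself, rather than a generator expression, be the correct way to read both folders at once? Something like the sketch below is what I had in mind, but I am not sure whether load() actually accepts a list of paths, so this is just my assumption:

folders_to_read = ["A/*.csv.gz", "B/*.csv.gz"]

# Pass the list of glob patterns directly instead of a generator expression
df = spark.read.format("csv") \
    .option("header", "true") \
    .option("inferSchema", "true") \
    .option("mode", "DROPMALFORMED") \
    .load(folders_to_read)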