I have a folder labeled 'input' containing multiple CSV files. They all have the same column names, but the data in each file is different.
How can I use Spark and Java to read all the CSV files in the 'input' folder and merge them into one file?
The files in the folder may change: one day there might be 4 CSV files, another day 6, and so on. Here is what I have tried:
Dataset<Row> df = spark.read()
        .format("com.databricks.spark.csv")  // legacy alias for the built-in "csv" source
        .option("header", "true")            // treat the first row of each file as the header
        .load("/Users/input/*.csv");         // the glob should match every CSV file in the folder
However, I don't get any output; Spark just shuts down.
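For reference, here is the complete minimal program I'm running, as a sketch: the SparkSession setup, the MergeCsv class name, and master("local[*]") are just my local test environment, not anything required by the approach.

import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;

public class MergeCsv {
    public static void main(String[] args) {
        // Local session for testing; appName and master are placeholders
        SparkSession spark = SparkSession.builder()
                .appName("MergeCsv")
                .master("local[*]")
                .getOrCreate();

        // The glob picks up however many CSV files the folder holds that day
        Dataset<Row> df = spark.read()
                .format("csv")
                .option("header", "true")
                .load("/Users/input/*.csv");

        df.show();  // quick sanity check that the files were actually read

        spark.stop();
    }
}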
I don't want to list all the CSV files in the folder individually; I want the code to pick up whatever CSV files are present in that folder and merge them. Is this possible?
From there I can load that one merged CSV file back into a DataFrame.
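To make the goal concrete, this is the kind of output step I have in mind; using coalesce(1) to force a single output file and the /Users/output path are my assumptions, not something from a working setup.

// Collapse to one partition so Spark writes a single part-*.csv file
df.coalesce(1)
        .write()
        .option("header", "true")
        .mode("overwrite")       // replace any previous run's output
        .csv("/Users/output");   // Spark writes a directory containing the part file

// Reading the merged result back into a DataFrame
Dataset<Row> merged = spark.read()
        .format("csv")
        .option("header", "true")
        .load("/Users/output");

If a single file with a specific name (rather than a directory holding one part file) is required, I assume the part file would have to be renamed after the write, since Spark writes output as a directory.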