I have multiple CSV files in an HDFS directory:
/project/project_csv/file1.csv
/project/project_csv/file2.csv
/project/project_csv/file3.csv
Now, in my PySpark program I want to iterate over these paths, read each file into a DataFrame, and load that data into a specific Hive table.
Like:
With the first file1.csv, read into df and save to table1:
df = spark.read.csv('/project/project_csv/file1.csv')
df.write.mode('overwrite').format('hive').saveAsTable('data_base.table_name1')
With the second file2.csv, read into df and save to table2:
df = spark.read.csv('/project/project_csv/file2.csv')
df.write.mode('overwrite').format('hive').saveAsTable('data_base.table_name2')
In the same way, I want to iterate over multiple files and save the data into different tables.
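Something like the sketch below is the pattern I have in mind. It lists the directory through the Hadoop FileSystem API exposed on Spark's JVM gateway and derives each table name from the file name; the data_base.file1-style naming and header=True are just assumptions for illustration, not my actual schema:

import os
from pyspark.sql import SparkSession

spark = SparkSession.builder.enableHiveSupport().getOrCreate()

# List the CSV files in the HDFS directory via the Hadoop FileSystem API.
hadoop_fs = spark._jvm.org.apache.hadoop.fs
fs = hadoop_fs.FileSystem.get(spark._jsc.hadoopConfiguration())
statuses = fs.listStatus(hadoop_fs.Path('/project/project_csv'))

for status in statuses:
    path = status.getPath().toString()   # e.g. hdfs://.../project/project_csv/file1.csv
    if not path.endswith('.csv'):
        continue
    # Derive a table name from the file name: file1.csv -> data_base.file1
    # (hypothetical naming; replace with your own file-to-table mapping).
    name = os.path.splitext(os.path.basename(path))[0]
    df = spark.read.csv(path, header=True)   # assuming the CSVs have a header row
    df.write.mode('overwrite').format('hive').saveAsTable('data_base.{}'.format(name))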