I want to create one DataFrame per file found in a directory.
The JSON in each file looks like:
[{
"a": "Need Help",
"b": 6377,
"c": "Member",
"d": 721,
"timestamp": 1590990807.475662
},
{
"a": "Need Help",
"b": 6377,
"c": "Member",
"d": 721,
"timestamp": 1590990807.475673
},
{
"a": "Need Help",
"b": 6377,
"c": "Member",
"d": 721,
"timestamp": 1590990807.475678
}]
I can do that with the code below:
# wholeTextFiles yields (path, contents) pairs for every file in the
# directory; collectAsMap() brings them to the driver keyed by path.
rdd = spark.sparkContext.wholeTextFiles("/content/sample_data/test_data")
files = rdd.collectAsMap()  # renamed so it doesn't shadow the built-in dict
for path in files:
    df = spark.read.json(path)
    df.show()
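For reference, one alternative I've considered is to enumerate the file paths directly rather than pulling every file's contents to the driver with wholeTextFiles. This is a minimal sketch, assuming the files sit on the local filesystem, that spark is an existing SparkSession, and that the glob pattern is illustrative:

import glob

# List the JSON files without reading their contents; wholeTextFiles
# loads every file into the driver even though only the paths are used.
paths = glob.glob("/content/sample_data/test_data/*.json")  # pattern is an assumption

# Build one DataFrame per file, keyed by its path.
dfs = {path: spark.read.json(path) for path in paths}

for path, df in dfs.items():
    df.show()

This also keeps each file's DataFrame addressable afterwards instead of overwriting df on every loop iteration.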
Is there a better way to achieve the same? Thanks in advance.