I wanted to save my PySpark output to a .txt file for future reference. I wrote the following code to save the file:
fileName=names1[i]+".txt" # Generating file name as fieldname.txt
#data1.groupby(names1[i]).agg(F.collect_set("Passenger_Id")).rdd.saveAsTextFile(names1[i]+'.txt')
data.groupby(names1[i]).agg(F.collect_set("Passenger_Id")).rdd.saveAsTextFile(fileName)
But after running the code I get a folder named after the file name instead of a single file. For example, if my file name is abc.txt, I see a folder named abc.txt that contains many part files without any extension. Here is a sample of the contents of one part file:
Row(Airpotr=u'ST', collect_set(Passenger_Id)=[u'30143072', u'36374515', u'45806865', u'37771107', u'18541154', u'91481534', u'30343069', u'41482082'])
How can I read these part files together and create a Spark DataFrame from them?
I also tried following the steps mentioned here:
import os
home=os.getcwd()
names1="Airpotr.txt"
dirPath = os.path.join(home, names1)
os.mkdir(dirPath)
textFiles = sc.wholeTextFiles(dirPath)
sorted(textFiles.collect())
but got the following error message:
OSError: [Errno 17] File exists: '/user-home/.../Airpotr.txt'
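For what it's worth, the Errno 17 seems to be reproducible without Spark at all. Below is a minimal sketch, assuming Python 3 and using a temporary directory as a hypothetical stand-in for the folder that saveAsTextFile already created. The helper try_mkdir is my own name for illustration, not part of any library. It suggests the error comes from calling os.mkdir on a folder that already exists, not from wholeTextFiles.

```python
import errno
import os
import tempfile


def try_mkdir(path):
    """Return None if the directory was created, else the errno of the failure."""
    try:
        os.mkdir(path)
    except OSError as exc:  # on Python 3, FileExistsError is a subclass of OSError
        return exc.errno
    return None


# Hypothetical stand-in for the output folder written by saveAsTextFile.
base = tempfile.mkdtemp()
dir_path = os.path.join(base, "Airpotr.txt")

first = try_mkdir(dir_path)   # folder does not exist yet, so it gets created
second = try_mkdir(dir_path)  # folder exists now, so os.mkdir fails

print(first, second == errno.EEXIST)
```

If that is right, the os.mkdir(dirPath) line in my attempt is unnecessary, because the folder is already there after the save.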