
How do I create and append to a CSV file from the RDD result using PySpark?

This is my code. For each iteration I need to append the result to the CSV:

import csv
import numpy as np
from pyspark.sql.functions import isnan

for line in tcp.collect():
    # print value in MyCol1 for each row
    print(line)
    # collect the column values and drop the NaNs
    v3 = np.array(data.select(line).collect())
    x = v3[np.logical_not(np.isnan(v3))]
    notnan_cnt = data.filter(data[line] != "").count()
    print(x)
    cnt_null = data.filter((data[line] == "") | data[line].isNull() | isnan(data[line])).count()
    print(cnt_null, notnan_cnt)
    # summary statistics for this column
    res_df = line, x.min(), np.percentile(x, 25), np.mean(x), np.std(x), np.percentile(x, 75), x.max(), cnt_null
    print(res_df)
    with open(data_output_file) as fp:
        wr = csv.writer(fp, dialect='excel')
        wr.writerow(res_df)

Sample result for res_df:

['var_id', 10000001, 14003088.0, 14228946.912793402, 1874168.857698741, 15017976.0, 18000192, 0]

This gives me the error "TypeError: coercing to Unicode: need string or buffer, RDD found". Could you please help?
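For clarity, this is the kind of per-iteration append I am aiming for, as a minimal sketch (it assumes data_output_file is an ordinary path string rather than an RDD, and that res_df is the tuple of statistics built in one loop iteration; the helper name append_row and the usage path are just placeholders):

import csv

# Sketch only: csv_path must be a plain path string, not an RDD,
# and row is the tuple of statistics from one iteration of the loop above.
def append_row(csv_path, row):
    # open in append mode so each iteration adds one line to the existing file
    with open(csv_path, 'a') as fp:
        wr = csv.writer(fp, dialect='excel')
        wr.writerow(row)

# hypothetical usage inside the loop:
# append_row('column_stats.csv', res_df)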

Shankar Panda
  • Possible duplicate of [how-to-write-the-resulting-rdd-to-a-csv-file-in-spark-python](https://stackoverflow.com/questions/31898964/how-to-write-the-resulting-rdd-to-a-csv-file-in-spark-python). – Mayank Porwal Nov 21 '18 at 06:35
  • Possible duplicate of [How to write the resulting RDD to a csv file in Spark python](https://stackoverflow.com/questions/31898964/how-to-write-the-resulting-rdd-to-a-csv-file-in-spark-python) – 10465355 Nov 21 '18 at 10:55

0 Answers