I have this data as output when I run timeStamp_df.head()
in PySpark:
Row(timeStamp='ISODate(2020-06-03T11:30:16.900+0000)', timeStamp='ISODate(2020-06-03T11:30:16.900+0000)', timeStamp='ISODate(2020-06-03T11:30:16.900+0000)', timeStamp='ISODate(2020-05-03T11:30:16.900+0000)', timeStamp='ISODate(2020-04-03T11:30:16.900+0000)')
My expected output is:
+----------------------------+
|timeStamp                   |
+----------------------------+
|2020-06-03T11:30:16.900+0000|
|2020-06-03T11:30:16.900+0000|
|2020-06-03T11:30:16.900+0000|
|2020-05-03T11:30:16.900+0000|
|2020-04-03T11:30:16.900+0000|
+----------------------------+
I first tried using the .collect() method so that I could iterate over the rows:
rows_list = timeStamp_df.collect()
print(rows_list)
Its output is:
[Row(timeStamp='ISODate(2020-06-03T11:30:16.900+0000)', timeStamp='ISODate(2020-06-03T11:30:16.900+0000)', timeStamp='ISODate(2020-06-03T11:30:16.900+0000)', timeStamp='ISODate(2020-05-03T11:30:16.900+0000)', timeStamp='ISODate(2020-04-03T11:30:16.900+0000)')]
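I suspect collect() has returned a list containing a single Row with five fields (all named timeStamp), rather than five separate Rows. Here is a quick sketch of how I think that could be checked, assuming a Row can be treated like a tuple (as far as I know, pyspark.sql.Row is a tuple subclass):

print(len(rows_list))       # how many Row objects the list holds
print(len(rows_list[0]))    # how many fields the first Row holds (Row is a tuple subclass)
print(list(rows_list[0]))   # every field value in the first Row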
Just to see the values, I am using a print statement:
def print_row(row):
    print(row.timeStamp)

for row in rows_list:
    print_row(row)
But I only get a single line of output, because the loop iterates over the list just once:
ISODate(2020-06-03T11:30:16.900+0000)
How can I iterate over the fields of a Row in PySpark?
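For reference, this is roughly the plain-Python behaviour I am after (only a sketch; the strip_isodate helper is something I made up for illustration and is not part of PySpark). It relies on the fact that a Row is a tuple subclass, so iterating over it yields its field values:

import re

def strip_isodate(value):
    # Extract the timestamp from a string like 'ISODate(2020-06-03T11:30:16.900+0000)'.
    # Illustrative helper only, not a PySpark function.
    match = re.match(r"ISODate\((.*)\)", value)
    return match.group(1) if match else value

for row in rows_list:
    # Iterating over a Row yields its field values, since Row subclasses tuple.
    for value in row:
        print(strip_isodate(value))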