
I have this data as output when I run timeStamp_df.head() in PySpark:

Row(timeStamp='ISODate(2020-06-03T11:30:16.900+0000)', timeStamp='ISODate(2020-06-03T11:30:16.900+0000)', timeStamp='ISODate(2020-06-03T11:30:16.900+0000)', timeStamp='ISODate(2020-05-03T11:30:16.900+0000)', timeStamp='ISODate(2020-04-03T11:30:16.900+0000)')

My expected output is:

+----------------------------+
|timeStamp                   |
+----------------------------+
|2020-06-03T11:30:16.900+0000|
|2020-06-03T11:30:16.900+0000|
|2020-06-03T11:30:16.900+0000|
|2020-05-03T11:30:16.900+0000|
|2020-04-03T11:30:16.900+0000|
+----------------------------+

I first tried to use the .collect() method so I could iterate over the result:

rows_list = timeStamp_df.collect()
print(rows_list)

Its output is:

[Row(timeStamp='ISODate(2020-06-03T11:30:16.900+0000)', timeStamp='ISODate(2020-06-03T11:30:16.900+0000)', timeStamp='ISODate(2020-06-03T11:30:16.900+0000)', timeStamp='ISODate(2020-05-03T11:30:16.900+0000)', timeStamp='ISODate(2020-04-03T11:30:16.900+0000)')]

Just to see the values, I am using a print statement:

def print_row(row):
    print(row.timeStamp)


for row in rows_list:
    print_row(row)

But I am getting a single line of output, because the list contains only one Row, so the loop iterates only once:

ISODate(2020-06-03T11:30:16.900+0000)

How can I iterate over the data of a Row in PySpark?

AB21

1 Answer

  1. You cannot repeat keyword arguments when creating a Row.
  2. A valid Row is iterable:
from pyspark.sql import Row

row = Row(a=10, b=20, c=30)
print([column for column in row])

[10, 20, 30]
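
Applying this to the Row in the question: because all five fields share the name `timeStamp`, `row.timeStamp` can only resolve one of them, but iterating over the Row itself yields every value. Below is a minimal sketch of both iterating and stripping the `ISODate(...)` wrapper, assuming the values are plain strings exactly as shown; the stand-in DataFrame and the regular expression are illustrative, not taken from the original post:

import re
from pyspark.sql import SparkSession
from pyspark.sql.functions import regexp_extract

spark = SparkSession.builder.getOrCreate()

# Illustrative stand-in for timeStamp_df: one string column of ISODate(...) values.
timeStamp_df = spark.createDataFrame(
    [("ISODate(2020-06-03T11:30:16.900+0000)",),
     ("ISODate(2020-05-03T11:30:16.900+0000)",)],
    ["timeStamp"],
)

# Iterate over every value of every collected Row, not just row.timeStamp.
for row in timeStamp_df.collect():
    for value in row:  # a Row is a tuple subclass, so it iterates over its values
        # Strip the ISODate(...) wrapper to get the bare timestamp string.
        print(re.sub(r"^ISODate\((.*)\)$", r"\1", value))

# The same cleanup as a DataFrame transformation, without collecting,
# which reproduces the tabular output shown in the question:
timeStamp_df.select(
    regexp_extract("timeStamp", r"ISODate\((.*)\)", 1).alias("timeStamp")
).show(truncate=False)

If the goal is only the cleaned-up table rather than Python-side iteration, the regexp_extract/select version keeps the work inside Spark and avoids collecting the data to the driver.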
boyangeor