I am new to Python and Spark programming.
I have data in the format shown below (Format-1), which captures values for different fields, keyed by timestamp and trigger.
I need to convert this data into Format-2: group all the fields in Format-1 by timestamp and trigger, and create one record per key combination as shown in Format-2. Some fields in Format-1 have no key values (timestamp and trigger are both Null); these fields should be populated in every record of Format-2.
Can you please suggest the best approach to do this in PySpark?
Format-1:
Event time (key-1)    trig (key-2)  data  field_Name
------------------------------------------------------
2021-05-01T13:57:29Z  30Sec         10    A
2021-05-01T13:57:59Z  30Sec         11    A
2021-05-01T13:58:29Z  30Sec         12    A
2021-05-01T13:58:59Z  30Sec         13    A
2021-05-01T13:59:29Z  30Sec         14    A
2021-05-01T13:59:59Z  30Sec         15    A
2021-05-01T14:00:29Z  30Sec         16    A
2021-05-01T14:00:48Z  OFF           17    A
2021-05-01T13:57:29Z  30Sec         110   B
2021-05-01T13:57:59Z  30Sec         111   B
2021-05-01T13:58:29Z  30Sec         112   B
2021-05-01T13:58:59Z  30Sec         113   B
2021-05-01T13:59:29Z  30Sec         114   B
2021-05-01T13:59:59Z  30Sec         115   B
2021-05-01T14:00:29Z  30Sec         116   B
2021-05-01T14:00:48Z  OFF           117   B
2021-05-01T14:00:48Z  OFF           21    C
2021-05-01T14:00:48Z  OFF           31    D
Null                  Null          41    E
Null                  Null          51    F
Format-2:
Event Time            Trig   A     B     C     D     E     F
--------------------------------------------------------------
2021-05-01T13:57:29Z  30Sec  10    110   Null  Null  41    51
2021-05-01T13:57:59Z  30Sec  11    111   Null  Null  41    51
2021-05-01T13:58:29Z  30Sec  12    112   Null  Null  41    51
2021-05-01T13:58:59Z  30Sec  13    113   Null  Null  41    51
2021-05-01T13:59:29Z  30Sec  14    114   Null  Null  41    51
2021-05-01T13:59:59Z  30Sec  15    115   Null  Null  41    51
2021-05-01T14:00:29Z  30Sec  16    116   Null  Null  41    51
2021-05-01T14:00:48Z  OFF    17    117   21    31    41    51
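To make the expected transformation concrete, here is the logic I have in mind, sketched in plain Python (no Spark) on the sample data above: set aside the rows whose timestamp and trigger are both Null, pivot the remaining rows on field_Name within each (timestamp, trigger) group, and copy the set-aside values into every output record. The names `FORMAT_1` and `reshape` are just hypothetical names for this sketch. I assume the PySpark version would look something like `groupBy` on the two keys with `pivot("field_Name")` and `first("data")`, plus a `crossJoin` with the keyless fields, but I am not sure that is the best approach.

```python
from typing import Dict, List, Optional, Tuple

# Format-1 sample rows as (event_time, trig, data, field_name); None stands for Null.
Row = Tuple[Optional[str], Optional[str], int, str]

FORMAT_1: List[Row] = [
    ("2021-05-01T13:57:29Z", "30Sec", 10, "A"),
    ("2021-05-01T13:57:59Z", "30Sec", 11, "A"),
    ("2021-05-01T13:58:29Z", "30Sec", 12, "A"),
    ("2021-05-01T13:58:59Z", "30Sec", 13, "A"),
    ("2021-05-01T13:59:29Z", "30Sec", 14, "A"),
    ("2021-05-01T13:59:59Z", "30Sec", 15, "A"),
    ("2021-05-01T14:00:29Z", "30Sec", 16, "A"),
    ("2021-05-01T14:00:48Z", "OFF", 17, "A"),
    ("2021-05-01T13:57:29Z", "30Sec", 110, "B"),
    ("2021-05-01T13:57:59Z", "30Sec", 111, "B"),
    ("2021-05-01T13:58:29Z", "30Sec", 112, "B"),
    ("2021-05-01T13:58:59Z", "30Sec", 113, "B"),
    ("2021-05-01T13:59:29Z", "30Sec", 114, "B"),
    ("2021-05-01T13:59:59Z", "30Sec", 115, "B"),
    ("2021-05-01T14:00:29Z", "30Sec", 116, "B"),
    ("2021-05-01T14:00:48Z", "OFF", 117, "B"),
    ("2021-05-01T14:00:48Z", "OFF", 21, "C"),
    ("2021-05-01T14:00:48Z", "OFF", 31, "D"),
    (None, None, 41, "E"),
    (None, None, 51, "F"),
]


def reshape(rows: List[Row]) -> List[Dict[str, object]]:
    """Pivot Format-1 rows into Format-2 records, one per (event_time, trig)."""
    fields = sorted({field for _, _, _, field in rows})
    constants: Dict[str, int] = {}  # keyless fields, copied into every record
    grouped: Dict[Tuple[str, str], Dict[str, int]] = {}  # key -> {field: data}
    for event_time, trig, data, field in rows:
        if event_time is None and trig is None:
            constants[field] = data
        else:
            grouped.setdefault((event_time, trig), {})[field] = data

    records = []
    # ISO-8601 timestamps in the same format sort chronologically as strings.
    for event_time, trig in sorted(grouped):
        values = grouped[(event_time, trig)]
        record: Dict[str, object] = {"event_time": event_time, "trig": trig}
        for field in fields:
            # Keyed value first, then the keyless constant, else None (Null).
            record[field] = values.get(field, constants.get(field))
        records.append(record)
    return records


if __name__ == "__main__":
    for rec in reshape(FORMAT_1):
        print(rec)
```

Running it prints eight records matching the Format-2 rows, with E and F filled in everywhere and C and D set only for the OFF record.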