
I am new to Python and Spark programming.

I have data in Format-1 below, which captures values for different fields based on timestamp and trigger.

I need to convert this data into Format-2: that is, group all the fields given in Format-1 by timestamp and key, and create one record per key combination as shown in Format-2. Some fields in Format-1 have no key values (timestamp and trigger are Null); those fields should be populated on every record in Format-2.

Can you please suggest the best approach to perform this in PySpark?

Format-1:

Event time (key-1)  trig (key-2)    data    field_Name
------------------------------------------------------
2021-05-01T13:57:29Z    30Sec       10          A 
2021-05-01T13:57:59Z    30Sec       11          A 
2021-05-01T13:58:29Z    30Sec       12          A 
2021-05-01T13:58:59Z    30Sec       13          A 
2021-05-01T13:59:29Z    30Sec       14          A 
2021-05-01T13:59:59Z    30Sec       15          A 
2021-05-01T14:00:29Z    30Sec       16          A 
2021-05-01T14:00:48Z    OFF         17          A 
            
2021-05-01T13:57:29Z    30Sec       110         B 
2021-05-01T13:57:59Z    30Sec       111         B 
2021-05-01T13:58:29Z    30Sec       112         B 
2021-05-01T13:58:59Z    30Sec       113         B 
2021-05-01T13:59:29Z    30Sec       114         B 
2021-05-01T13:59:59Z    30Sec       115         B 
2021-05-01T14:00:29Z    30Sec       116         B 
2021-05-01T14:00:48Z    OFF         117         B 
            
2021-05-01T14:00:48Z    OFF         21          C
2021-05-01T14:00:48Z    OFF         31          D
Null                    Null        41          E
Null                    Null        51          F

Format-2:

Event Time              Trig    A   B   C       D       E   F
--------------------------------------------------------------
2021-05-01T13:57:29Z    30Sec   10  110 Null    Null    41  51
2021-05-01T13:57:59Z    30Sec   11  111 Null    Null    41  51
2021-05-01T13:58:29Z    30Sec   12  112 Null    Null    41  51
2021-05-01T13:58:59Z    30Sec   13  113 Null    Null    41  51
2021-05-01T13:59:29Z    30Sec   14  114 Null    Null    41  51
2021-05-01T13:59:59Z    30Sec   15  115 Null    Null    41  51
2021-05-01T14:00:29Z    30Sec   16  116 Null    Null    41  51
2021-05-01T14:00:48Z    OFF     17  117 21      31      41  51
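To make the expected output reproducible, here is the transformation sketched in pandas on a subset of the sample rows (the column names `event_time`, `trig`, `data`, `field_name` are my own labels for the tables above). I am looking for the PySpark equivalent, which I understand would involve something like `groupBy().pivot()`:

```python
import pandas as pd

# Format-1 sample (a subset of the rows shown above)
rows = [
    ("2021-05-01T13:57:29Z", "30Sec", 10,  "A"),
    ("2021-05-01T13:57:59Z", "30Sec", 11,  "A"),
    ("2021-05-01T13:57:29Z", "30Sec", 110, "B"),
    ("2021-05-01T13:57:59Z", "30Sec", 111, "B"),
    ("2021-05-01T14:00:48Z", "OFF",   21,  "C"),
    (None,                   None,    41,  "E"),
]
df = pd.DataFrame(rows, columns=["event_time", "trig", "data", "field_name"])

# Split: rows with a timestamp key vs. keyless rows (E and F in my data)
keyed = df[df["event_time"].notna()]
keyless = df[df["event_time"].isna()]

# Pivot the keyed rows: one output column per field_name
wide = keyed.pivot_table(index=["event_time", "trig"],
                         columns="field_name",
                         values="data",
                         aggfunc="first").reset_index()

# Broadcast each keyless field's value onto every record
for _, r in keyless.iterrows():
    wide[r["field_name"]] = r["data"]
```

After this, `wide` has one row per (event_time, trig) pair with columns A, B, C (Null where a field has no value at that timestamp) and a constant column E, matching Format-2.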
    This is a pivot operation. There are plenty of Stack Overflow answers for it: https://stackoverflow.com/questions/37486910/pivot-string-column-on-pyspark-dataframe – Rafa Jul 08 '21 at 11:09
