I have a PySpark DataFrame with origin, destination, date (year-month), and a list of JSON objects for each origin-destination-date combination:
+---------+--------------+----------+--------------------+
|fs_origin|fs_destination|year-month| JSON|
+---------+--------------+----------+--------------------+
| TLV| AUH| 2022-06|[{"fs_date":"2022...|
| TLV| AUH| 2022-07|[{"fs_date":"2022...|
| TLV| AUH| 2022-08|[{"fs_date":"2022...|
| TLV| AUH| 2022-09|[{"fs_date":"2022...|
| TLV| AUH| 2022-10|[{"fs_date":"2022...|
| TLV| AUH| 2022-11|[{"fs_date":"2022...|
| TLV| BAK| 2022-06|[{"fs_date":"2022...|
| TLV| BAK| 2022-07|[{"fs_date":"2022...|
| TLV| BAK| 2022-08|[{"fs_date":"2022...|
| TLV| BAK| 2022-09|[{"fs_date":"2022...|
| TLV| BAK| 2022-10|[{"fs_date":"2022...|
| TLV| BAK| 2022-11|[{"fs_date":"2022...|
| TLV| BER| 2022-06|[{"fs_date":"2022...|
| TLV| BER| 2022-07|[{"fs_date":"2022...|
| TLV| BER| 2022-08|[{"fs_date":"2022...|
| TLV| BER| 2022-09|[{"fs_date":"2022...|
| TLV| BER| 2022-10|[{"fs_date":"2022...|
| TLV| BER| 2022-11|[{"fs_date":"2022...|
+---------+--------------+----------+--------------------+
I want to turn it into a nested Python dictionary that holds the 'JSON' value keyed first by origin-destination and then by year-month, something like this:
{
"TLV-AUH": {
"2022-06": [{"fs_date":"2022...],
"2022-07": [{"fs_date":"2022...],
"2022-08": [{"fs_date":"2022...],
"2022-09": [{"fs_date":"2022...],
"2022-10": [{"fs_date":"2022...],
"2022-11": [{"fs_date":"2022...]
},
"TLV-BAK": {
"2022-06": [{"fs_date":"2022...],
"2022-07": [{"fs_date":"2022...],
"2022-08": [{"fs_date":"2022...],
"2022-09": [{"fs_date":"2022...],
"2022-10": [{"fs_date":"2022...],
"2022-11": [{"fs_date":"2022...]
},
"TLV-BER": {
"2022-06": [{"fs_date":"2022...],
"2022-07": [{"fs_date":"2022...],
"2022-08": [{"fs_date":"2022...],
"2022-09": [{"fs_date":"2022...],
"2022-10": [{"fs_date":"2022...],
"2022-11": [{"fs_date":"2022...]
}
}
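To make the target shape concrete, here is a minimal sketch of the transformation in plain Python. It assumes the DataFrame is small enough to bring to the driver with `df.collect()`, and that the 'JSON' column holds a JSON string (if it is already an array, the parsing step is skipped). The helper name `rows_to_nested_dict` and the sample payload values are hypothetical, made up only for illustration.

```python
import json
from collections import defaultdict


def rows_to_nested_dict(rows):
    """Build {"ORIGIN-DEST": {"year-month": parsed_json}} from row-like objects.

    `rows` can be any iterable of objects that support row["column"] access,
    such as plain dicts or the pyspark Row objects returned by df.collect().
    """
    nested = defaultdict(dict)
    for row in rows:
        route = f'{row["fs_origin"]}-{row["fs_destination"]}'
        payload = row["JSON"]
        # Assumption: the column is a JSON string; parse it into Python objects.
        if isinstance(payload, str):
            payload = json.loads(payload)
        nested[route][row["year-month"]] = payload
    return dict(nested)


# Hypothetical sample rows standing in for df.collect(); the payload values
# are placeholders, not real data.
sample_rows = [
    {"fs_origin": "TLV", "fs_destination": "AUH", "year-month": "2022-06",
     "JSON": '[{"fs_date": "placeholder-1"}]'},
    {"fs_origin": "TLV", "fs_destination": "AUH", "year-month": "2022-07",
     "JSON": '[{"fs_date": "placeholder-2"}]'},
    {"fs_origin": "TLV", "fs_destination": "BAK", "year-month": "2022-06",
     "JSON": '[{"fs_date": "placeholder-3"}]'},
]

result = rows_to_nested_dict(sample_rows)
print(json.dumps(result, indent=2))
```

With the real data this would presumably be called as `rows_to_nested_dict(df.collect())`. Collecting pulls every row onto the driver, so this only makes sense when the DataFrame fits in driver memory.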
Thanks!