How to convert a dataframe to a dict with nested arrays in PySpark

Asked Feb 13 '20 at 05:35

Active Feb 13 '20 at 05:54

Viewed 99 times

I have this dataframe sales_df:

id  year    month   total_sales
0   2020    1       200
1   2019    12      866474119
1   2019    10      555
1   2019    11      13073203
1   2020    2       5255259695
1   2020    1       13622027370

From this, I want to build a list of dictionaries, as follows:

[
  {
    "2020": {
      "1": "200"
    },
    "id": "0"
  },
  {
    "2019": {
      "10": "555",
      "11": "13073203",
      "12": "866474119"
    },
    "2020": {
      "1": "13553473101",
      "2": "6000"
    },
    "id": "1"
  }
]

I can get this output by converting the DataFrame to pandas first. How can I achieve the same result without converting to pandas?

edited Feb 13 '20 at 05:54

moys

asked Feb 13 '20 at 05:35

siva

I think even in PySpark you're going to have to use collect() to bring the rows to the driver node, and then use asDict() on your list of rows. Doing it the pandas way might be your best bet. I could be wrong. – murtihash Feb 13 '20 at 06:24
You can refer to: https://stackoverflow.com/questions/19798112/convert-pandas-dataframe-to-a-nested-dict – Prabhanj Feb 13 '20 at 07:08
d = {k: recur_dictify(g.ix[:,1:]) for k,g in grouped} ^ I am getting SyntaxError: invalid syntax – siva Feb 13 '20 at 07:37
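
For what it's worth, the collect() plus asDict() route suggested in the first comment could look roughly like the sketch below. This is only an illustration, not a tested PySpark answer: `nest_rows` is a hypothetical helper name, and the sample rows are plain dicts standing in for what `row.asDict()` would return for each collected row, so the snippet runs without a Spark session.

```python
from collections import defaultdict


def nest_rows(rows):
    """Group flat row dicts into one dict per id: {year: {month: total_sales}, "id": id}.

    `rows` is any iterable of dicts with the keys id, year, month, total_sales
    (hypothetical helper; the column names are taken from the question).
    """
    # id -> year -> month -> total_sales, with everything stringified as in the question
    by_id = defaultdict(lambda: defaultdict(dict))
    for row in rows:
        by_id[row["id"]][str(row["year"])][str(row["month"])] = str(row["total_sales"])

    result = []
    for id_, years in sorted(by_id.items()):
        # Sort years, and sort months numerically so "10" comes after "2"
        entry = {
            year: dict(sorted(months.items(), key=lambda kv: int(kv[0])))
            for year, months in sorted(years.items())
        }
        entry["id"] = str(id_)
        result.append(entry)
    return result


# Stand-in for [row.asDict() for row in sales_df.collect()], using the rows from the question
sample = [
    {"id": 0, "year": 2020, "month": 1, "total_sales": 200},
    {"id": 1, "year": 2019, "month": 12, "total_sales": 866474119},
    {"id": 1, "year": 2019, "month": 10, "total_sales": 555},
    {"id": 1, "year": 2019, "month": 11, "total_sales": 13073203},
    {"id": 1, "year": 2020, "month": 2, "total_sales": 5255259695},
    {"id": 1, "year": 2020, "month": 1, "total_sales": 13622027370},
]

nested = nest_rows(sample)
```

With a real DataFrame the call would presumably be `nest_rows(row.asDict() for row in sales_df.collect())`, followed by `json.dumps(nested, indent=2)` if JSON text is needed. Keep in mind that collect() pulls every row to the driver, so this only makes sense when the data fits in driver memory.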

0 Answers