How to sort a JSON file with Spark in Python?

Asked Jun 27 '18 at 23:40

Active Jun 28 '18 at 02:00

Viewed 363 times

I have a JSON file like the one shown below my code, and I would like to sort its entries by word using an RDD. How would I do it?

I tried the following, but it does not sort the words :(

rdd = self.sc.textFile(self.dic_path).sortByKey()


{
  "biennials": 0, 
  "tripolitan": 0, 
  "oblocutor": 0, 
  "leucosyenite": 0, 
  "chilitis": 0, 
  "fabianist": 0, 
  "diazeutic": 0, 
  "alible": 0, 
  "woods": 4601, 
  "preadjournment": 0, 
  "spiders": 0, 
  "fabianism": 0, 
}

edited Jun 28 '18 at 02:00

pault

asked Jun 27 '18 at 23:40

Omar Hashmi

Possible duplicate of [Spark dataframe is not ordered after sort](https://stackoverflow.com/questions/37872461/spark-dataframe-is-not-ordered-after-sort) – Hyrein Jun 28 '18 at 02:02

Remember that sorting is of little use in a distributed system, and RDDs are distributed datasets. Sorting is only fruitful if you accumulate the data into one partition on one node, which is not an efficient use of a distributed system. So my suggestion is: don't bother sorting the whole dataset when using a distributed system. You can sort and process within each partition or group, though, so consider changing your architecture. – Ramesh Maharjan Jun 28 '18 at 02:15
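The per-partition sorting mentioned in the comment above could look roughly like the following. This is only a sketch: it assumes the data has already been turned into an RDD of (word, count) pairs, and the names `sort_partition` and `sort_within_partitions` are made up for illustration.

```python
def sort_partition(pairs):
    """Sort the (word, count) pairs of a single partition by word."""
    return iter(sorted(pairs, key=lambda kv: kv[0]))


def sort_within_partitions(pair_rdd):
    """Sort each partition on its own; no data moves between partitions."""
    # pair_rdd is assumed to be an RDD of (word, count) tuples.
    # mapPartitions calls sort_partition once per partition.
    return pair_rdd.mapPartitions(sort_partition, preservesPartitioning=True)
```

Note that this only orders the records inside each partition; the partitions are not ordered relative to each other, so the result is not a global sort.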

0 Answers
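One possible reason the attempt in the question does not sort by word: `textFile` yields each line of the file as a plain string, not as a (word, count) pair, so `sortByKey` has no word key to work with. A minimal sketch of one way around this follows, assuming the file holds a single flat JSON object and is small enough to be parsed in one piece. The helper names `parse_pairs` and `sort_words` are hypothetical. Also note that the sample as posted has a trailing comma after the last entry, which Python's `json` module rejects; the sketch assumes the real file is valid JSON.

```python
import json


def parse_pairs(json_text):
    """Turn the text of a flat JSON object into a list of (word, count) pairs."""
    return list(json.loads(json_text).items())


def sort_words(sc, path):
    """Read the JSON file at `path` and return an RDD of pairs sorted by word.

    `sc` is assumed to be an existing SparkContext.
    """
    # wholeTextFiles yields (filename, content) pairs, so a JSON object
    # spread over many lines stays in one piece; values() keeps the content.
    contents = sc.wholeTextFiles(path).values()
    # Each file's content becomes many (word, count) records.
    pairs = contents.flatMap(parse_pairs)
    # With real key-value pairs, sortByKey sorts by the word.
    return pairs.sortByKey()
```

With the names from the question, this might be called as `sort_words(self.sc, self.dic_path).collect()` to bring the sorted pairs back to the driver.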