I have a JSON file like the following, and I would like to sort it using an RDD. How would I do it?

I tried the following, but it does not sort the words :(

rdd = self.sc.textFile(self.dic_path).sortByKey()


{
  "biennials": 0, 
  "tripolitan": 0, 
  "oblocutor": 0, 
  "leucosyenite": 0, 
  "chilitis": 0, 
  "fabianist": 0, 
  "diazeutic": 0, 
  "alible": 0, 
  "woods": 4601, 
  "preadjournment": 0, 
  "spiders": 0, 
  "fabianism": 0, 
}
pault
Omar Hashmi
  • Possible duplicate of [Spark dataframe is not ordered after sort](https://stackoverflow.com/questions/37872461/spark-dataframe-is-not-ordered-after-sort) – Hyrein Jun 28 '18 at 02:02
  • Remember that sorting is useless in a distributed system, and RDDs are distributed datasets. Sorting is fruitful only if you accumulate the data into one partition on one node, which is not an efficient use of a distributed system. So my suggestion is: don't bother sorting the whole dataset when using a distributed system. You can sort and process within each partition or group, though, so change your architecture. – Ramesh Maharjan Jun 28 '18 at 02:15
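
A possible reason the attempt in the question does not sort the words: textFile() yields one string per line of the file, not (word, count) pairs, so sortByKey() has no real keys to work with. Below is a minimal sketch of one way around that, not a definitive implementation. It assumes the file is valid JSON (the trailing comma after the last entry in the sample would have to be removed) and that sc is an existing SparkContext. The names parse_pairs and sorted_words are hypothetical, chosen only for illustration.

```python
import json


def parse_pairs(text):
    """Turn the text of one JSON object into a list of (word, count) pairs."""
    return list(json.loads(text).items())


def sorted_words(sc, path):
    """Return an RDD of (word, count) pairs sorted by word.

    Assumptions: sc is an existing SparkContext and path points at a
    valid JSON object like the one in the question.
    """
    return (
        sc.wholeTextFiles(path)                   # RDD of (filename, whole file content)
        .flatMap(lambda kv: parse_pairs(kv[1]))   # RDD of (word, count)
        .sortByKey()                              # order by the word
    )
```

Calling sorted_words(sc, dic_path).collect() would bring the ordered pairs back to the driver. To order by count instead of by word, sortByKey() could be swapped for sortBy(lambda kv: kv[1], ascending=False).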
