
There are similar questions, like this and this, but they did not help me enough. Here is a piece of my code.

val output = abc.collect()
output.foreach(tup => println(tup._1 + "  " + math.ceil(tup._2 * 1000)/1000))

Following is a piece of the output.

 5         0.835
 1         0.901
 110       0.797
 7         0.821
 11        0.899
 0         0.871
 32        0.313
 78        0.273
 35698     0.333
 119       0.273

I want the output sorted numerically by the first column. I tried takeOrdered(n), but its output is not what I need: it is sorted, but apparently as strings rather than as numbers. It looks something like this:

 0          0.871
 1          0.901
 10         1.072
 11         0.899
 110        0.797
 111        0.288
 12         0.288
 123        0.273
 14         0.554
 153        0.228

Any help, please?

Asmat Ali
  • maybe try using dataframes, see http://stackoverflow.com/questions/30332619/how-to-sort-by-column-in-descending-order-in-spark-sql – maxymoo Jan 18 '16 at 04:01
  • May be fine but I am not using Spark SQL in my program. – Asmat Ali Jan 18 '16 at 04:06
  • What do you need to do once the RDD is sorted? `sortBy` will sort it, as the questions you link to say. It's not clear what your question actually is. – The Archetypal Paul Jan 18 '16 at 08:05
  • I simply want the RDD to be sorted. – Asmat Ali Jan 18 '16 at 08:07
  • http://stackoverflow.com/questions/26387753/how-to-reverse-ordering-for-rdd-takeordered You can adjust the ordering function to your needs. In this case, if I understand correctly, convert the string to an int and order by it. – Tom Ron Jan 18 '16 at 08:56
  • @TomRon, I am getting type mismatch error after adding (Ordering[Int]) – Asmat Ali Jan 18 '16 at 09:34
  • Because you have strings, you need to define a different ordering, something like _.toInt < _.toInt. See http://www.scala-lang.org/api/2.11.4/index.html#scala.math.Ordering – Tom Ron Jan 18 '16 at 12:56
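
The difference between string order and numeric order that the comments point at can be sketched with plain Scala collections, so it runs without a cluster. The sample rows and the names `rows` and `numericKey` are made up for illustration. On an RDD of `(String, Double)` pairs (an assumption about the type of `abc`), the same `Ordering` could be passed explicitly, as in `abc.takeOrdered(n)(numericKey)`, because `takeOrdered` takes its ordering as an implicit parameter.

```scala
// A minimal sketch; the sample keys are hypothetical, modeled on the output in the question.
object NumericOrderingSketch {
  val rows: Seq[(String, Double)] =
    Seq(("5", 0.835), ("110", 0.797), ("1", 0.901), ("11", 0.899))

  // Sorting by the raw String key compares keys character by character.
  val asStrings: Seq[(String, Double)] = rows.sortBy(_._1)

  // An explicit Ordering that converts each key to Int before comparing.
  val numericKey: Ordering[(String, Double)] =
    Ordering.by[(String, Double), Int](_._1.toInt)

  val asNumbers: Seq[(String, Double)] = rows.sorted(numericKey)

  def main(args: Array[String]): Unit = {
    println(asStrings.map(_._1).mkString(", ")) // keys in string order
    println(asNumbers.map(_._1).mkString(", ")) // keys in numeric order
  }
}
```

With string keys, "110" sorts before "5" because the comparison stops at the first differing character; converting to `Int` first removes that effect.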

2 Answers


The problem was in how I combined takeOrdered(n) with collect(). Calling takeOrdered on its own, as in val output = abc.takeOrdered(10000), worked perfectly.

Asmat Ali

Instead of first generating the output from the RDD and then sorting it, you could sort the RDD first and then create the output:

val abc = sc.parallelize(Array(("5", 0.835), ("1", 0.901), ("110", 0.797)))
abc.map { case (k, v) => (k.toInt, v) }.takeOrdered(3).foreach(println(_))
//(1,0.901)
//(5,0.835)
//(110,0.797)
Christian Hirsch
  • This could be fine, but the output shown is just a sample. The actual output is huge. So instead of Array(("5",0.835),("1",0.901),("110",0.797)), what general statement could be applied to all the elements? – Asmat Ali Jan 18 '16 at 07:06
  • @Asmat Ali: Could you elaborate your comment a little? What kind of problems would appear if you applied the above code to the data set you have in mind? – Christian Hirsch Jan 18 '16 at 07:12
  • The problem is that I cannot even apply this code to my data set: it has millions of nodes, and I cannot list each node by hand the way you have done for nodes 5, 1 and 110 with ("5",0.835),("1",0.901),("110",0.797). – Asmat Ali Jan 18 '16 at 07:26
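
On the last point, the `Array(...)` in the answer is only sample input; the `map` runs on every element of whatever RDD it is called on. A minimal sketch of that idea follows, assuming Spark is on the classpath and that the real `abc` is an `RDD[(String, Double)]`. The helper name `sortByNumericKey`, the application name and the local master setting are all hypothetical.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.rdd.RDD

object SortWholeRdd {
  // Hypothetical helper: accepts any RDD of (String, Double) pairs, whatever its size.
  def sortByNumericKey(rdd: RDD[(String, Double)]): RDD[(Int, Double)] =
    rdd
      .map { case (k, v) => (k.toInt, v) } // applied to every element of the RDD
      .sortBy(_._1)                        // distributed sort on the Int key

  def main(args: Array[String]): Unit = {
    // Local master is an assumption so the sketch can run on one machine.
    val conf = new SparkConf().setAppName("sort-sketch").setMaster("local[*]")
    val sc = new SparkContext(conf)
    try {
      // Stand-in data; the real abc would replace this line unchanged below it.
      val abc = sc.parallelize(Seq(("5", 0.835), ("1", 0.901), ("110", 0.797)))
      sortByNumericKey(abc).take(3).foreach(println)
    } finally {
      sc.stop()
    }
  }
}
```

Unlike `takeOrdered(n)`, which returns the first n elements to the driver as an array, `sortBy` returns a sorted RDD, which may suit a data set with millions of entries better.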