I am trying to sort the following array in descending order but can't figure out how.

I have tried using .sort and .sortWith but they don't appear to be applicable to arrays?

val result = postIdCount.withFilter(_._2 > 5).map(_._1.toInt)

result.collect

Array[Int] = Array(41, 974, 662, 9554, 116, 4942, 410, 2269, 5443, 5357, 9435, 2293, 266, 711, 441, 61, 3738, 22, 6318, 8390, 497, 19, 9364, 412, 893, 334, 9000, 678, 313, 253, 979, 842, 4914, 2651, 6547, 6576, 1159, 5224, 1107, 52, 810, 361, 694, 739, 904, 5706, 422, 778, 9818, 758, 130, 265, 6107, 155, 2618, 8941, 8963, 834, 326, 731, 2368, 430, 1253)

Would anyone know how I might achieve this?

Thank you for your help.

EDIT: This is what I have so far. When I try to add:

val result = postIdCount.withFilter(_._2 > 5).map(_._1.toInt).sorted(Ordering[Integer].reverse)

I get an error saying:

error: value sorted is not a member of org.apache.spark.rdd.RDD[Int]
Tzach Zohar
Archer
  • Possible duplicate of [What's the best way to inverse sort in scala?](http://stackoverflow.com/questions/7802851/whats-the-best-way-to-inverse-sort-in-scala) – Dmitry Ginzburg Oct 09 '16 at 09:12
  • You need to sort out your code. You ask about sorting an array, but the error message clearly isn't about an array, but about some obscure type which isn't part of Scala. You need to figure out first where this strange type comes from, and why you have this strange type instead of an array. – Jörg W Mittag Oct 09 '16 at 09:15
  • @JörgWMittag When I do `result.collect`, it seems to be an array? – Archer Oct 09 '16 at 09:18
  • Maybe. But whatever object you are calling `sorted` on isn't an array. Again, the error message very clearly says that it isn't. – Jörg W Mittag Oct 09 '16 at 09:41

4 Answers

1

`postIdCount.withFilter(_._2 > 5).map(_._1.toInt)` gives you an `org.apache.spark.rdd.RDD[Int]`, not an `Array`.

Try

postIdCount.withFilter(_._2 > 5).map(_._1.toInt).collect.sorted(Ordering[Int].reverse)

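The `collect` function returns all the elements of the dataset as an array. Note that this brings all of the data onto a single machine (the driver) in the Spark cluster, so it is only suitable when the result is small enough to fit in memory there.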

locoyou
  • I tried this and got the following: `error: type mismatch; found : scala.math.Ordering[Integer] required: scala.math.Ordering[Any] Note: Integer <: Any, but trait Ordering is invariant in type T. You may wish to investigate a wildcard type such as `_ <: Any`.` – Archer Oct 09 '16 at 09:27
  • @Archer Use `Ordering[Int]` – locoyou Oct 09 '16 at 09:38
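
To illustrate the array side of this answer without needing a Spark cluster, here is a minimal plain-Scala sketch. The object name `DescendingSort`, the helper `sortDescending`, and the sample values are hypothetical stand-ins for whatever `result.collect` returns; only `sorted`, `sortWith`, and `Ordering[Int].reverse` come from the standard library. Since the element type is `Int`, the ordering passed to `sorted` has to be `Ordering[Int]` rather than `Ordering[Integer]`.

```scala
object DescendingSort {
  // Sorts a local array in descending order using the reversed Int ordering.
  def sortDescending(xs: Array[Int]): Array[Int] =
    xs.sorted(Ordering[Int].reverse)

  def main(args: Array[String]): Unit = {
    // Hypothetical stand-in for the Array[Int] returned by result.collect
    val collected = Array(41, 974, 662, 9554, 116)

    // Two equivalent ways to get a descending copy of the array
    val viaOrdering = sortDescending(collected)
    val viaSortWith = collected.sortWith(_ > _)

    println(viaOrdering.mkString(", ")) // prints 9554, 974, 662, 116, 41
    println(viaSortWith.mkString(", ")) // prints 9554, 974, 662, 116, 41
  }
}
```

Both calls return a new array and leave `collected` unchanged. This only applies once the data is a local `Array`; on the RDD itself, `sorted` is not available, as the error in the question shows.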
1
val sorted = postIdCount
   .withFilter(_._2 > 5)
   .map(_._1.toInt)
   .sortBy(identity, ascending = false)

This returns a sorted RDD[Int].

Tzach Zohar
0
val sortedArray = array.sorted
sjrd
0
val sorted = array.sorted(Ordering[Int].reverse)
Dmitry Ginzburg
  • Hi Dmitry, I tried this but got an error stating `error: value sorted is not a member of org.apache.spark.rdd.RDD[Int]`. I have provided more details in my question. – Archer Oct 09 '16 at 09:12
  • Looks like the structure you're using is not an array, rather `org.apache.spark.rdd.RDD`. So this one should be applicable: http://stackoverflow.com/questions/23838614/how-to-sort-an-rdd-in-scala-spark – Dmitry Ginzburg Oct 09 '16 at 09:15