2

Suppose we have an array:

val arr = Array[(String, Int)](("A", 3), ("B", 5), ("C", 2), ("D", 7), ("E", 4))

I want to calculate the cumulative sum of the numbers until it exceeds a threshold. Then I want to concatenate a ".p" suffix for each letter which appeared in the tuples before the threshold.

For instance, let the threshold be 14. In this case I would like an output:

(("A.p", 3), ("B.p", 5), ("C.p", 2), ("D", 7), ("E", 4))

Because 3 + 5 + 2 = 10 < 14, but 3 + 5 + 2 + 7 = 17 > 14.

What I tried is the following:

val resIndex = arr.map(_._2).scanLeft(0){_ + _}.tail.takeWhile(_ < 14).length 
val resSplit = arr.toList.splitAt(resIndex)
val res = resSplit._1.map(e => (e._1.concat(".p"), e._2)) :: resSplit._2

But I'm sure there is a more efficient and better way to achieve this.

Update!

Thank you for all of your answers! I made a little benchmark to compare the solutions and the most efficient way of doing this was with Assad Mendelson's improved solution.

enter image description here

For the benchmark I used a randomly generated array with 0.5 million of elements and a threshold = 10000.

For benchmark I used:

    def time[A](f: => A) = {
  val s = System.nanoTime
  val ret = f
  println("time: "+(System.nanoTime-s)/1e6+"ms")
  ret
}

time {
  println("jwvh solution")
  arr.foldLeft((0,Array[(String,Int)]())){ case ((sum,ar),(s,i)) =>
    if (sum + i < threshold) (sum+i, ar:+((s+".p", i)))
    else (sum,ar:+(s,i))
  }._2
}
sanyi14ka
  • 809
  • 9
  • 14
  • What utility did you use to run the benchmarks? – jwvh Jul 13 '17 at 07:47
  • See the updated question. – sanyi14ka Jul 13 '17 at 07:56
  • Worth considering: [here](https://stackoverflow.com/questions/504103/how-do-i-write-a-correct-micro-benchmark-in-java/513259#513259), [here](https://www.ibm.com/developerworks/java/library/j-jtp02225), and [here](https://www.ibm.com/developerworks/java/library/j-benchmark1/index.html). – jwvh Jul 13 '17 at 09:02

4 Answers4

3

You can do it in a single iteration with foldLeft:

val arr =
Array[(String, Int)](("A", 3), ("B", 5), ("C", 2), ("D", 7), ("E", 4))

val threshold = 14
val (values, sum): (List[(String, Int)], Int) =
  arr.foldLeft((List.empty[(String, Int)], 0)) {
    case ((accumulatedTuples, acc), (str, value)) =>
      val sum = acc + value
      sum match {
        case sum if sum < threshold =>
          ((s"$str.p", value) :: accumulatedTuples, sum)
        case _ => ((str, value) :: accumulatedTuples, acc)
      }
  }

values.foreach(println)
println(s"Sum: $sum")

Yields:

(E,4)
(D,7)
(C.p,2)
(B.p,5)
(A.p,3)

Sum: 10 

If the order of the values matters, we'll need to add a .reverse at the end, or alternatively foldRight instead.

Yuval Itzchakov
  • 146,575
  • 32
  • 257
  • 321
2

OK, here's my 2-cents.

arr.foldLeft((0,Array[(String,Int)]())){ case ((sum,ar),(s,i)) =>
  if (sum + i < 14) (sum+i, ar:+((s+".p", i)))
  else (sum,ar:+(s,i))
}._2
// res0: Array[(String, Int)] = Array((A.p,3), (B.p,5), (C.p,2), (D,7), (E,4))

Like the others, a single foldLeft will do the trick. It's just a little tricky keeping track of the individual elements as the result Array is built.

jwvh
  • 50,871
  • 7
  • 38
  • 64
  • Isn't this exactly the same answer as mine? :) – Yuval Itzchakov Jul 12 '17 at 14:36
  • @YuvalItzchakov, very similar. Mine is more concise and the result is the same collection type, `Array`, and in the correct order but, yes, you did beat me to the punch. _*sigh*_ – jwvh Jul 12 '17 at 14:43
1

First you can simplify your code by combining the last two lines to:

val res = arr.indices.map(ind => if (ind < resIndex) (arr(ind)._1 + ".p", arr(ind)._2) else arr(ind))

The functional way is probably to use Yuval Itzchakov's answer.

If you want better performance then you can go less functional.

First notice that you go over the data twice. You can solve this by doing:

var sum = 0
val threshold = 14
for(v <- arr) yield if (sum < threshold) {
       sum += v._2
       (v._1 + ".p", v._2)
     } else {
       v
     }

You can improve the results even more by noticing that in practice you are creating multiple copies of the data. To solve this you can do:

var sum = 0
val threshold = 14
var ind = 0
while (sum + arr(ind)._2 < threshold) {
    sum += arr(ind)._2
    arr(ind) = (arr(ind)._1 + ".p", arr(ind)._2)
    ind += 1
}
val res = arr

That said, for smaller arrays I would go with the first optimization (joining the last two lines) as it is clearer to read.

Assaf Mendelson
  • 12,701
  • 5
  • 47
  • 56
1

Just simplify it using yield:

 var sum=0
 var result=for(var i <- arr) yield if(sum<14){
  sum+=i._2
   (i._1 + "p",i._2)
} else i
Nhon Dinh
  • 221
  • 1
  • 5