0

There is an RDD with elements like this:

( (n_1, n_2, r), List( (t1,t2), (t3,t4), ... )

I'm trying to do the following:

def p_rec_l(k: Int, acc: Double, pvals_sum: Double, n_1: Int, n_2: Int, r: Int): Double = {
  if (k==r-1) return 1 - (pvals_sum + acc)
  return p_rec_l(k+1, acc*f(k.toDouble, n_1, n_2), pvals_sum+acc, n_1, n_2, r)
}

def f(k:Double, n_1: Int, n_2:Int): Double = (n_1-k)*(n_2-k)/((k+1)*(N-n_1-n_2+k+1))
N = 2000000
someRDD.map({
    case (key, value) => (key, {
      val n_1 = key._1; val n_2 = key._2; val r = key._3
      val p_k0 = (0 to n_2-1).iterator.map(j => 1- n_1/(N-j.toDouble)).reduceLeft(_*_)
      val pval = p_rec_l(0, p_k0, 0, n_1, n_2, r)
      value.map({
        case (t1, t2) => (t1, t2, n_1, n_2, r, pval)          
      }) 
    })
})

But if r is quite big there is a stack overflow exception and the full process crashes. I edited the code like this:

someRDD.map({
    case (key, value) => (key, {
      val n_1 = key._1; val n_2 = key._2; val r = key._3
      val p_k0 = (0 to n_2-1).iterator.map(j => 1- n_1/(N-j.toDouble)).reduceLeft(_*_)
      var pval = -1.0
      try{
        pval = p_rec_l(0, p_k0, 0, n_1, n_2, r)
      } catch{
        case e: java.lang.StackOverflowError => pval = -1
      }
      value.map({
        case (t1, t2) => (t1, t2, n_1, n_2, r, pval)          
      }) 
    })
})

Before the edition the program finished in about 7 hours, but now it's been working for 36 hours and hasn't finished yet. Is it possible, that this try-catch clause slows the execution so much? If it is, is there any way to impove it?

elfinorr
  • 189
  • 3
  • 12

1 Answers1

0

Probably better solution would be not to catch StackOverflowError, but instead mark your function with @tailrec annotation (as I see it is tail recursive), so you should avoid StackOverflowError at all

@tailrec def p_rec_l(k: Int, acc: Double, pvals_sum: Double, n_1: Int, n_2: Int, r: Int): Double = {
  if (k==r-1) 1 - (pvals_sum + acc)
  else p_rec_l(k+1, acc*f(k.toDouble, n_1, n_2), pvals_sum+acc, n_1, n_2, r)
}

Also, to understand your question a bit better, am I understand right that you compare execution time of successful execution without StackOverflowError and without try-catch, and another execution with try-catch, but with same data which does not cause StackOverflowError, so catch itself is not working while you compare time?

Evgeny
  • 1,760
  • 10
  • 11
  • Oooh, I thought, if I'm writing tail recursive function, scala itself understands that it is tail recursive and optimizes somehow without that mark. Will try, thank you. – elfinorr Mar 28 '18 at 10:34
  • First time I met stackoverflow I limited r like that: val r = math.min(1500, key._3) - because that time it was irrelevant, i was just wondering if spark can handle my data and calculate something. That time where was no try-cause clause at all. Now I'm using the same data and want to know which triplet (n_1, n_2, r) cause stackoverflow (I'm marking them as -1) – elfinorr Mar 28 '18 at 10:37
  • Do not know your data, but would suggest also to change `k==r-1` with `k>=r-1` as soon as k is increased on every iteration, so it is safer in case if start value of `k` is greater than `r-1` – Evgeny Mar 28 '18 at 11:14
  • About annotating: did not find any link to official doc, but this answer could be useful to understand why you need annotation: https://stackoverflow.com/a/35013414/546520 – Evgeny Mar 28 '18 at 11:28
  • Oh, I see.. Well, I hope the execution will be ended in 3-4 hours and I will try to do what you said. Thank you very much! – elfinorr Mar 28 '18 at 11:37
  • unfortunately it didn't work, i'm trying to def function with annotation in spark-shell, but it gives me an error: http://prntscr.com/ixl9ys – elfinorr Mar 28 '18 at 14:58
  • em... if it is class method, it should be also `private` or `final` to not be overridable =) this answer could help: https://stackoverflow.com/a/4785606/546520 – Evgeny Mar 28 '18 at 15:02
  • I'm sorry, I kinda panicked :) Now the function return -infinity instead of error with big arguments, and that is something – elfinorr Mar 28 '18 at 15:06