
I am building a program based on a rather complex mathematical algorithm. In it I want to account for vectors that have missing values, i.e. NaN. So far I have implemented this with two vectors, both breeze DenseVector[Double]: a vector location, which contains the actual values, and a vector evidence, where 1.0 denotes that the value is present and 0.0 that it is missing. With that I can do things like this:

val ones = DenseVector.ones[Double](one.evidence.length)
val derivedLocation = one.evidence :* one.location :+ ((ones :- one.evidence) :* two.evidence :* two.location)

Another example would be:

val firstnewvector = myothervector(evidence :== 1.0)
val secondnewvector = myothervector(evidence :== 0.0)

But I also have other cases where I need 0 as the result, not NaN:

def gradientAt: DenseVector[Double] =
      (one.location - two.location) :* evidence :* othervalue

For the sake of argument, this example has been simplified. I am thinking about dropping evidence and using NaN where no concrete value is present, but I am not sure whether that is a good idea. I think the lines above would already be more difficult to implement, wouldn't they? I am also unsure about performance. If I am not mistaken, DenseVector is backed by an Array of Java primitives, which avoids slow auto-boxing. Using Double.NaN might require classes instead of primitives, which might slow the whole program down a little and cost more memory. Is that right? (Speed and memory are an issue in general.)

So: is it a good idea in my case to use Double.NaN, considering 1) nice code and 2) performance (memory and speed)?
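
To make the two concerns above concrete, here is a minimal sketch in plain Scala. It uses Array[Double] instead of breeze's DenseVector to stay dependency-free, and the names `NaNBasics` and `nanToZero` are hypothetical, not part of breeze. It illustrates that NaN is stored as an ordinary primitive double, that it propagates through multiplication (so multiplying by a 0 evidence entry does not give 0), and that it never compares equal to itself (so an equality-based mask cannot find it).

```scala
object NaNBasics {
  // Double.NaN is an ordinary primitive double, so it fits in an Array[Double] as-is.
  val location: Array[Double] = Array(1.5, Double.NaN, 3.0)

  // NaN propagates through arithmetic: NaN * 0.0 is NaN, not 0.0.
  val propagated: Double = Double.NaN * 0.0

  // NaN never compares equal to anything, including itself; use isNaN instead.
  val equalsItself: Boolean = Double.NaN == Double.NaN

  // Hypothetical helper: replace NaN by 0.0 so a missing value contributes nothing.
  def nanToZero(xs: Array[Double]): Array[Double] =
    xs.map(x => if (x.isNaN) 0.0 else x)
}
```

Under these assumptions, a gradient like the one above would need something like `nanToZero` applied before or after the subtraction, since the multiplication alone would not remove the NaN.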

Make42
  • Do I understand correctly that using NaNs would simplify your code to `val derivedLocation = one.location :+ two.location`? Then yes, you should absolutely use that. – Bergi Oct 25 '16 at 15:34
  • Are you actually using the `.evidence` for anything but masking out values from `.location`? And is the actual value in `.location` when the evidence says "that the value isn't there"? – Bergi Oct 25 '16 at 15:35
  • `NaN` is a primitive double value like all others and should not require any boxing either. – Bergi Oct 25 '16 at 15:36
  • @Bergi: I have now posted all my current uses of evidence. Other uses are basically the same as in the question. In my math formulas I **am** using evidence though, because in math we usually don't have "NaN" ;-). So that might actually make it easier to translate the formulas to code, but I am not sure myself. Another thing is that one might consider (for later) letting the evidence take not just the values 0 and 1, but some number in between. – Make42 Oct 25 '16 at 16:25
  • @Bergi: For the middle example I am not able to find a way to implement this with NaN. Breeze's DenseVector does not have `isNaN`. – Make42 Oct 25 '16 at 16:39
  • I think `value.map(d => if (d.isNaN) 0.0 else 1.0) :* othervalue` should do. Maybe also have a look [here](http://stackoverflow.com/q/16112287/1048572) on how to prevent boxing – Bergi Oct 25 '16 at 16:46
  • @Bergi: `DenseVector` does not support `map`. – Make42 Oct 25 '16 at 17:16
  • https://github.com/scalanlp/breeze/wiki/Linear-Algebra-Cheat-Sheet#map-and-reduce says it does. – Alexey Romanov Oct 25 '16 at 17:47
  • @Bergi: `val derivedLocation = one.location :+ two.location` is not right. I want derivedLocation to be as long as one.location and two.location (all the same length). For each cell I want the value of one.location unless its value is NaN, in which case I want the value of two.location. – Make42 Oct 26 '16 at 13:35
  • @Bergi: If not using evidence I think I need to go some way like http://stackoverflow.com/a/28339208/4533188, which is either slow or ugly. Slow is not acceptable, ugly... well I might as well stick with what I implemented... – Make42 Oct 26 '16 at 13:45
  • @Make42 Ah, now I see. And I suppose you also want to create a new evidence for the values that are in neither location? – Bergi Oct 26 '16 at 16:04
  • @Bergi: Currently I also need to create a `derivedEvidence` that should be necessary if I migrate to NaNs. If I migrate to NaNs, then if there is a NaN at one.location **and** two.location, this should result in NaN of course. – Make42 Oct 26 '16 at 17:48
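
The per-cell rule discussed in the comments above (take one.location unless it is NaN, otherwise two.location, and keep NaN when both are missing) could be sketched as follows. This is only an illustration on plain Array[Double], not breeze code; `NaNCoalesce`, `coalesce` and `evidenceOf` are hypothetical names. A while loop over primitive arrays is used on the assumption that speed matters, as stated in the question.

```scala
object NaNCoalesce {
  // Take a(i) unless it is NaN, then fall back to b(i).
  // If both are NaN, the result stays NaN.
  def coalesce(a: Array[Double], b: Array[Double]): Array[Double] = {
    require(a.length == b.length, "vectors must have the same length")
    val out = new Array[Double](a.length)
    var i = 0
    while (i < a.length) {
      out(i) = if (a(i).isNaN) b(i) else a(i)
      i += 1
    }
    out
  }

  // Recover a 0/1 evidence vector from NaN-coded values, in case a formula still needs it.
  def evidenceOf(xs: Array[Double]): Array[Double] =
    xs.map(x => if (x.isNaN) 0.0 else 1.0)
}
```

For example, `NaNCoalesce.coalesce(Array(1.0, Double.NaN), Array(9.0, 2.0))` picks 1.0 from the first vector and 2.0 from the second. Note that this representation only covers evidence of exactly 0 or 1; fractional evidence values, as considered in the comments, would still need a separate vector.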

0 Answers