2

I have a function which takes an array of values and a sampling rate. The function should remove values randomly by the sampling rate. For instance a sampling rate of 20% should remove 20% of the values. How can I achieve this with a very good performance, because I will iterate over more than 10.000 values?

My idea is something like

for(var i = values.length-1; i >= 0; i--){
    var rnd = Math.floor((Math.random() * 100) + 1);
    if(rnd < samplingRate)
        values.splice(i,1);
}

but I think the Math.random() function is no performant choice.

Robin Wieruch
  • 14,900
  • 10
  • 82
  • 107
  • 2
    First I need a clarification: Do you want to remove 20% of all values as you describe in the text, or remove each value with a chance of 20%, which is what you are doing in the code? They are two very different things. In any case there is no way around `Math.random`, unless you write your own RNG, which hardly will perform better. – RoToRa Aug 30 '14 at 12:31
  • I did an edit for my question. It is like I have written in the text. – Robin Wieruch Aug 30 '14 at 12:34
  • I don't think many `splice` operations after each other have a good performance (the `Math.random` call should be negligible in comparison). You might want to try [`filter`](https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/Array/filter) instead (although that creates a new array, instead of removing them from `values`) – Bergi Aug 30 '14 at 12:47
  • Have you edited yet? I don't see anything. Are you saying your code has nothing to do with the problem? Can you post your attempt at code that solves the problem, and not something else. – RoToRa Aug 30 '14 at 12:48
  • I think my code should solve the problem of my text. But it doesn't matter, I just wanted to have a better approach for this. – Robin Wieruch Aug 30 '14 at 12:50
  • 2
    No it doesn't. You text says you want to remove 20% of the values. If you have 10.000 values, then that means you'll end up with exactly 8.000 values. With your code you'll have any number of values between 0 and 10.000. – RoToRa Aug 30 '14 at 12:54
  • Now I understand. You are right. I want to remove exactly 20% of the values randomly. – Robin Wieruch Aug 30 '14 at 14:03

2 Answers2

1

There's really no need to iterate over the whole array if you only want to operate on 20% of it.
Loop for n times where n = Math.floor(0.20 * originalArray.length) - 1 and on each iteration get a random element from the array and remove it.

Community
  • 1
  • 1
Etheryte
  • 24,589
  • 11
  • 71
  • 116
  • Nice approach! I will use and mark this as answer when there is no better way to do this. – Robin Wieruch Aug 30 '14 at 12:42
  • That would the first 20% of the values not random 20%. – RoToRa Aug 30 '14 at 12:45
  • 1
    @RoToRa: No, he means "remove a random element (out of the whole array) for n (= 0.2 * values.length) times" – Bergi Aug 30 '14 at 12:49
  • @Bergi Imagine what would happen if the percentage was 100. – Etheryte Aug 30 '14 at 12:58
  • @Nit: So what? Loop `n` times (e.g. from `0` to `n-1` as usual). Where `n` could be anything from `0` to `values.length`. – Bergi Aug 30 '14 at 13:05
  • That's exactly what the `-1` is there for, to loop to `n-1` in case of 100%. – Etheryte Aug 30 '14 at 13:09
  • 1
    If the length of the array is 5, n will equal zero where it should equal one. That's why you don't subtract one. When you finally construct your loop that is where you should use n - 1. – James Aug 30 '14 at 13:28
1

Unless you need to support older browsers use the .filter() method:

var samplingPercentage = samplingRate / 100;

var filtered = values.filter(function() {
  return Math.random() < samplingPercentage;
});
RoToRa
  • 37,635
  • 12
  • 69
  • 105