I'm trying to count the values in a data frame column that are at or above a given threshold. The column holds decimal values ranging from 0 to 1. To do so, I use sapply to iterate over a vector of thresholds. When I write the thresholds out as a literal vector, sapply works fine, but when I generate them with seq(), I get weird results: some counts are repeated and the two result vectors do not match. This only happens with decimal thresholds, not with whole numbers.
t <- data.frame(replicate(10, sample((0:10) / 10, 1000, replace = TRUE)))
# Thresholds written out as a literal vector
l <- sapply(c(0, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9), function(x) {
  nrow(t[t[, "X1"] >= x, ])
})
# The same thresholds generated with seq()
l2 <- sapply(seq(0, 0.9, 0.1), function(x) {
  nrow(t[t[, "X1"] >= x, ])
})
print(l)
print(l2)
Output:
> print(l)
[1] 1000 909 811 723 626 530 443 365 275 187
> print(l2)
[1] 1000 909 811 626 626 530 365 275 275 187
When the same code is executed with integers and integer thresholds, l and l2 match perfectly.
Code for whole numbers:
t <- data.frame(replicate(10, sample(0:10, 1000, replace = TRUE)))
# Integer thresholds written out as a literal vector
l <- sapply(c(0, 1, 2, 3, 4, 5, 6, 7, 8, 9), function(x) {
  nrow(t[t[, "X1"] >= x, ])
})
# The same integer thresholds generated with seq()
l2 <- sapply(seq(0, 9, 1), function(x) {
  nrow(t[t[, "X1"] >= x, ])
})
print(l)
print(l2)
Output:
> print(l)
[1] 1000 915 816 729 643 555 468 367 270 188
> print(l2)
[1] 1000 915 816 729 643 555 468 367 270 188
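One way to narrow this down is to compare the two decimal threshold vectors directly. The following is a minimal sketch, assuming the mismatch comes from how seq() computes its values in floating point. The names literal, generated, differing and scaled are introduced here for illustration only and are not part of the code above.

```r
# Thresholds typed by hand versus thresholds generated by seq()
literal   <- c(0, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9)
generated <- seq(0, 0.9, 0.1)

# Positions where the two vectors are not exactly equal
differing <- which(generated != literal)
print(differing)

# Print the stored values at those positions with 17 significant digits,
# since the default print() rounds to 7 and hides small differences
print(format(generated[differing], digits = 17))
print(format(literal[differing], digits = 17))

# Assumption: building the thresholds the same way the data were built,
# by dividing integers by 10, avoids the mismatch
scaled <- (0:9) / 10
print(all(scaled == literal))
```

If the first check reports any positions, the thresholds from seq() are not exactly the values typed by hand, so >= can behave differently for data points that sit right on a threshold. Building the thresholds as (0:9) / 10, or comparing with a small tolerance, would be two possible workarounds under that assumption.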
I'm not sure if I'm missing something very basic or making a mistake.
Thank you.