In a project where we compute metrics on graph objects across a range of edge densities, we have been using subset() to pull out the rows at a specific density. At the moment density is stored as a numeric column in a data.frame, and the subset() call specifies the criterion as a numeric literal. The result is that some subset() calls work as expected and others return no rows. I believe this comes down to the machine precision of floating-point values. We have worked around it by encoding density as a factor, but I wonder whether there is a smarter way to think about the problem, and why R behaves this way. More specifically, I would like to avoid such problems in the future: is a factor the best option, or is there something more R-thonic?
Thanks for your input!
# reproducible example: 20 rows, densities from 0.01 to 0.20
df <- data.frame(degree = rnorm(20), density = seq(0.01, 0.20, .01))
# exact equality comparison: only .05, .08, and .09 generate output;
# the other values match no rows
for (d in c(.05, .06, .07, .08, .09, .10)) {
  print(subset(df, density == d))
}
# this works as expected, because each d is the bit-identical double
# produced by the same seq() call used to build df
for (d in seq(.01, .20, .01)) {
  print(subset(df, density == d))
}
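Printing the stored values at full precision should make the mismatch visible directly. A quick check along these lines (17 significant digits are enough to distinguish any two doubles):

# the seq() results and the decimal literals differ in their trailing digits
print(seq(0.01, 0.20, .01), digits = 17)
print(c(.05, .06, .07, .08, .09, .10), digits = 17)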
# some evidence that machine precision may be to blame: if the sequence
# were exactly evenly spaced, the second differences would all be zero
diff(diff(seq(0.01, 0.20, .01)))
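For reference, the factor workaround we are using now, and the tolerance-based comparison we are wondering about as an alternative, look roughly like this (sketches only; the tolerance 1e-8 is an arbitrary illustrative choice):

# current workaround: encode density as a factor, so equality is tested
# on the character labels rather than on the underlying doubles
df2 <- transform(df, density = factor(density))
for (d in c(.05, .06, .07, .08, .09, .10)) {
  print(subset(df2, density == as.character(d)))
}

# possible alternative: keep density numeric and compare within a tolerance
tol <- 1e-8
for (d in c(.05, .06, .07, .08, .09, .10)) {
  print(subset(df, abs(density - d) < tol))
}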