Too long to read? Skip to the short answer below.
This was an interesting question for me to dig into. According to the R documentation for round:
Note that for rounding off a 5, the IEC 60559 standard (see also ‘IEEE
754’) is expected to be used, ‘go to the even digit’. Therefore
round(0.5) is 0 and round(-1.5) is -2. However, this is dependent on
OS services and on representation error (since e.g. 0.15 is not
represented exactly, the rounding rule applies to the represented
number and not to the printed number, and so round(0.15, 1) could be
either 0.1 or 0.2).
Rounding to a negative number of digits means rounding to a power of
ten, so for example round(x, digits = -2) rounds to the nearest
hundred.
For signif the recognized values of digits are 1...22, and non-missing
values are rounded to the nearest integer in that range. Complex
numbers are rounded to retain the specified number of digits in the
larger of the components. Each element of the vector is rounded
individually, unlike printing.
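The points in the documentation can be checked directly at the console. This is a minimal sketch using only base R; the values 1234.567 and 123456 are arbitrary examples chosen here, not taken from the question.

```r
# 0.15 has no exact binary representation. The stored double is slightly
# below 0.15, and round() operates on that stored value.
stored <- sprintf("%.20f", 0.15)
print(stored)

# Negative digits round to a power of ten.
hundreds <- round(1234.567, digits = -2)
print(hundreds)   # 1200

# signif() keeps significant digits rather than decimal places.
two_sig <- signif(123456, digits = 2)
print(two_sig)    # 120000
```

Printing the stored value with many digits is a quick way to see whether a number that looks like a tie really is one.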
Firstly, you asked "If it is 'round to even', why is it 3, i.e. an odd number?" To be clear, the round-to-even rule applies only when the part being rounded off is exactly a 5, that is, a tie between two neighbours. If you run round(2.5) or round(3.5), then R returns 2 and 4, respectively.
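A short sketch of the tie behaviour, assuming base R only. The halves below are exactly representable as doubles, so representation error plays no part; 2.6 is an arbitrary non-tie added for contrast.

```r
ties <- c(0.5, 1.5, 2.5, 3.5, 4.5, -1.5)

# R's round(): ties go to the even neighbour.
half_even <- round(ties)
print(half_even)   # 0 2 2 4 4 -2

# "Grade school" rule for comparison: ties go away from zero.
half_away <- sign(ties) * floor(abs(ties) + 0.5)
print(half_away)   # 1 2 3 4 5 -2

# A non-tie is simply rounded to the nearest integer, odd or even.
nearest <- round(2.6)
print(nearest)     # 3
```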
This R-help post, https://stat.ethz.ch/pipermail/r-help/2008-June/164927.html, gives the following explanation:
The logic behind the round to even rule is that we are trying to
represent an underlying continuous value and if x comes from a truly
continuous distribution, then the probability that x==2.5 is 0 and the
2.5 was probably already rounded once from any values between 2.45 and
2.54999999999999..., if we use the round up on 0.5 rule that we learned
in grade school, then the double rounding means that values
between 2.45 and 2.50 will all round to 3 (having been rounded first
to 2.5). This will tend to bias estimates upwards. To remove the
bias we need to either go back to before the rounding to 2.5 (which is
often impossible to impractical), or just round up half the time and
round down half the time (or better would be to round proportional to
how likely we are to see values below or above 2.5 rounded to 2.5, but
that will be close to 50/50 for most underlying distributions). The
stochastic approach would be to have the round function randomly
choose which way to round, but deterministic types are not
comfortable with that, so "round to even" was chosen (round to odd
should work about the same) as a consistent rule that rounds up and
down about 50/50.
If you are dealing with data where 2.5 is likely to represent an exact
value (money for example), then you may do better by multiplying all
values by 10 or 100 and working in integers, then converting back only
for the final printing. Note that 2.50000001 rounds to 3, so if you
keep more digits of accuracy until the final printing, then rounding
will go in the expected direction, or you can add 0.000000001 (or
other small number) to your values just before rounding, but that can
bias your estimates upwards.
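The double-rounding argument in the quoted post can be checked with a small simulation. This is only a sketch under stated assumptions: the uniform distribution on [0, 10), the sample size, and the seed are arbitrary choices made here, and the first rounding is stored as whole tenths so that ties such as 2.5 are exact.

```r
set.seed(1)                          # arbitrary seed, for reproducibility only
x <- runif(1e5, min = 0, max = 10)   # stand-in for "true" continuous values

# First rounding: keep one decimal place, computed via whole tenths.
tenths <- round(x * 10)
once <- tenths / 10

# Second rounding to an integer, under each tie rule.
half_up <- floor(once + 0.5)         # ties such as 2.5 always go up
half_even <- round(once)             # ties go to the even integer

# Average error relative to the original, unrounded values.
bias_up <- mean(half_up - x)
bias_even <- mean(half_even - x)
print(c(half_up = bias_up, half_even = bias_even))
```

Under these assumptions the half-up bias should come out near +0.05 (one value in ten is a tie, and each tie is pushed up by half a unit), while the half-even bias should sit close to zero.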
Short answer: if you always round 5s upward, your rounded data are biased upward. If you round ties to the even digit instead, the roundings up and down roughly cancel, so the rounded data stay balanced on average.
Let's test this using your data:
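# "Popular" rounding: ties always go away from zero, to n decimal places.
round2 = function(x, n) {
  posneg = sign(x)        # remember the sign
  z = abs(x) * 10^n       # shift the digit to be rounded next to the decimal point
  z = z + 0.5             # a fractional part of 0.5 or more now carries over
  z = trunc(z)            # drop the fractional part
  z = z / 10^n            # shift back
  z * posneg              # restore the sign
}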
nums <- seq(1.05, 2.95, by = .1)
x <- data.frame(
  Number = nums,
  Popular.Round = round2(nums, 1),
  R.Round = round(nums, 1))
> mean(x$Popular.Round)
[1] 2.05
> mean(x$R.Round)
[1] 2.02
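The true mean of these 20 numbers is 2.00, so one plausible reading of the 2.02 is representation error: values such as 1.05 are not stored exactly, so not every element is a true tie, and the exact figure may vary with the platform and R version. The sketch below is an assumption-laden variant, not the code used above: it keeps the same 20 values as whole hundredths (the names hundredths, tenths_half_up and tenths_half_even are made up here) so that every tie is exact.

```r
# Hundredths as whole numbers: 105, 115, ..., 295 stand for 1.05, ..., 2.95.
hundredths <- seq(105, 295, by = 10)

# Dividing by 10 gives 10.5, 11.5, ..., 29.5, all exactly representable,
# so every value is a genuine tie when rounding to whole tenths.
tenths_half_up <- floor(hundredths / 10 + 0.5)
tenths_half_even <- round(hundredths / 10)

mean_true <- mean(hundredths) / 100
mean_half_up <- mean(tenths_half_up) / 10
mean_half_even <- mean(tenths_half_even) / 10
print(c(true = mean_true, half_up = mean_half_up, half_even = mean_half_even))
```

With exact ties, rounding half up gives a mean of 2.05 and rounding half to even gives 2.00, which matches the true mean.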
Using a bigger sample:
nums <- seq(1.05, 6000, by = .1)
x <- data.frame(
  Number = nums,
  Popular.Round = round2(nums, 1),
  R.Round = round(nums, 1))
> mean(x$Popular.Round)
[1] 3000.55
> mean(x$R.Round)
[1] 3000.537