Engineers and scientists frequently write iterative programs in which a floating point value steps through a range of values in small increments.
For example, suppose the "time" variable needs to change from a low of tMin to a high of tMax in steps of deltaT, where all of these variables are doubles.
The obvious BUT INCORRECT approach is as follows:
`for( time = tMin; time <= tMax; time += deltaT ) {
    // Use the time variable in the loop
}
`
So why is this so wrong?
If deltaT is small or the range is large ( or both ), the loop may execute for thousands of iterations.
That means that by the end of the loop, time has been accumulated through thousands of addition operations.
Numbers that seem "exact" to us in decimal form, such as 0.01, are not exact when the computer stores them in binary, which means that the value used for deltaT is really an approximation to the exact value.
Therefore each addition step introduces a very small amount of roundoff error, and by the time you add up thousands of these errors, the total error can be significant.
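To see this effect concretely, here is a small throwaway demonstration ( the values are arbitrary, chosen just for illustration ) that adds 0.01 ten thousand times; mathematically the result should be exactly 100:
`#include <stdio.h>

int main( void ) {
    double time = 0.0;
    double deltaT = 0.01;   // not exactly representable in binary

    for( int i = 0; i < 10000; i++ )
        time += deltaT;     // each addition rounds, and the errors accumulate

    // Typically prints something slightly off from 100, not exactly 100.000...
    printf( "%.15f\n", time );
    return 0;
}
`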
The correct approach is as follows, if you know the minimum and maximum values and the desired change on each iteration:
`int nTimes = (int)( ( tMax - tMin ) / deltaT + 0.5 ) + 1;   // round the division, then + 1 to include both endpoints
for( int i = 0; i < nTimes; i++ ) {
    time = tMin + i * deltaT;
    // NOW use a more accurate time variable
}

// Or alternatively, if you know the minimum, maximum, and number of desired iterations:
double deltaT = ( tMax - tMin ) / ( nTimes - 1 );
for( int i = 0; i < nTimes; i++ ) {
    time = tMin + i * deltaT;
    // NOW use a more accurate time variable
}
`
In general, there are four values that can be used to specify stepping through a range - the low end of the range, the high end of the range, the number of steps to take, and the increment to take on each step - and if you know any three of them, then you can calculate the fourth.
The correct loop should use an integer counter to complete the loop a given number of times, and use the low end of the range and the increment as shown to calculate the floating point loop variable at the beginning of each iteration of the loop.
So why is that better?
The number of times that the loop executes is now controlled by an integer, which does not have any roundoff error upon incrementation, so there is no chance of performing one too many or one too few iterations due to accumulated roundoff.
The time variable is now calculated from a single multiplication and a single addition, which can still introduce some roundoff error, but far less than thousands of additions.
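A quick way to convince yourself ( again, just an illustrative throwaway program with arbitrarily chosen values ) is to compute the final time both ways and compare how far each lands from tMax:
`#include <stdio.h>
#include <math.h>

int main( void ) {
    double tMin = 0.0, tMax = 1.0, deltaT = 0.001;
    int nTimes = (int)( ( tMax - tMin ) / deltaT + 0.5 ) + 1;

    // Repeated addition: one new roundoff error on every pass
    double summed = tMin;
    for( int i = 1; i < nTimes; i++ )
        summed += deltaT;

    // One multiplication and one addition: at most a couple of roundoff errors
    double computed = tMin + ( nTimes - 1 ) * deltaT;

    // The summed error is typically much larger than the computed error
    printf( "summed   error: %g\n", fabs( summed   - tMax ) );
    printf( "computed error: %g\n", fabs( computed - tMax ) );
    return 0;
}
`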
Where does that +1 come from?
The +1 is needed in order to include both endpoints of the range.
Suppose tMax were 20, tMin were 10, and deltaT were 2.
The desired times would be 10, 12, 14, 16, 18, 20, which is a total of 6 time values, not 5. ( Five intervals if you want to look at it that way. )
( 20 - 10 ) / 2 yields 5, so you have to add the extra 1 to get the correct count of 6. ( The extra + 0.5 before the integer cast in the code above simply rounds the result of the division to the nearest integer, so that roundoff in the division itself cannot truncate a value like 4.999... down to 4. )
Another way of looking at this is that if nTimes is the number of data points in the range, then nTimes - 1 is the number of gaps between the data points.
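Plugging those numbers into the counting formula confirms the arithmetic ( all of these values happen to be exactly representable in binary, so the result is exact ):
`#include <stdio.h>

int main( void ) {
    double tMin = 10.0, tMax = 20.0, deltaT = 2.0;
    int nTimes = (int)( ( tMax - tMin ) / deltaT + 0.5 ) + 1;   // ( 20 - 10 ) / 2 = 5, plus 1
    printf( "%d\n", nTimes );   // prints 6
    return 0;
}
`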
Example: interpolate.c is a quick-and-dirty example of interpolating floating point numbers in a loop that was whipped up in 10 minutes in class. It is NOT an example of good code, but it is an example of how a quick little program can be used to test out, play with, or in this case demonstrate a new or unfamiliar concept.
This example interpolates the function f( x ) = x^3 over the range from -1.0 to 4.0 in steps of 0.5, using three approaches:
Constant - Take the average of the inputs at the endpoints, evaluate f( average input ), and assume the function is constant over the range.
Linear - Evaluate the function at the endpoints, and then use a linear interpolation of the endpoint function values in between.
Non-Linear - Linearly interpolate the function inputs over the range, and at each evaluation point, evaluate the function of the interpolated inputs.
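The original interpolate.c file isn't reproduced here, but a minimal sketch in the same spirit might look like the following ( the structure, variable names, and output format are assumptions, not the original file ):
`#include <stdio.h>

// The function being interpolated
double f( double x ) {
    return x * x * x;
}

int main( void ) {
    double xMin = -1.0, xMax = 4.0, deltaX = 0.5;

    // Integer counter loop, exactly as discussed above
    int nPoints = (int)( ( xMax - xMin ) / deltaX + 0.5 ) + 1;

    // Values needed by the constant and linear approaches
    double fMin = f( xMin ), fMax = f( xMax );
    double fConstant = f( ( xMin + xMax ) / 2.0 );   // f of the average input

    printf( "%10s %12s %12s %12s\n", "x", "constant", "linear", "non-linear" );

    for( int i = 0; i < nPoints; i++ ) {
        double x = xMin + i * deltaX;                      // interpolated input
        double fraction = (double) i / ( nPoints - 1 );    // 0.0 at xMin, 1.0 at xMax

        double linear = fMin + fraction * ( fMax - fMin ); // interpolate the outputs
        double nonLinear = f( x );                         // evaluate f at the interpolated input

        printf( "%10.2f %12.4f %12.4f %12.4f\n", x, fConstant, linear, nonLinear );
    }
    return 0;
}
`
Note that even here the loop variable x is computed fresh as xMin + i * deltaX on each iteration, rather than accumulated with repeated additions.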