I think this is a valid question, although not phrased too well. A problem is that even the question furins referred to was not phrased well, and it was closed before receiving a good answer.
The question itself is interesting, though, especially since the closed one shows it was even included in a book, so it could lead people in one direction or another.
I think neither algorithm is clearly better. With the naive average it looks like we lose precision, or, when averaging numbers several orders of magnitude apart, even lose whole numbers, but the same can probably be shown for the other algorithm as well, just with different sets of input data.
So, especially since it comes from an existing book, I think this is a perfectly valid question deserving a decent answer.
I will try to illustrate what I think about the two algorithms with an example. Imagine you have 4 numbers of roughly the same magnitude and you want to average them.
The naive method sums them first, one after another. After summing the first two you have likely lost one bit of precision at the low end (since the sum now probably has a larger exponent). By the time you add the last number, 2 bits are lost (those bits are now used to represent the high part of the sum). Then you divide by four, which in this case essentially just subtracts 2 from the exponent.
What did we lose in this process? It's easier to answer what would happen if all the numbers were truncated by 2 bits first. In that case the last two bits of the fraction of the resulting average would obviously be zero, and up to 2 additional bits' worth of error may be introduced (depending on whether the truncated bits happened to be ones or zeros in the original numbers). So essentially, if the sources were single-precision floats with 23 bits of fraction, the resulting average would have about 19 bits' worth of precision.
The actual result from the naive method is better than that, though, since the numbers summed first didn't lose that much precision.
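To make the naive method concrete, here is a minimal sketch in Python (which uses double precision rather than the single precision discussed above; the input values are ones I made up to force a visible loss, they are not from the original question):

```python
import math

def naive_average(xs):
    # Naive method: sum everything first, then divide once at the end.
    total = 0.0
    for x in xs:
        total += x
    return total / len(xs)

# Mixed magnitudes make the loss extreme: the two 1.0 terms vanish
# entirely when added to 1e17, since they fall below its last bit.
xs = [1e17, 1.0, 1.0, -1e17]
print(naive_average(xs))         # 0.0 - the small terms are gone
print(math.fsum(xs) / len(xs))   # 0.5 - from a correctly rounded sum
```

With same-magnitude inputs the effect is far milder, as the text above argues: only a couple of low bits go missing, not whole terms.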
In the differential method, each iteration adds the appropriately weighted difference to the running average. If the numbers are of the same magnitude, this difference will most likely be about one order of magnitude below them. It is then divided by the current count (nothing is lost in that operation), but the resulting difference for the last number (with i=4 in this example) may be about 3 orders of magnitude below the source numbers. We add this to the running average, which is around the same magnitude as the original numbers.
So with the differential method, in this example, adding the last number seems to lose about 3 bits of precision, and over all 4 numbers it may even look like we are down about 5 essentially lost bits of precision - maybe even worse than the naive method?
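For comparison, a sketch of the differential method, assuming the usual incremental-mean update avg += (x - avg) / i (my reading of the algorithm under discussion; the example input is my own):

```python
def running_average(xs):
    # Differential / incremental mean: avg_i = avg_{i-1} + (x_i - avg_{i-1}) / i
    avg = 0.0
    for i, x in enumerate(xs, start=1):
        avg += (x - avg) / i   # the weighted difference discussed above
    return avg

# With same-magnitude inputs, each late difference is small relative
# to the running average it is added to - that addition is where the
# low bits discussed above can drop off.
print(running_average([1.0, 2.0, 3.0, 4.0]))   # 2.5
```

Unlike the naive version, this never forms a large intermediate sum, so its error pattern depends on the data in a different way.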
The differential method is harder to follow, and maybe I made a mistake somewhere in my assumptions. But I think one thing is clear: it just doesn't seem valid to regard either algorithm as performing better in general, or if one does, it probably depends on the layout and magnitude differences of the data.