Using any language that provides exact arithmetic on (unbounded) large integers and fractions, we can evaluate the rounding error easily with very naive code, at least for small n (for large n, exact arithmetic may become prohibitively slow).
I'll arbitrarily choose Squeak Smalltalk here to illustrate this.
First, let's define a closure that returns the exact partial sum:
hn := [:n | (1 to: n) detectSum: [:i | i reciprocal]].
Then another one that returns the Float (double-precision) sum by iteration, starting with the smaller terms:
fn := [:n | (n to: 1 by: -1) detectSum: [:i | i asFloat reciprocal]].
And now another closure that estimates the error in terms of ulps:
un := [:n | | f | ((f := fn value: n) asFraction - (hn value: n)) abs / f ulp].
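For readers who don't speak Smalltalk, here is a rough Python translation of these three closures (illustrative only, not the author's code; it relies on `fractions.Fraction` for exact rationals and on `math.ulp`, available since Python 3.9):

```python
from fractions import Fraction
import math

def hn(n):
    # Exact partial harmonic sum, as an unbounded rational number.
    return sum(Fraction(1, i) for i in range(1, n + 1))

def fn(n):
    # Double-precision partial sum, accumulated from the smallest terms up.
    return sum(1.0 / i for i in range(n, 0, -1))

def un(n):
    # Absolute error of the float sum, measured in ulps of that float.
    f = fn(n)
    return abs(Fraction(f) - hn(n)) / Fraction(math.ulp(f))
```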
Now, let's search for the worst case among the first 1000 partial sums:
i := (1 to: 1000) detectMax: [:n | un value: n ].
{i. un value: i}.
The result is #(807 6.101840327204507), about 6 ulps of error for H807. That's not much, as you can see.
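The same worst-case search can be sketched self-contained in Python (again an illustrative translation; `detectMax:` becomes `max` with a key function, and the exact partial sums are precomputed incrementally to keep it fast):

```python
from fractions import Fraction
import math

N = 1000
# Precompute the exact partial sums H_0 .. H_N incrementally.
exact = [Fraction(0)]
for i in range(1, N + 1):
    exact.append(exact[-1] + Fraction(1, i))

def un(n):
    # Error, in ulps, of the float sum accumulated smallest-terms-first.
    f = sum(1.0 / i for i in range(n, 0, -1))
    return abs(Fraction(f) - exact[n]) / Fraction(math.ulp(f))

# Search the worst case among the first 1000 partial sums.
worst = max(range(1, N + 1), key=un)
print(worst, float(un(worst)))
```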
If we try to sum the bigger harmonic terms first:
fn := [:n | (1 to: n) detectSum: [:i | i asFloat reciprocal]].
Then the result is #(471 5.714808765246509), about 5.7 ulps: surprisingly, a bit less.
We can look a bit further with the latter fn:
un value: 10000.
-> 1.8242087614993667
Using memoization, I find that the max error is about 54 ulps for the first 100,000 terms (H ≈ 12) and about 26 ulps if summing more carefully, starting with the smaller terms.
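Such a memoized scan can be sketched like this in Python (an illustrative translation, not the author's code; `max_ulp_error` is a hypothetical name, and pushing the limit to 100,000 is exact but slow because the rational denominators get very large):

```python
from fractions import Fraction
import math

def max_ulp_error(limit):
    # Scan n = 1..limit, reusing the previous exact and float partial sums;
    # this memoization only works when summing in increasing order of i.
    exact, approx = Fraction(0), 0.0
    worst_n, worst_err = 0, 0.0
    for i in range(1, limit + 1):
        exact += Fraction(1, i)
        approx += 1.0 / i
        err = float(abs(Fraction(approx) - exact) / Fraction(math.ulp(approx)))
        if err > worst_err:
            worst_n, worst_err = i, err
    return worst_n, worst_err
```

With these assumptions, `max_ulp_error(1000)` should match the increasing-order result above, and something like `max_ulp_error(100000)` is the kind of run behind the ~54 ulp figure.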
So up to a few thousand terms, there is no point in starting the summation with the smaller terms (summing in increasing order of i also enables memoization, if you're going to scan a whole range of n), and Kahan summation is not necessary either, unless you really care about the last few ulps.
Kahan summation becomes interesting for millions to billions of terms, but for such big sums, asymptotic formulas are going to be much faster and also more accurate: given a good approximation of the Euler–Mascheroni constant, the natural logarithm will never be more than a few ulps off. But you should show your code; otherwise this is only speculation.
For example, this naive second-order asymptotic formula
fn := [:n | n ln + (0.5772156649015328606 + ((n*2.0) reciprocal - (n squared*12.0) reciprocal))].
will give about 5.5 ulps of error for small values of n, but less than 1.5 ulps above 100,000 with a decent macOS libm.
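Checking that formula against the exact sum can be done the same way in Python (illustrative; `formula_ulp_error` is a hypothetical helper, the γ constant is the same 19-digit literal as in the Smalltalk code, and the measured ulp count will depend on the libm behind `math.log`):

```python
from fractions import Fraction
import math

GAMMA = 0.5772156649015328606  # Euler-Mascheroni constant, as in the Smalltalk code

def h_approx(n):
    # Naive second-order expansion: ln(n) + gamma + 1/(2n) - 1/(12 n^2).
    return math.log(n) + GAMMA + 1.0 / (2.0 * n) - 1.0 / (12.0 * n * n)

def formula_ulp_error(n):
    # Compare against the exact rational partial sum, in ulps of the formula's result.
    f = h_approx(n)
    exact = sum(Fraction(1, i) for i in range(1, n + 1))
    return float(abs(Fraction(f) - exact) / Fraction(math.ulp(f)))
```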