Kernighan's algorithm works best on data which fits in the ALU, often 64-bits on modern hardware. For longer numbers, the algorithm becomes quadratic in the length of the number because each iteration does a computation over the entire length of the number. The arithmetic can be hand-optimised because it's clear that once the borrow stops propagating, nothing will change as a result of the bitwise and.
Even with that optimisation, we're still in Shlemiel the Painter territory; the computation is quadratic becsuse the scan effectively always starts at the same place on every iteration, scanning further and further each time.
Anyway, it would be very surprising to see even a sophisticated optimiser find that optimisation and Python's bignum implementation does not have a sophisticated optimiser. It does not, for example, fuse the decrement with the bitwise and.
Bitwise and on bignums could obviously be done in place, so it would be tempting to write:
number &= number - 1
in the hopes that an in-place operation would be performed. But it won't make any difference with CPython; CPython's bignums are immutable so in-place mutation is not possible.
In short, Python is going to create two new million-bit numbers for every iteration. It's not terribly surprising that it takes a while. Rather, it is surprising that it only takes 25 seconds.
Depending on version and platform, CPython bignum operations are performed in 15- or 30-bit units. (Computation on integers which fit into 30 bits is slightly ootimised. But 64 bits is well outside of that range, so don't expect limiting numbers to 64 bits to avoid bignum overhead.) Assuming your platform uses 30-bit units, running the algorithm for the expected half-million iterations means performing more than 66 billion single-word computations (a subtract and a bitwise on (1000000/30 = 33334)-word bignums), plus a million 130KB memory allocations. Doing that in 25 seconds is not shabby at all. It's a testimony to how fast modern CPUs are, and a warning about how easy it is to not notice that you're using a quadratic algorithm until the numbers get really large.