I am aware of this Q: inplace_merge: What causes a complexity of N*log(N) vs. N-1?
but I find the answer unsatisfactory, since the part I'm really interested in is not clearly explained. More specifically, it is not clear (to me :)) why inplace_merge can't merge in place, without any additional memory, in linear time by just starting from begin and, whenever the current item is greater than the current item in the second range [middle, end), doing a constant-time swap.
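
To make the idea concrete, here is a minimal sketch of one possible reading of what I describe. The names `naive_swap_merge` and `run_naive` are made up for illustration, and this is certainly not how any standard library implements `std::inplace_merge`; it is only the "walk from begin and swap" idea written down.

```cpp
#include <algorithm>  // std::iter_swap
#include <cstddef>    // std::ptrdiff_t
#include <vector>

// Hypothetical helper: the naive "one constant-time swap per step" idea.
// Assumes [first, middle) and [middle, last) are each sorted.
template <class It>
void naive_swap_merge(It first, It middle, It last) {
    It j = middle;  // current item of the second range
    for (It i = first; i != middle && j != last; ++i) {
        if (*j < *i) {
            // Current item is greater than the one in the second range:
            // swap them and move on to the next item of the second range.
            std::iter_swap(i, j);
            ++j;
        }
    }
}

// Hypothetical convenience wrapper: runs the sketch on a copy and returns it.
std::vector<int> run_naive(std::vector<int> v, std::ptrdiff_t mid) {
    naive_swap_merge(v.begin(), v.begin() + mid, v.end());
    return v;
}
```

Tracing it by hand, `run_naive({3, 4, 1, 2}, 2)` comes out as `{1, 2, 3, 4}`, but `run_naive({1, 3, 5, 2, 4, 6}, 3)` comes out as `{1, 2, 4, 3, 5, 6}`, which is not sorted. So at least this reading of the idea is missing something, and I would like to understand what exactly, and whether it can be repaired without extra memory or extra time.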