Have you heard of "caller-save" and "callee-save" registers?
Since your CPU only has a small, finite number of registers, it's usually impossible for caller/called functions to always use different registers. If a caller function and called function both want to use the same register, it means the value in the caller will have to be saved/restored before/after the call.
Saving/restoring register values can be done either by the caller or by the callee -- which one does so is a matter of convention. The benefit of "caller-save" registers is that if the caller knows it won't need the value in register XYZ after the call, it can omit the save/restore operations. The benefit of "callee-save" registers is that if the callee knows it won't modify the value in register XYZ, it can omit the save/restore operations.
I'm guessing that your compiler treats RDI as a callee-save register, but doesn't omit the unnecessary save/restore operations unless you have compiler optimizations turned on. (If someone knows this is incorrect, please post another answer!)
UPDATE: I found an article on x86 calling conventions: http://en.wikipedia.org/wiki/X86_calling_conventions
It seems to confirm that with most calling conventions, RDI would be callee-save. This doesn't explain why it isn't pushing and popping all the other callee-save registers. Maybe there is something else going on here.