How do forward declarations speed up compilations since at some point the object referenced will need to be compiled?
1) reduced disk i/o (fewer files to open, fewer times)
2) reduced memory/cpu usage
most translations need only a name. if you use/allocate the object, you'll need its declaration.
this is probably where it will click for you: each file you compile compiles what is visible in its translation.
a poorly maintained system will end up including a ton of stuff it does not need - then this gets compiled for every file it sees. by using forwards where possible, you can bypass that, and significantly reduce the number of times a public interface (and all of its included dependencies) must be compiled.
that is to say: the content of the header won't be compiled once. it will be compiled over and over. everything in this translation must be parsed, checked that it's a valid program, checked for warnings, optimized, etc. many, many times.
including lazily only adds significant disk/cpu/memory increase, which turns into intolerable build times for you, while introducing significant dependencies (in non-trivial projects).
I can buy the argument for reduced complexity, but what would a practical example of this be?
unnecessary includes introduce dependencies as side effects. when you edit an include (necessary or not), then every file which includes it must be recompiled (not trivial when hundreds of thousands of files must be unnecessarily opened and compiled).
Lakos wrote a good book which covers this in detail:
http://www.amazon.com/Large-Scale-Software-Design-John-Lakos/dp/0201633620/ref=sr_1_1?ie=UTF8&s=books&qid=1304529571&sr=8-1