If you call a dynamically-linked library, you may get different code on different processors. (For example, the Accelerate library on Mac OS X uses different implementations of its routines on different processors.)
However, suppose you use identical executable images (including all libraries) that do not dispatch based on processor model, and you give them identical inputs, including any changes made to floating-point modes or other global state that can affect floating-point behavior. Then the processor produces identical results for all elementary floating-point operations (add, subtract, multiply, divide, compare, convert).
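To illustrate why global floating-point state is part of the "identical inputs" requirement, here is a minimal sketch (plain C, using `<fenv.h>`) in which the same code and the same operands produce different results solely because the rounding mode differs:

```c
#include <fenv.h>
#include <stdio.h>

#pragma STDC FENV_ACCESS ON

int main(void) {
    volatile double x = 1.0, y = 3.0;   /* volatile prevents constant folding */

    fesetround(FE_TONEAREST);
    double a = x / y;                   /* rounded to nearest */

    fesetround(FE_UPWARD);
    double b = x / y;                   /* rounded toward +infinity */

    /* The two quotients differ in the last bit, even though the
       instruction and operands are identical. */
    printf("%.17g\n%.17g\n", a, b);
    return 0;
}
```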
Certain operations, such as the inverse-square-root-estimate instruction, are not fully specified and are allowed to return different results on different processors.
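As a sketch of what such an operation looks like (assuming an x86 target with SSE), the `rsqrtss` estimate is only required to be approximately correct, so the value you read back may legitimately vary by processor model:

```c
#include <stdio.h>
#include <xmmintrin.h>

int main(void) {
    __m128 x = _mm_set_ss(2.0f);
    __m128 e = _mm_rsqrt_ss(x);               /* approximate 1/sqrt(2) */
    /* Only an estimate is guaranteed; different CPU models may print
       slightly different values for the same input. */
    printf("estimate: %.9g\n", _mm_cvtss_f32(e));
    return 0;
}
```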
Concerns mentioned in ecatmur’s answer about compiler optimizations, fused multiply-add, and SSE/SSE2/FPU use do not apply to identical binaries. Those concerns apply only when different compilations (different switches, different target platforms, different compiler versions) might produce different code. Since you have excluded different compilations, they are not relevant.
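To see why fused multiply-add is a compilation issue rather than a processor issue, here is a small sketch: an FMA rounds `a*b + c` once, while separate multiply and add round twice, so a compiler that contracts the expression changes the result. The example uses `fma()` from `<math.h>` to make the single-rounding value explicit, and `volatile` to keep the separate version from being contracted:

```c
#include <math.h>
#include <stdio.h>

int main(void) {
    double a = 1.0 + 0x1p-52;          /* 1 + DBL_EPSILON */
    double b = 1.0 - 0x1p-53;
    double c = -1.0;

    volatile double product = a * b;   /* rounded here: product becomes exactly 1.0 */
    double separate = product + c;     /* 0.0 */
    double fused    = fma(a, b, c);    /* single rounding: about 1.1e-16 */

    printf("separate: %g\nfused:    %g\n", separate, fused);
    return 0;
}
```

Both results are correct IEEE 754 arithmetic; they differ only because the two compilations perform different sequences of rounded operations.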
If you build for both a 32-bit target (i386) and a 64-bit target (x86_64), you are producing two executable images (packaged in one “fat” file), and the concerns about different compilations do apply between those two images.
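A minimal sketch of why the two slices count as different compilations: each slice is separate machine code, and even the way intermediate expressions are evaluated (reported by `FLT_EVAL_METHOD` in `<float.h>`) can differ between them, for example x87 extended precision on some 32-bit toolchains versus SSE on x86_64:

```c
#include <float.h>
#include <stdio.h>

int main(void) {
#if defined(__x86_64__)
    printf("x86_64 slice, FLT_EVAL_METHOD = %d\n", FLT_EVAL_METHOD);
#elif defined(__i386__)
    printf("i386 slice,   FLT_EVAL_METHOD = %d\n", FLT_EVAL_METHOD);
#else
    printf("other target, FLT_EVAL_METHOD = %d\n", FLT_EVAL_METHOD);
#endif
    return 0;
}
```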