The most general answer is: you need to profile both blocks on your target hardware and measure the result empirically.
However, I can give you an answer for the majority of modern x86, x64, PPC, and ARM processors with hierarchical caches. On these platforms the top one will be faster because of better data locality: it accesses memory addresses sequentially, so you will hit the data cache more often. Smart x86 and x64 implementations will even notice that you are reading memory sequentially in this fashion and prefetch the next cache line before you need it. The bottom pattern accesses memory non-sequentially, jumping between distant addresses, so you are likely to miss the cache on nearly every read.
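For concreteness, here is a minimal sketch of the two patterns, with a crude `clock()` timer so you can do the profiling mentioned above. The array name, its 1000×1000 size, and the summing are placeholders of mine, not your code; only the loop nesting order matters:

```c
#include <stdio.h>
#include <time.h>

#define N 1000   /* placeholder size, not taken from the question */

static double a[N][N];

/* "Top" pattern: the inner loop walks consecutive addresses, so each
   64-byte cache line it pulls in serves eight consecutive doubles. */
static double sum_row_major(void)
{
    double sum = 0.0;
    for (int i = 0; i < N; i++)
        for (int j = 0; j < N; j++)
            sum += a[i][j];
    return sum;
}

/* "Bottom" pattern: the inner loop jumps N * sizeof(double) bytes per
   iteration, so nearly every read lands on a different cache line. */
static double sum_col_major(void)
{
    double sum = 0.0;
    for (int j = 0; j < N; j++)
        for (int i = 0; i < N; i++)
            sum += a[i][j];
    return sum;
}

int main(void)
{
    clock_t t0 = clock();
    double s1 = sum_row_major();
    clock_t t1 = clock();
    double s2 = sum_col_major();
    clock_t t2 = clock();

    printf("row-major: %.3f s (sum=%g)\n", (double)(t1 - t0) / CLOCKS_PER_SEC, s1);
    printf("col-major: %.3f s (sum=%g)\n", (double)(t2 - t1) / CLOCKS_PER_SEC, s2);
    return 0;
}
```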
Ulrich Drepper's paper "What Every Programmer Should Know About Memory" covers this in depth. One of his examples in that paper demonstrates exactly how those two blocks of code differ.
As an example of the math here, assume arguendo that you are programming an Intel Core i7 with a 64-byte cache line and a 32 KB L1 data cache. That means that every time you fetch an address, the processor will also fetch all the other data in that 64-byte-aligned block. On that platform a double is eight bytes, so you fit eight of them per cache line. Thus the top example will, on average, miss on only one out of every eight iterations: each miss also pulls in the other 56 bytes of the line, so the next seven double reads will hit the cache.
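Spelled out as a couple of lines of arithmetic (the 64-byte line size is the same assumption as above):

```c
#include <stdio.h>

int main(void)
{
    const int line_bytes     = 64;                       /* assumed L1 line size        */
    const int elem_bytes     = sizeof(double);           /* 8 bytes per double          */
    const int elems_per_line = line_bytes / elem_bytes;  /* 8 doubles per cache line    */

    /* Sequential reads miss once per line, then hit for the rest of it. */
    printf("doubles per line  : %d\n", elems_per_line);
    printf("expected miss rate: 1 in %d reads (~%.1f%%)\n",
           elems_per_line, 100.0 / elems_per_line);
    return 0;
}
```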
The bottom example could probably fit 100 lines of data (one for each i) into cache simultaneously: 100 * 64 = 6400 bytes, well within the 32 KB cache size. But it is also likely that you will exceed the cache's associativity, meaning that two of those lines map to the same set in the L1, so one will evict the other.
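To see when that eviction can actually happen, you can compute which L1 set each row's line falls into. This sketch assumes the 32 KB, 8-way, 64-byte-line L1 data cache of that Core i7 (so 64 sets) and a deliberately nasty, hypothetical row size of 1024 doubles; with that power-of-two stride, every a[i][0] lands in the same set, so only 8 of those lines can live in L1 at once:

```c
#include <stdio.h>
#include <stddef.h>

#define LINE_BYTES 64
#define L1_BYTES   (32 * 1024)
#define L1_WAYS    8
#define L1_SETS    (L1_BYTES / (L1_WAYS * LINE_BYTES))   /* 64 sets */

int main(void)
{
    /* Hypothetical row length: 1024 doubles = 8192 bytes per row.  */
    /* Chosen as a worst case for illustration, not your layout.    */
    const size_t row_bytes = 1024 * sizeof(double);

    for (size_t i = 0; i < 12; i++) {
        size_t offset = i * row_bytes;   /* offset of a[i][0] from a[0][0] */
        printf("row %2zu -> L1 set %zu\n", i, (offset / LINE_BYTES) % L1_SETS);
    }
    return 0;
}
```

With a non-power-of-two row size the starting sets spread out and many more rows can coexist, which is why this is a "likely" rather than a certainty.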