There's a chart of measurements, from the worked example in the Pitfalls presentation, showing the effect of standard OO, then of locality with contiguous storage, and then of further improvements from reorganising the data structures, flattening them further, and prefetching - http://seven-degrees-of-freedom.blogspot.co.uk/2009/12/pitfalls-of-object-oriented-programming.html?view=flipcard
The "Pitfalls of Object Oriented Programming" presentation, written for games programmers, which I found extremely clear, explains the problem with traditional pointer-heavy objects on modern systems - http://research.scee.net/files/presentations/gcapaustralia09/Pitfalls_of_Object_Oriented_Programming_GCAP_09.pdf
Furthermore, it's not just about cache prefetch and locality on a server/desktop system; scattered, fragmented object allocations also risk page faults (possibly to/from disk), which can make them very inefficient. Secondly, if you want to use multiple cores, the problem with updating objects is access contention and locking, because each object has no way of knowing that other accesses to it are "safe".
So to increase locality there are gains from packing objects as tightly as possible and keeping blocks of them on the same page in memory (especially for small items which share cache lines). This is why, in C, preallocating an array of nmemb structures with void *calloc(size_t nmemb, size_t size) can be much more efficient than allocating them one at a time with void *malloc(size_t size).
Now suppose that most of the time you run computations repeatedly over a large number of objects, such as summing a particular value in each object, or transforming them in some way that updates certain fields. Then you'd like to organise the data so it's packed together, so that as much as possible sits on one cache line and you aren't loading from RAM (or disk) the things you rarely need or use, such as identification strings, link pointers or other values you don't expect to need for a while. These fields can actually be independent, so a summation could occur simultaneously with a transform of different fields. Processing a large list of items can also be farmed out across CPUs simply by splitting it in half or quarters, without contention for a lock required to safely read or update every object: you know the data you're interested in isn't contended for.
In that case, you'd rather have parallel arrays, which you can scan sequentially with a predictable access pattern that the CPU can detect and anticipate, so it initiates loads automatically. There are no pointer chains, and the order of items is implicit, so you maximise data density by avoiding wasteful 4/8-byte pointers. And you can reduce lock contention to the fields which actually require simultaneous updates from more than one thread. To see why it matters, Herb Sutter's machine architecture talk is interesting (but a long video) - http://www.youtube.com/watch?feature=player_detailpage&v=L7zSU9HI-6I#t=897
When you start thinking like that, you are considering Data Oriented Design - What is data oriented design?
It can help performance massively if it suits your problem, and it is used by games programmers -
http://gamedevelopment.tutsplus.com/articles/what-is-data-oriented-game-engine-design--cms-21052
There's another previous answer here with a more detailed explanation and links to some very detailed papers on modern memory issues - What is "cache-friendly" code?