As another poster mentioned, creating an object with make_shared places the "control block" adjacent to the object it refers to, because both come out of a single allocation. In your case, however, I believe this to be a poor choice.
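For concreteness, here is a minimal sketch of the two allocation patterns being compared; `Widget` is just a placeholder type:

```cpp
#include <memory>

struct Widget { int x; };

int main() {
    // One allocation: the control block and the Widget are adjacent in memory.
    auto a = std::make_shared<Widget>();

    // Two allocations: the Widget lives wherever `new` put it, and the
    // control block is allocated separately, possibly far away.
    std::shared_ptr<Widget> b(new Widget());
}
```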
When you allocate memory, even in one big block, you have no guarantee of getting contiguous "physical space" as opposed to sparse, fragmented page allocations; and since each make_shared call is its own heap allocation, the elements of your list end up scattered across the address space. For this reason, iterating through your list causes reads across large spans of memory just to fetch the control structures (which then point to the data).
"But my cache lines are 64 bytes long!" you say. If this is true, you might think, "this will mean that the object is loaded into cache along with the control structure," but that is not necessarily true. That depends on many things such as data alignment, cache line size, associativity of the cache, and the actual memory bandwidth you use.
The problem you run into is that the control structure has to be fetched first just to find out where the data is. If you allocate all of the control structures together instead of with make_shared, they pack densely into a small number of cache lines, so at least that part of your data (the control structures) is practically guaranteed to be in cache while you iterate.
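One way to get that "allocate them all together" effect is std::allocate_shared with an arena, so every control block (and object) is carved out of one contiguous buffer. This is a minimal sketch, assuming a simple bump allocator that is never rewound is acceptable; `ArenaAllocator` is a made-up name:

```cpp
#include <cstddef>
#include <memory>
#include <new>
#include <vector>

// Hypothetical bump allocator: hands out memory sequentially from one
// buffer, so successive control blocks land next to each other.
template <typename T>
struct ArenaAllocator {
    using value_type = T;

    std::byte*   buffer;
    std::size_t  capacity;
    std::size_t* cursor;  // shared offset into the buffer

    ArenaAllocator(std::byte* buf, std::size_t cap, std::size_t* cur) noexcept
        : buffer(buf), capacity(cap), cursor(cur) {}

    template <typename U>  // rebind support, used internally by allocate_shared
    ArenaAllocator(const ArenaAllocator<U>& o) noexcept
        : buffer(o.buffer), capacity(o.capacity), cursor(o.cursor) {}

    T* allocate(std::size_t n) {
        // Bump the cursor, respecting T's alignment (alignof is a power of 2).
        std::size_t aligned = (*cursor + alignof(T) - 1) & ~(alignof(T) - 1);
        if (aligned + n * sizeof(T) > capacity) throw std::bad_alloc{};
        *cursor = aligned + n * sizeof(T);
        return reinterpret_cast<T*>(buffer + aligned);
    }
    void deallocate(T*, std::size_t) noexcept {}  // arena is released all at once
};

template <typename T, typename U>
bool operator==(const ArenaAllocator<T>& a, const ArenaAllocator<U>& b) {
    return a.buffer == b.buffer;
}
template <typename T, typename U>
bool operator!=(const ArenaAllocator<T>& a, const ArenaAllocator<U>& b) {
    return !(a == b);
}

int main() {
    alignas(std::max_align_t) static std::byte arena[1 << 16];
    std::size_t cursor = 0;
    ArenaAllocator<int> alloc(arena, sizeof(arena), &cursor);

    std::vector<std::shared_ptr<int>> ptrs;
    for (int i = 0; i < 100; ++i)
        // Control block and int live in one chunk of the arena; walking
        // `ptrs` now touches one dense region instead of the whole heap.
        ptrs.push_back(std::allocate_shared<int>(alloc, i));
}
```

Note the shared_ptrs still run destructors normally; only the memory reclamation is deferred until the arena itself goes away.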
If you want to make your data cache-friendly, make sure that all the references to it fit inside the smallest, fastest cache level possible. Using it repeatedly helps to keep it in cache. The hardware's prefetch and replacement logic is sophisticated enough to handle fetching your data, unless your code is very branch-heavy and the access pattern becomes hard to predict. That is the other part of making your data "cache friendly": use as few branches as possible when working on it.
Also, when working on the data, try to break it up into pieces that fit in cache. Operate on only 32 KB of it at a time if possible; that is a conservative L1 data cache size for modern processors. If you know exactly which CPU your code will run on, you can tune that figure less conservatively if you need to.
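Here is a minimal sketch of that blocking idea, assuming the 32 KB figure above and a made-up two-pass transform over a float buffer:

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Cache blocking: make several passes over a large buffer one ~32 KB tile
// at a time, so every pass after the first hits cache instead of RAM.
void process_in_tiles(std::vector<float>& data) {
    constexpr std::size_t kTileElems = (32 * 1024) / sizeof(float);  // assumed L1d size

    for (std::size_t base = 0; base < data.size(); base += kTileElems) {
        const std::size_t end = std::min(base + kTileElems, data.size());
        for (std::size_t i = base; i < end; ++i) data[i] *= 2.0f;  // pass 1 loads the tile
        for (std::size_t i = base; i < end; ++i) data[i] += 1.0f;  // pass 2 hits cache
    }
}

int main() {
    std::vector<float> data(1 << 20, 1.0f);  // 4 MB: far bigger than L1
    process_in_tiles(data);
}
```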
EDIT: I forgot to mention a pertinent detail. The most common page size is 4 KB. Caches are usually set-associative, with limited associativity especially in lower-end processors. 2-way associative means that each location in memory can be cached in either of 2 possible cache slots; 4-way associative means it can fit into any of 4 possible slots, 8-way means any of 8, and so on. The higher the associativity, the better for you. The fastest cache (L1) on a processor tends to be the least associative, since that requires less control logic; having contiguous blocks of data to reference (such as contiguous control structures) is therefore a good thing. A fully associative cache is desirable.
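To make that concrete, here is the toy set-index arithmetic for a hypothetical 32 KB, 8-way cache with 64-byte lines (all three numbers are assumptions for illustration):

```cpp
#include <cstdint>
#include <cstdio>

// Every address maps to exactly one set, and a line can live in any of
// that set's ways. The parameters below are assumed, not queried.
int main() {
    const std::uint64_t line_size = 64;         // bytes per cache line
    const std::uint64_t ways      = 8;          // associativity
    const std::uint64_t cache_sz  = 32 * 1024;  // total cache bytes
    const std::uint64_t sets      = cache_sz / (line_size * ways);  // 64 sets

    const std::uint64_t addr = 0x12345678;
    const std::uint64_t set  = (addr / line_size) % sets;
    std::printf("address 0x%llx -> set %llu of %llu (any of %llu ways)\n",
                (unsigned long long)addr, (unsigned long long)set,
                (unsigned long long)sets, (unsigned long long)ways);
}
```

Any two addresses that land in the same set compete for its 8 ways, which is why dense, contiguous data makes better use of the sets than scattered allocations.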