Actually, the answer is very much microarchitecture dependent. This is not something defined by x86 or any other instruction set architecture; it's left for the designers to implement and improve over the various generations.
For Sandy Bridge, you can find an interesting description here.
The most relevant part is:
The instruction fetch for Sandy Bridge is shown above in Figure 2. Branch predictions are queued slightly ahead of instruction fetch so that the stalls for a taken branch are usually hidden, a feature earlier used in Merom and Nehalem. Predictions occur for 32B of instructions, while instructions are fetched 16B at a time from the L1 instruction cache.
Once the next address is known, Sandy Bridge will probe both the uop cache (which we will discuss in the next page) and the L1 instruction cache. The L1 instruction cache is 32KB with 64B lines, and the associativity has increased to 8-way, meaning that it is virtually indexed and physically tagged. The L1 ITLB is partitioned between threads for small pages, with dedicated large pages per thread. Sandy Bridge added 2 entries for large pages, bringing the total to 128 entries for 4KB pages (for both threads) and 16 fully associative entries for large pages (for each thread).
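As a side note, the "virtually indexed and physically tagged" remark follows directly from those parameters. Here's a quick back-of-the-envelope check (plain C, with the numbers taken straight from the quote above):

```c
#include <stdio.h>

int main(void) {
    /* L1I parameters from the quoted description */
    const int cache_bytes = 32 * 1024;                   /* 32KB  */
    const int line_bytes  = 64;                          /* 64B   */
    const int ways        = 8;                           /* 8-way */

    int sets        = cache_bytes / (line_bytes * ways); /* = 64 */
    int offset_bits = 6;   /* log2(64B line)  */
    int index_bits  = 6;   /* log2(64 sets)   */

    /* index + offset = 12 bits, exactly the 4KB page offset, so the
     * set index is identical in the virtual and physical address and
     * the cache can be indexed in parallel with the TLB lookup. */
    printf("sets=%d, index+offset=%d bits (4KB page offset = 12 bits)\n",
           sets, index_bits + offset_bits);
    return 0;
}
```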
In other words, and as the diagram also shows, branch prediction is the first stage of the pipe and precedes the instruction-cache access. The I-cache therefore comes to hold the "trace" of addresses predicted by the branch predictor. If a certain code snippet is hardly ever executed, the predictor will steer fetch away from it and it will age out of the I-cache over time. Since the branch predictor can handle function calls, there shouldn't be any fundamental difference between your code snippets.
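For example (a hypothetical snippet, using GCC/Clang's `__builtin_expect` and `noinline` extensions), the front end should almost never fetch the error path below, so its code costs you little whether it's a call or inlined:

```c
#include <stdio.h>

/* Cold error path. As long as the predictor keeps predicting the
 * branch below as not-taken, the front end never fetches this code,
 * so it doesn't compete for L1I lines. */
static void __attribute__((noinline)) handle_error(int code) {
    fprintf(stderr, "error %d\n", code);
}

int process(int x) {
    if (__builtin_expect(x < 0, 0))   /* hint: almost never taken */
        handle_error(x);
    return x * 2;                     /* hot path stays dense */
}

int main(void) {
    return process(21) - 42;          /* exit status 0 */
}
```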
This of course breaks down due to alignment issues (the I-cache has 64B lines and can't hold partial ones, so inlined code may actually drag in more useless bytes than a function call would, although both effects are bounded), and due to mispredictions.
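To put a number on the line-granularity point: a code region occupies whole 64B lines no matter how few of the bytes it actually needs, and its starting alignment decides how many lines it spans. A small sketch (the addresses are made up):

```c
#include <stdint.h>
#include <stdio.h>

/* Number of 64B I-cache lines touched by a code region of 'size'
 * bytes starting at 'addr'. */
static unsigned lines_touched(uintptr_t addr, size_t size) {
    uintptr_t first = addr / 64;
    uintptr_t last  = (addr + size - 1) / 64;
    return (unsigned)(last - first + 1);
}

int main(void) {
    /* The same 100-byte snippet: 2 lines when it starts line-aligned,
     * 3 lines when it straddles line boundaries. */
    printf("aligned:   %u lines\n", lines_touched(0x1000, 100));
    printf("straddles: %u lines\n", lines_touched(0x1030, 100));
    return 0;
}
```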
It's also possible that other HW prefetchers are at work and may fetch code into other cache levels, but that isn't officially disclosed (the guides only mention some L2-cache prefetching that may help reduce latency without thrashing your L1 cache). Also note that Sandy Bridge has a uop cache, which may add further caching (but also more complexity).
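If you'd rather measure the net effect on your own code than reason about undocumented prefetchers, you can count L1I misses directly around the region of interest. A Linux-only sketch using the perf_event_open syscall (whether the L1I-miss cache event is actually supported depends on your CPU and kernel):

```c
#define _GNU_SOURCE
#include <linux/perf_event.h>
#include <sys/syscall.h>
#include <sys/ioctl.h>
#include <unistd.h>
#include <string.h>
#include <stdio.h>

int main(void) {
    struct perf_event_attr attr;
    memset(&attr, 0, sizeof(attr));
    attr.size = sizeof(attr);
    attr.type = PERF_TYPE_HW_CACHE;
    /* L1 instruction cache, read accesses, misses */
    attr.config = PERF_COUNT_HW_CACHE_L1I
                | (PERF_COUNT_HW_CACHE_OP_READ << 8)
                | (PERF_COUNT_HW_CACHE_RESULT_MISS << 16);
    attr.disabled = 1;
    attr.exclude_kernel = 1;

    /* Count for the calling thread, on any CPU */
    int fd = (int)syscall(SYS_perf_event_open, &attr, 0, -1, -1, 0);
    if (fd < 0) { perror("perf_event_open"); return 1; }

    ioctl(fd, PERF_EVENT_IOC_RESET, 0);
    ioctl(fd, PERF_EVENT_IOC_ENABLE, 0);
    /* ... run the code under test here ... */
    ioctl(fd, PERF_EVENT_IOC_DISABLE, 0);

    long long misses = 0;
    read(fd, &misses, sizeof(misses));
    printf("L1I read misses: %lld\n", misses);
    close(fd);
    return 0;
}
```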