The exact answer depends on machine-specific details of the cache configuration. The only way to know for sure is to measure, using the hardware performance counters and a tool like PAPI.
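For example, here is a minimal sketch (untested, and the exact event availability depends on your CPU) of how you could count L1 data-cache misses around a loop with PAPI; build with `-lpapi`:

```c
#include <stdio.h>
#include <stdlib.h>
#include <papi.h>

int main(void)
{
    volatile int a[16];             /* 16 x 4 bytes = 64 bytes; volatile keeps the loop */
    int event_set = PAPI_NULL;
    long long misses;

    if (PAPI_library_init(PAPI_VER_CURRENT) != PAPI_VER_CURRENT)
        exit(1);
    if (PAPI_create_eventset(&event_set) != PAPI_OK)
        exit(1);
    /* PAPI_L1_DCM = L1 data cache misses; not every CPU exposes this preset */
    if (PAPI_add_event(event_set, PAPI_L1_DCM) != PAPI_OK)
        exit(1);

    PAPI_start(event_set);
    for (int i = 0; i < 16; i++)    /* the region you want to measure */
        a[i] = i;
    PAPI_stop(event_set, &misses);

    printf("L1 data cache misses: %lld\n", misses);
    return 0;
}
```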
In general, though, a write from a core updates the copy in that core's L1 cache, so a subsequent read of the same address returns the updated copy from cache without a miss (assuming the cache line hasn't been evicted in the meantime).
For the code you show (a 1-D array of 16 4-byte elements), you're only dealing with 64 bytes, which is one cache line on most modern processors (or two, depending on alignment). The data will very likely be pulled into L1 when you initialize the elements, and both loops will then operate entirely in-cache (assuming no conflicting accesses from other threads).
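I don't have your exact code in front of me, but the pattern being described is roughly this shape:

```c
#include <stdio.h>

int main(void)
{
    int a[16];      /* 16 * 4 bytes = 64 bytes: one cache line,
                       or two if the array straddles a line boundary */
    int sum = 0;

    for (int i = 0; i < 16; i++)   /* initialization: the writes allocate the line in L1 */
        a[i] = i;

    for (int i = 0; i < 16; i++)   /* first loop: reads hit the line already in L1 */
        sum += a[i];

    for (int i = 0; i < 16; i++)   /* second loop: still in L1 unless evicted in between */
        sum += a[i] * a[i];

    printf("%d\n", sum);
    return 0;
}
```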