According to the Intel Intrinsics Guide (https://software.intel.com/sites/landingpage/IntrinsicsGuide), the description of
prefetcht0, prefetcht1, prefetcht2 and prefetchnta is:
"Fetch the line of data from memory that contains address p to a location in the cache hierarchy specified by the locality hint i."
I'm sure a "line of data" is obvious to someone familiar with the context, but to me it's a mystery. If I pass a pointer to some data to prefetch, how many bytes will be fetched? 4B? 64B? 1KB?
If I intend to read 32B from that address later and it prefetches only 16B, should I prefetch multiple times with offsets?
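To make "prefetch multiple times with offsets" concrete, here is a rough sketch of what I have in mind. It assumes that a "line of data" means a cache line and that the line is 64 bytes, which is common on x86 CPUs but is not stated in the guide text quoted above. ASSUMED_LINE_SIZE, PREFETCH_T0 and prefetch_range are hypothetical names of mine, not part of any Intel API; only _mm_prefetch and _MM_HINT_T0 come from <xmmintrin.h>.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#if defined(__x86_64__) || defined(__i386__) || defined(_M_X64) || defined(_M_IX86)
#include <xmmintrin.h> /* _mm_prefetch, _MM_HINT_T0 */
#define PREFETCH_T0(p) _mm_prefetch((const char *)(p), _MM_HINT_T0)
#else
#define PREFETCH_T0(p) ((void)(p)) /* no-op so the sketch still builds off x86 */
#endif

/* ASSUMPTION: 64-byte cache lines. Adjust if the target CPU differs. */
#define ASSUMED_LINE_SIZE 64u

/* Hypothetical helper: issue one prefetch for every assumed cache line
 * that overlaps [p, p + len). Returns how many prefetches were issued. */
static size_t prefetch_range(const void *p, size_t len)
{
    const uintptr_t mask = ~(uintptr_t)(ASSUMED_LINE_SIZE - 1);
    uintptr_t first, last, line;
    size_t count = 0;

    /* The masking trick below only works for power-of-two line sizes. */
    assert((ASSUMED_LINE_SIZE & (ASSUMED_LINE_SIZE - 1)) == 0);

    if (len == 0)
        return 0;

    first = (uintptr_t)p & mask;             /* line holding the first byte */
    last  = ((uintptr_t)p + len - 1) & mask; /* line holding the last byte  */

    for (line = first; ; line += ASSUMED_LINE_SIZE) {
        PREFETCH_T0((const void *)line);
        ++count;
        if (line == last)
            break;
    }
    return count;
}
```

Under that assumption, prefetch_range(ptr, 32) would issue a single prefetch when the 32 bytes sit inside one line and two when they straddle a line boundary. Whether the extra calls are actually needed is exactly what I am asking.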