This page contains a lot of information about all intrinsics:
Intel Intrinsics Guide
The series of instructions that will write data to memory, avoiding cache evictions are generally named _mm_stream_...
. As the name implies, these are ideal for applications that write a large stream of data that is basically contiguous in memory and unlikely to be accessed again in the near future. So, for example, if you are mixing audio buffers and producing a single waveform output this would work well.
One of the keys to using these instructions effectively is taking advantage of write combining. If your write locations are scattered throughout memory, these instructions will stall as badly, or possibly worse than any other kind of memory storage instruction you attempt. Since these writes do not wind up in cache, if you're not filling an entire write buffer then essentially your operation becomes a write-through operation, requiring a stall until the write is completed. If you are writing contiguous memory locations then write combining will apply, and make your data writes much more efficient.
The flip side of that coin is prefetching. Prefetching tells the system to start pulling a memory address into the desired level of cache so that by the time the memory read is complete, you are ready to use the data. This is much harder to use, and requires an appropriate data "stride" which takes into account the cache sizes, cache line size, and the number of instructions which can execute before the memory read completes. Using the hinting parameter, you can "suggest" that the data goes into the L1, L2, or L3 cache, or that it is "non-temporal", meaning that you're just going to use it once and it should be evicted first before any other cache evictions. The hardware has its own prefetching heuristics that work well for most problems without explicit prefetching instructions, but the classic counter-example is a matrix transpose:
Prefetching examples
Prefetching is generally very difficult to use effectively except in some very specific cases like this. Without a more specific problem statement from you, this is about all I can provide.