Taking Arm as an example, the hardware supports automatic cache invalidation, as explained here: https://developer.arm.com/documentation/den0024/a/Multi-core-processors/Multi-core-cache-coherency-within-a-cluster
It also has software instructions to do the same manually, such as DC and SYS.
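To make concrete what I mean by running these manually, here is a minimal sketch of how I assume a by-address clean with DC would look on AArch64. The helper name clean_dcache_range is hypothetical, and the sketch assumes the OS permits DC CVAC and reads of CTR_EL0 from user space (as I understand Linux does); on other architectures it compiles to a no-op.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper (not from any library): clean, i.e. write back, every
 * data cache line covering [addr, addr + len) to the point of coherency.
 * Assumption: DC CVAC and CTR_EL0 reads are permitted at the current
 * exception level. */
void clean_dcache_range(const void *addr, size_t len)
{
    assert(addr != NULL);
#if defined(__aarch64__)
    uint64_t ctr;
    /* CTR_EL0.DminLine (bits 19:16) holds log2 of the smallest data cache
     * line size, measured in 4-byte words. */
    __asm__ volatile("mrs %0, ctr_el0" : "=r"(ctr));
    uintptr_t line = (uintptr_t)4 << ((ctr >> 16) & 0xF);

    uintptr_t p = (uintptr_t)addr & ~(line - 1); /* align down to a line */
    uintptr_t end = (uintptr_t)addr + len;

    for (; p < end; p += line) {
        /* DC CVAC: clean data cache line by virtual address to PoC. */
        __asm__ volatile("dc cvac, %0" : : "r"(p) : "memory");
    }
    /* Wait for the maintenance operations to complete. */
    __asm__ volatile("dsb sy" : : : "memory");
#else
    /* Not AArch64: nothing to do in this sketch. */
    (void)len;
#endif
}
```

A call would look like clean_dcache_range(buf, sizeof buf); it should never change the contents of buf, only where the latest copy of the data lives.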
My question is: why and when would you ever need to run such instructions if it's already handled automatically by hardware?
This question applies to any other architecture which supports both SW and HW cache invalidation.