How does a virtually-indexed physically-tagged cache solve the problem of homonyms?

Question

In the context of virtual caches, a homonym is a single virtual address that corresponds to several physical addresses (for example, in different address spaces).

One known solution to the homonym problem in virtual caches is to use physical tags: index the cache with part of the virtual address, but take the tag from the physical address. I don't understand how this works.

In my understanding, the processor is sending a virtual address to the cache. How does the processor know which physical tag to check for if it is not sending any physical tag to the cache? For example, with a normal physical cache, if you're looking to see if 00001111 exists in the cache, where the 0000 are the tag bits and 1111 are the index bits, you'll index the cache by 1111, and if you get a hit, see if the tag for index 1111 is 0000.

However, if your virtual address is 00001111, which corresponds to a physical address of, let's say, 10101010, the processor will send 00001111 to the virtual cache. How will the cache know which physical tag bits to check (1010) if those tag bits aren't present in the virtual address sent by the processor? Where does the physical address come from?

In simple terms, let's say VA 1 maps to 3 physical addresses: 2, 3, and 4. If my processor queries the cache with virtual address 1, how does the cache even know whether it's supposed to be looking for 2, 3, or 4? Isn't it ambiguous which physical address I'm trying to find?

You are going to have to explain in more detail. You are using academic gibberish that textbook writers put into books to confuse students. Plus, it is not clear that this is a programming question as opposed to a hardware design question. — user3344003, Mar 05 '19 at 03:09

score 0 · Answer 1 · answered Mar 04 '19 at 07:05

The term "virtually indexed" just means that the index does not require any translation. This is the case when the index is entirely contained in the page-offset part of the address, which virtual-to-physical translation leaves unchanged.

Assume you have 4 KB pages. The 12 least significant bits of an address are then the page offset and do not need any kind of translation.

Now assume you have a 32 KB cache (2^15 bytes), with 64-byte (2^6) lines and an associativity of 8. The number of lines is 2^9 and the number of sets is 2^6. Your index is thus 6 bits. As the lines are 64 bytes, the line offset is 6 bits, and the (index + offset) part is 12 bits. Notice that this equals the page offset size, so it requires no translation. This is what is called a virtually indexed cache.
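The arithmetic above can be checked with a short sketch. The function name `cache_geometry` and its parameters are made up for this illustration; it only assumes power-of-two sizes, as in the example.

```python
def cache_geometry(cache_bytes, line_bytes, ways, page_bytes):
    """Return the address-bit breakdown of a set-associative cache.

    All sizes are assumed to be powers of two.
    """
    lines = cache_bytes // line_bytes
    sets = lines // ways
    # For a power of two n, n.bit_length() - 1 is log2(n).
    offset_bits = line_bytes.bit_length() - 1
    index_bits = sets.bit_length() - 1
    page_offset_bits = page_bytes.bit_length() - 1
    return {
        "lines": lines,
        "sets": sets,
        "offset_bits": offset_bits,
        "index_bits": index_bits,
        "page_offset_bits": page_offset_bits,
        # True when index + line offset fit inside the page offset,
        # i.e. the index bits are untouched by address translation.
        "index_needs_no_translation":
            offset_bits + index_bits <= page_offset_bits,
    }


# The configuration from the answer: 32 KB, 64-byte lines, 8-way, 4 KB pages.
geometry = cache_geometry(32 * 1024, 64, 8, 4 * 1024)
print(geometry)
```

With the same line size, associativity, and page size, doubling the cache to 64 KB would give 7 index bits, so index + offset (13 bits) would no longer fit in the 12-bit page offset.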

So in a virtually indexed cache, two things happen in parallel: the most significant bits of the address are translated (typically by the TLB), and the tags of the selected set are read from the cache using the untranslated least significant bits. Once the address is translated, you compare the physical tags read from the cache with the translated address. This is where the physical tag comes from, and because the comparison uses physical addresses there is no risk of homonyms.

The term "virtually indexed" is somewhat misleading in this configuration, since the virtual and physical index bits are identical.
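To make the lookup order concrete, here is a toy software model of it, not a description of any real hardware. The class `VIPTCache`, the dictionaries standing in for page tables, and the frame numbers 0xA and 0xB are all assumptions made for this sketch; it uses the 6-bit offset, 6-bit index, and 12-bit page offset from the example above, so the tag is simply the physical frame number.

```python
PAGE_OFFSET_BITS = 12   # 4 KB pages
LINE_OFFSET_BITS = 6    # 64-byte lines
INDEX_BITS = 6          # 64 sets
WAYS = 8


class VIPTCache:
    """Toy virtually-indexed, physically-tagged cache."""

    def __init__(self):
        # sets[index] holds up to WAYS entries of (physical_tag, data).
        self.sets = [[] for _ in range(1 << INDEX_BITS)]

    @staticmethod
    def index_of(vaddr):
        # The index bits lie inside the page offset, so no translation needed.
        return (vaddr >> LINE_OFFSET_BITS) & ((1 << INDEX_BITS) - 1)

    def fill(self, vaddr, page_table, data):
        physical_tag = page_table[vaddr >> PAGE_OFFSET_BITS]
        ways = self.sets[self.index_of(vaddr)]
        if len(ways) == WAYS:
            ways.pop(0)  # simple FIFO replacement, chosen only for brevity
        ways.append((physical_tag, data))

    def lookup(self, vaddr, page_table):
        # In hardware these two steps run in parallel:
        candidates = self.sets[self.index_of(vaddr)]   # read the set
        pfn = page_table[vaddr >> PAGE_OFFSET_BITS]    # translate VPN -> PFN
        # Then the stored physical tags are compared with the translated PFN.
        for physical_tag, data in candidates:
            if physical_tag == pfn:
                return data
        return None  # miss


# Homonym: the same virtual page maps to different frames in two processes.
page_table_a = {0x1: 0xA}
page_table_b = {0x1: 0xB}
vaddr = 0x1040

cache = VIPTCache()
cache.fill(vaddr, page_table_a, "line from frame 0xA")
hit_a = cache.lookup(vaddr, page_table_a)    # tag 0xA matches: hit
miss_b = cache.lookup(vaddr, page_table_b)   # tag 0xA != PFN 0xB: miss
print(hit_a, miss_b)
```

The cache itself never receives a physical tag from the processor; the tag to compare against arrives from the translation step, which is why the second process cannot hit on the first process's line.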

zg c · Answer 2 · 2023-08-26T10:41:52.610

Based on this paper referenced in OSTEP (see its sections "2.3 Cache indexing and tagging" and "2.2.2 Homonyms" for more detail behind my explanation):

With a "virtually indexed" cache that uses physical tags, a hit requires the stored physical tag to match the PFN obtained by translating the virtual address. So, whatever the address space, the physical tag concatenated with the offset bits of the virtual address is exactly the physical address being requested, i.e. no homonym.
With "virtually indexed, virtually tagged" caches, the comparison between the PFN and the tag is skipped, so for the same virtual address a line holding any physical address can hit. The same virtual address can then return data from different physical addresses, i.e. a homonym.
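For contrast, a toy model of the virtually tagged case shows the false hit. The class `VIVTCache` and the frame numbers in the strings are invented for this sketch, and it deliberately models no flushing or address-space identifiers, so it is only an illustration of the failure mode.

```python
class VIVTCache:
    """Toy virtually-indexed, virtually-tagged cache: no translation on lookup."""

    LINE_OFFSET_BITS = 6  # 64-byte lines

    def __init__(self):
        # Keyed by virtual line address only; no physical tag is stored.
        self.lines = {}

    def fill(self, vaddr, data):
        self.lines[vaddr >> self.LINE_OFFSET_BITS] = data

    def lookup(self, vaddr):
        return self.lines.get(vaddr >> self.LINE_OFFSET_BITS)


vivt = VIVTCache()
shared_vaddr = 0x1040

# Process A, whose page table maps this virtual page to frame 0xA, fills the line.
vivt.fill(shared_vaddr, "data of process A (frame 0xA)")

# Context switch to process B, which maps the same virtual page to frame 0xB.
# The cache is not flushed, and nothing in the lookup involves a PFN.
seen_by_b = vivt.lookup(shared_vaddr)
print(seen_by_b)  # process B gets process A's data: the homonym problem
```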

Some notes on synonyms, which are related to homonyms (you can skip them if you are not interested):

If you want to know why a "virtually indexed" cache has synonyms, see this SO answer and the "Example demonstrating Aliasing:" section of this geeksforgeeks guide, which is based on Figure 6 of the paper above, where multiple cache lines correspond to the same physical address.

Also, "physically indexed" caches avoid synonyms because, per Figure 5(a) of the paper above, they use the page frame number (PFN) to select the unique cache location for the data at a given physical address. This holds even though different VPNs can map to the same PFN, as the two mmap calls in the geeksforgeeks guide show:

mmap(virtual_addr_A, 4096, PROT_READ, MAP_SHARED, file_descriptor, offset)
mmap(virtual_addr_B, 4096, PROT_READ, MAP_SHARED, file_descriptor, offset)
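A runnable way to see two virtual mappings backed by the same data is the sketch below, which uses Python's standard `mmap` module on a temporary file. It is an OS-level illustration of synonyms only and says nothing about how a particular cache handles them; the names `view_a`, `view_b`, and `seen_through_b` are assumptions of this example, and it maps one `mmap.PAGESIZE` of data rather than a hard-coded 4096 bytes.

```python
import mmap
import tempfile

with tempfile.TemporaryFile() as f:
    # Back the mappings with one page of zeros.
    f.write(b"\x00" * mmap.PAGESIZE)
    f.flush()

    # Two separate mappings of the same file region: two virtual ranges
    # that refer to the same underlying page (a synonym).
    view_a = mmap.mmap(f.fileno(), mmap.PAGESIZE)
    view_b = mmap.mmap(f.fileno(), mmap.PAGESIZE)

    # A write through one mapping is visible through the other.
    view_a[0:5] = b"hello"
    seen_through_b = bytes(view_b[0:5])

    view_a.close()
    view_b.close()

print(seen_through_b)
```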
