This refers to an already asked and answered question (How are the fs/gs registers used in Linux AMD64?) and to the document referenced in one of its answers (https://akkadia.org/drepper/tls.pdf).
According to the document, the FS register points to the TCB (thread control block), which points to the DTV (dynamic thread vector), which ultimately leads to the thread-local data. Is it then right to assume that loading a thread-local variable can incur up to three cache misses (one for the TCB, one for the DTV, and one for the data itself)?
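
To make the chain I have in mind concrete, here is a minimal sketch in C of the three dependent loads. The types and names (`tcb`, `dtv_entry`, `tls_lookup`, `tls_read`) are hypothetical and heavily simplified; this is a model of the lookup as I understand it from the document, not glibc's actual data layout or code.

```c
#include <stddef.h>

/* Hypothetical, simplified model of the TLS lookup chain.
 * These are NOT glibc's real structures. */
typedef struct {
    void *block;        /* start of one module's TLS block */
} dtv_entry;

typedef struct {
    dtv_entry *dtv;     /* the TCB holds a pointer to the DTV */
} tcb;

/* Compute the address of a thread-local variable by walking
 * TCB -> DTV -> TLS block. `self` stands in for what FS points to. */
static int *tls_lookup(const tcb *self, size_t module_id, size_t offset)
{
    dtv_entry *dtv = self->dtv;            /* load 1: read the TCB */
    char *block = dtv[module_id].block;    /* load 2: read the DTV entry */
    return (int *)(block + offset);        /* address only, no load yet */
}

/* Read the variable itself; each load depends on the previous one. */
static int tls_read(const tcb *self, size_t module_id, size_t offset)
{
    return *tls_lookup(self, module_id, offset);   /* load 3: the data */
}
```

In this model each of the three loads touches a different object, so each could in principle miss in the cache independently, which is the worst case I am asking about.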