The Thumb encoding of the PC-relative LDR instruction was covered recently here on SO. When you look at the instruction set documentation you will see, as we have pointed out, that the PC from a documentation perspective was "two instructions ahead" in the early pre-Thumb-2 days; for Thumb it is now defined as 4 bytes ahead of the instruction address. The PC offset is encoded in units of words, so the address being used is
((instruction address + 4 ) & 0xFFFFFFFC) + (immed<<2)
which removes any confusion about the "two ahead" thing.
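As a minimal sketch, the formula above can be written as a small C function. The function and parameter names are made up for illustration, and the 8-bit width of the immediate assumes the 16-bit Thumb encoding of the instruction.

```c
#include <assert.h>
#include <stdint.h>

/* Address read by a 16-bit Thumb "LDR Rt, [PC, #imm]".
 * insn_addr: address of the LDR instruction itself.
 * imm8:      the immediate field, in units of words (assumed 8 bits wide).
 * Both names are hypothetical, chosen for this sketch. */
uint32_t thumb_ldr_literal_address(uint32_t insn_addr, uint8_t imm8)
{
    assert((insn_addr & 1u) == 0);         /* Thumb instructions are halfword aligned */
    uint32_t pc   = insn_addr + 4u;        /* the PC as the instruction sees it */
    uint32_t base = pc & 0xFFFFFFFCu;      /* force word alignment */
    return base + ((uint32_t)imm8 << 2);   /* word offset converted to bytes */
}
```

Because of the alignment mask, an LDR at 0x1000 and an LDR at 0x1002 with the same immediate read from the same address: with an immediate of 1, both resolve to 0x1008.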
The reality is that there are multiple program counters. The days of a single program counter, used both to actually fetch things and to do PC-relative addressing, belong to older, simpler architectures.
This "two ahead" thing is part of that past, but for compatibility reasons it has carried on from Acorn to the present ARM products, just as x86 and others have legacy features that are no longer what they say they are (the branch shadow/delay slot in MIPS, for example).
The pipe is different now, and one would assume that for every different ARM product (not architecture but product: Cortex-M0, Cortex-M4, Cortex-A7, etc.) the pipe implementation and how the core keeps track of things varies. The "two ahead" value is synthesized by some form of program counter that keeps track of the instructions in the pipe. Likewise, fetch, prefetch, and branch prediction each use some form of program counter, but you should not assume they share a single one. r15 itself is either real (in the register file), fake, or both. I would expect it not to be in the register file; why burn those cycles for no added value?
Just as in software you could have a reg[15] array item plus pc_fetch, pc_current_inst, pc_execution, pc_possible_branch, and pc_branch_prediction variables to keep track of a simulation of a processor, the logic can too. Which one is used at what time depends on what you are doing. What we as programmers think of as the PC, as described in the operation of an instruction, is an address that is "two ahead" of the address where the instruction lives. With Thumb-2, which mixes 16-bit and 32-bit instructions, "two ahead" no longer makes sense, so the value is defined in bytes: in Thumb mode it is 4 bytes ahead of the instruction address, in ARM mode 8 bytes ahead. You then follow the documentation to understand how that PC is used during execution of the instruction.
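To make the simulator analogy concrete, here is a rough sketch of how a software model might separate its internal program counters from the value an instruction sees when it reads r15. The struct, enum, and function names are hypothetical; only pc_fetch and pc_current_inst echo the variable names mentioned above, and a real simulator would carry far more state.

```c
#include <assert.h>
#include <stdint.h>

enum isa_mode { MODE_ARM, MODE_THUMB };

/* Hypothetical simulator state: several "program counters",
 * none of which is stored as the programmer-visible r15. */
struct core_state {
    uint32_t pc_fetch;          /* where the front end fetches next */
    uint32_t pc_current_inst;   /* address of the instruction being executed */
    enum isa_mode mode;         /* current instruction set state */
};

/* Synthesize the value an instruction sees when it reads r15:
 * instruction address + 8 in ARM mode, + 4 in Thumb mode. */
uint32_t read_r15(const struct core_state *s)
{
    assert(s != 0);
    return s->pc_current_inst + (s->mode == MODE_THUMB ? 4u : 8u);
}
```

Note that pc_fetch plays no part in the result: however far ahead the front end happens to be fetching, the documented value depends only on the address of the executing instruction and the mode.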
For BX and other instructions capable of mode switching, the address that becomes the "program counter" is defined differently: the lsbit selects the mode to switch into and is stripped off by the branch. It does not live in the program counter; there is a PSR bit to take care of that. These addresses are also a form of program counter, one that, temporarily at least, is the actual address of the instruction to branch to and not two ahead.
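A small sketch of that lsbit handling, assuming a core that has both ARM and Thumb states (on a Thumb-only core such as a Cortex-M, a BX to an address with the lsbit clear faults instead). The struct and function names are invented for this example.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical result of a BX: where execution continues and in which mode. */
struct bx_result {
    uint32_t target;   /* address of the next instruction, lsbit stripped */
    bool     thumb;    /* new value of the PSR Thumb bit */
};

/* Model of "BX Rm": bit 0 of the register value picks the mode
 * and is not part of the branch target. */
struct bx_result bx_interwork(uint32_t rm)
{
    struct bx_result r;
    r.thumb = (rm & 1u) != 0;
    /* For an ARM-mode target this sketch requires a word-aligned address. */
    assert(r.thumb || (rm & 2u) == 0);
    r.target = rm & ~1u;
    return r;
}
```

For example, a BX to 0x8001 continues at 0x8000 in Thumb mode, while a BX to 0x8000 continues at the same address in ARM mode.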
In a lot of early processor implementations, where you had one program counter (or the idea of one), you fetched, decoded, and executed one instruction at a time before going on to the next. That does not mean people no longer do those designs; you can make small, efficient little controllers the old-fashioned way, people still do, and they are in products we use. In that case the PC is used to fetch the instruction, which may be more than one byte long; once the instruction is completely fetched, the program counter points, at least for the moment, to the next instruction. Execution of the instruction can now begin, since fetch and decode have completed. If the program counter is used as an input to that instruction, it is pointing to the next instruction; if it is used as a destination in a jump or branch, it is modified, and after completion the next fetch happens wherever it happens to be pointing. Many of these architectures had variable-length instruction sets, so one instruction may be one, two, three... bytes long, and the PC address relative to the instruction address at execution time varied. The early ARM comes from a pipelined solution with fixed-size instructions. With a single program counter and a textbook-style pipe, execution happens at a fixed depth in the pipe, meaning the program counter is fetching that many instructions ahead when you execute.
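The old single-program-counter style can be sketched with a toy machine. The instruction set below is entirely invented for illustration (it is not any real processor); it only shows that after a variable-length fetch the one and only PC points at the next instruction, and that is the value a PC-relative operation sees.

```c
#include <stdint.h>

/* Hypothetical toy ISA, invented for this sketch:
 *   0x00          NOP           (1 byte)
 *   0x01 imm8     LOADPC imm8   (2 bytes)  acc = pc + imm8
 *   0x02 lo hi    JMP addr16    (3 bytes)  pc  = addr16
 */
struct toy_cpu {
    uint16_t pc;    /* the single program counter */
    uint16_t acc;   /* accumulator */
};

/* Fetch, decode, and execute exactly one instruction. */
void toy_step(struct toy_cpu *cpu, const uint8_t *mem)
{
    uint8_t opcode = mem[cpu->pc++];          /* fetch opcode, PC moves on */
    switch (opcode) {
    case 0x01: {
        uint8_t imm = mem[cpu->pc++];         /* fetch the operand byte */
        /* Fetch is done, so the PC now points at the next instruction. */
        cpu->acc = (uint16_t)(cpu->pc + imm);
        break;
    }
    case 0x02: {
        uint8_t lo = mem[cpu->pc++];
        uint8_t hi = mem[cpu->pc++];
        cpu->pc = (uint16_t)(lo | (hi << 8)); /* next fetch happens here */
        break;
    }
    default:                                  /* NOP (and anything unknown) */
        break;
    }
}
```

With a LOADPC at address 1, the PC it sees is 3 (two bytes ahead); had the instruction been three bytes long it would have seen an address three bytes ahead, which is the variation described above.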