ARM 32bit
ARM is a 32bit instruction set. All opcodes are 32bits. The leading bits denote conditional execution. This is generally wasteful as 90+% of code executes unconditionally. The ARM mode supports 16 registers nearly symmetric (with some special cases for PC, LR and SP).
Most instruction included an 's' suffix to set condition codes.
Thumb 16bit
The original thumb is 16bit only opcodes. It does not support conditional execution and access was mainly restricted to the lower eight registers. All arithmetic instructions set condition codes. Some instructions could retrieve data from the higher registers. It can be looked at as a compression engine on the instruction decode.
For some algorithms and memory topology, thumb can be faster than ARM. However it is fairly rare and needs slow (non-zero wait state) instruction memory for this to be the case.
As a practical example, some 'Game boy advance' code would be mainly execute in thumb mode, but would jump to zero wait state RAM and transition to ARM mode for a performance critical routine.
Thumb2 mixed mode
Thumb2 extended the thumb ISA but allows for both 16bit and 32bit opcodes. Almost the entire original ARM instruction set functionality can be achieved with Thumb2. Since the instruction stream is more dense, it is higher performance than the original ARM in almost every case due to lower instruction fetch overhead.
Thumb2 allows conditional execution for four instructions with 'if/else' opcode conditions. It allows use of all 16 registers and .unified
code can be written to produce either ARM 32bit or mixed Thumb2 code.
Unified code will always be faster when Thumb2 is selected. There are fairly rare ARM sequences that can not be encoded directly to Thumb2. These few cases snippets could be faster. But generally, for any large code base, Thumb2 is faster.
This mode can be confusing with loop unrolling and jump tables. It is something that an x86 programmer would naturally think of. Ie, there are '.n'/narrow/16bit and '.w'/wide/32bit encodings of identical instructions. So if you treat code as an 'array' of tasks, the computations can be more complex. You also have transfer of control to mid-instruction possibilities.
As an example of 'un-encodeable' Thumb2 ARM code,
movlo r0,#1
moveq r0,#0
movhi r0,#-1
Above is only possible in ARM mode. However, such sequences are very rare and would only matter if you are porting assembler code from ARM to Thumb2. If it is selecting a compiler mode, Thumb2 should always produce better code (faster and smaller).
Summary
Each mode has variations on available opcodes depending on CPU model. However, the general concepts of each mode and performance are as stated.