In Thumb, "32-bit" instructions are still composed of two separate halfwords, so the "first halfword" is simply the first halfword of the encoding; by itself that says nothing about the byte layout in memory (see the table below). Thumb instructions are halfword-aligned, so any given word of memory could hold two 16-bit instructions, a 16-bit instruction plus one half of a 32-bit instruction, one half each of two different 32-bit instructions, or one whole 32-bit instruction.
Conceptually, the processor decodes one halfword at a time: if it sees one of the above bit patterns, it knows it must also decode the next halfword before it can actually execute the instruction. Reality complicates this somewhat, since the Cortex-M3/M4 only ever fetches whole 32-bit words from memory, so the correlation between the number of "instruction fetches" and the number of instructions actually decoded and executed depends on the code itself. Just imagine that those fetches refill a 4-byte buffer that the pipeline slurps individual halfwords out of (which may not be all that far off the truth, for all I know).
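To make that model concrete, here's a minimal C sketch; the flash image and all the names are hypothetical, just to show word fetches feeding a halfword-at-a-time decoder:

    #include <stdint.h>
    #include <stdio.h>

    /* Hypothetical flash image: a 16-bit NOP (0xBF00), a 32-bit
       MOV.W r0, #0 (halfwords 0xF04F 0x0000), and another NOP,
       laid out little-endian byte by byte. */
    static const uint8_t flash[] = {
        0x00, 0xBF, 0x4F, 0xF0, 0x00, 0x00, 0x00, 0xBF
    };

    /* Hand back the next halfword at *pc, modelling a whole-word
       fetch that the pipeline slurps individual halfwords out of.
       On a little-endian bus the halfword at the word-aligned
       address is the low half of the fetched word. */
    static uint16_t next_halfword(uint32_t *pc)
    {
        uint32_t a = *pc & ~3u;                 /* word-aligned fetch */
        uint32_t word = (uint32_t)flash[a]
                      | (uint32_t)flash[a + 1] << 8
                      | (uint32_t)flash[a + 2] << 16
                      | (uint32_t)flash[a + 3] << 24;
        uint16_t hw = (*pc & 2u) ? (uint16_t)(word >> 16)
                                 : (uint16_t)word;
        *pc += 2;
        return hw;
    }

    int main(void)
    {
        uint32_t pc = 0;
        while (pc < sizeof flash)
            printf("%04X\n", next_halfword(&pc));  /* BF00 F04F 0000 BF00 */
        return 0;
    }

Note that two word fetches cover four halfwords here, but only three instructions: exactly the mismatch between fetch count and instruction count described above.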
So, if you have a halfword containing one of those values in its top bits, then you know it's the first half of a 32-bit encoding, and you need to interpret it in conjunction with the next halfword. Conversely, if you have a halfword with any other value in its top bits, then it's either a 16-bit encoding, or the second half of a 32-bit encoding, depending on what the previous halfword was.
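In C, that check is just a comparison on the top five bits; a minimal sketch, assuming the three prefixes mentioned above (0b11101, 0b11110, 0b11111):

    #include <stdint.h>

    /* A halfword is the first half of a 32-bit encoding iff its
       bits 15:11 are 0b11101, 0b11110 or 0b11111, i.e. >= 0x1D. */
    static int is_first_of_32bit(uint16_t hw)
    {
        return (hw >> 11) >= 0x1D;
    }

There is no equivalent test for "second half of a 32-bit encoding": those halfwords can hold nearly any bit pattern, which is why you can only disambiguate by scanning forwards from a known instruction boundary.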
Note that instructions are always stored little-endian (each halfword separately, with the first halfword at the lower address), so the actual in-memory layout of a 32-bit encoding looks like this, where address A is an even number:
    +---------+------------------------------+
    | address | contents                     |
    +---------+------------------------------+
    | A       | bits 7:0 of first halfword   |
    | A+1     | bits 15:8 of first halfword  |
    | A+2     | bits 7:0 of second halfword  |
    | A+3     | bits 15:8 of second halfword |
    +---------+------------------------------+
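Putting that together, reassembling an encoding from those four bytes the way reference manuals print it (first halfword in the upper 16 bits) looks like this sketch, where mem is a hypothetical byte view of that memory:

    #include <stdint.h>

    /* Rebuild a 32-bit Thumb encoding from its bytes at even address
       a: each halfword is little-endian, but the first halfword sits
       at the lower address. */
    static uint32_t read_thumb32(const uint8_t *mem, uint32_t a)
    {
        uint16_t first  = (uint16_t)(mem[a]     | mem[a + 1] << 8);
        uint16_t second = (uint16_t)(mem[a + 2] | mem[a + 3] << 8);
        return (uint32_t)first << 16 | second;
    }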