Hardware threads (e.g. Intel Hyperthreading) are a cheaper and slower alternative to having multiple-cores
Software threads are a software abstraction implemented by the (Linux) kernel:
- either the kernel runs one software thread per CPU (or hyperthread)
- or it fakes it with the scheduler by running a process for a bit, then a timer interrupt comes, then it switches to another process, and so on
Key to their implementation is the hardware provided and kernel configured separation between userland and kerneland: What are Ring 0 and Ring 3 in the context of operating systems?
I will now focus on hardware threads, which is the more obscure hardware question, with a focus on Intel's implementation which it calls Hyperthreading.
The Intel Manual Volume 3 System Programming Guide - 325384-056US September 2015 8.7 "INTEL HYPER-THREADING TECHNOLOGY ARCHITECTURE" describes HT briefly. It contains the following diagram:

TODO it is slower by how much percent in average in real applications?
Hyperthreading is possible because modern single CPUs cores already execute multiple instructions at once with the instruction pipeline https://en.wikipedia.org/wiki/Instruction_pipelining
The instruction pipeline is a separation of functions inside of a single core to ensure that each part of the circuit is used at any given time: reading memory, decoding instructions, executing instructions, etc.
Hyperthreading separates functions further by using:
a single backend, which actually runs the instructions with its pipeline.
Dual core has two backends, which explains the greater cost and performance.
two front-ends, which take two streams of instructions and order them in a way to maximize pipelining usage of the single backend by avoiding hazards.
Dual core would also have 2 front-ends, one for each backend.
There are edge cases where instruction reordering produces no benefit, making hyperthreading useless. But it produces a significant improvement in average.
Two hyperthreads in a single core share further cache levels (TODO how many? L1?) than two different cores, which share only L3, see:
The interface that each hyperthread exposes to the operating system is similar to that of an actual core, and both can be controlled separately. Thus cat /proc/cpuinfo
shows me 4 processors, even though I only have 2 cores with 2 hyperthreads each.
Operating systems can however take advantage of knowing which hyperthreads are on the same core to run multiple threads of a given program on a single core, which might improve cache usage.
This LinusTechTips video contains a light-hearted non-technical explanation: https://www.youtube.com/watch?v=wnS50lJicXc