Here is a link to an answer which outlines the TL2 STM algorithm and which provides links to a couple of fairly readable papers on implementing it and how to mechanically transform normal code into STM code:
Stackoverflow: How do you implement Software Transactional Memory?
Clearly it is an active research area, so there are many approaches. Which one works best for a given problem depends on the concurrency and read/write ratio of the problem at hand.
With respect to the TL2 algorithm, your question:
... it seems useless to allow two threads to execute within the same
transaction. They will read each other's writes and the computation
within the transaction block will be malformed.
It's about what data is seen (the read-set) or updated (the write-set) during a transaction, not what code is being executed. The TL2 algorithm lets speculative work happen with no locks held, keeping the write-set private to each thread until commit. At commit time the transaction takes write locks, then does a final check that the read-set versions still match the master value versions (the public committed values) before it flushes its updates to the master values, incrementing their versions. Of course a transaction can see its own private updates before commit, as well as any committed master values, but different transactions don't see each other's unflushed write-sets. If some other transaction has in the meanwhile committed and changed the master versions from what was seen in the read-set, the transaction is aborted. This enforces consistency without holding locks while the application logic is running. The papers at the link above have many more details and optimisations.
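To make the read-set/write-set mechanics concrete, here is a minimal Python sketch. The names (`TVar`, `Transaction`, `AbortError`) are my own, and real TL2 details such as the global version clock are omitted; this only illustrates the private write-set and the commit-time validation described above:

```python
import threading

class TVar:
    """A master value: the public committed value plus its version and write lock."""
    def __init__(self, value):
        self.value = value
        self.version = 0
        self.lock = threading.Lock()

class AbortError(Exception):
    """Raised when commit-time validation finds a stale read-set entry."""
    pass

class Transaction:
    def __init__(self):
        self.read_set = {}    # TVar -> version observed at first read
        self.write_set = {}   # TVar -> private, uncommitted value

    def read(self, tvar):
        # A transaction sees its own private updates before commit ...
        if tvar in self.write_set:
            return self.write_set[tvar]
        # ... otherwise the committed master value, recording its version.
        self.read_set.setdefault(tvar, tvar.version)
        return tvar.value

    def write(self, tvar, value):
        # Writes stay private to this thread until commit.
        self.write_set[tvar] = value

    def commit(self):
        # Take write locks in a deterministic (address-like) order.
        locked = sorted(self.write_set, key=id)
        for tvar in locked:
            tvar.lock.acquire()
        try:
            # Final check: every read-set version must still match the master.
            for tvar, seen in self.read_set.items():
                if tvar.version != seen:
                    raise AbortError("read-set is stale; retry the transaction")
            # Flush private writes to the master values, bumping their versions.
            for tvar in locked:
                tvar.value = self.write_set[tvar]
                tvar.version += 1
        finally:
            for tvar in locked:
                tvar.lock.release()

# Two transactions touch the same TVar: the slower one aborts at validation.
x = TVar(10)
t1 = Transaction()
t1.write(x, t1.read(x) + 1)   # speculative work, no locks held
t2 = Transaction()
t2.write(x, 99)
t2.commit()                   # commits first, bumping x.version
try:
    t1.commit()
    aborted = False
except AbortError:
    aborted = True
assert aborted and x.value == 99
```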
Also, what happens when two threads enter the same transaction at the
same time? Doesn't this cause a race condition of sorts?
Yes, threads doing different work race under TL2 if they are reading values which the other thread will update. Work can be thrown away due to an abort at the final consistency check. You should code to retry a bounded number of times when this happens, until you get a consistent read which finally leads to a successful commit. The hope is that if you have many cores on your server you get more total throughput by occasionally throwing a core's work away than you would by using coarse-grained locks which block cores. The other hope is that STM can be more straightforward than hand-crafting an optimal fine-grained locking approach that avoids deadlock. Clearly throughput depends on the actual application logic and the collision rate.
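That retry-on-abort pattern can be sketched like this (again illustrative Python, not TL2 itself: a single `VersionedCell` whose version counter stands in for the read-set check, with four threads racing on it):

```python
import threading

class VersionedCell:
    """One shared master value guarded by a version counter and a commit lock."""
    def __init__(self, value):
        self.value = value
        self.version = 0
        self.lock = threading.Lock()

def atomic_increment(cell, retries=100_000):
    """Optimistic read, lock-free compute, then validate-and-commit; retry on conflict."""
    for _ in range(retries):
        seen = cell.version          # remember the version we read against
        snapshot = cell.value        # speculative read, no lock held
        new_value = snapshot + 1     # application logic runs without locks
        with cell.lock:              # commit-time write lock only
            if cell.version == seen: # read still consistent?
                cell.value = new_value
                cell.version += 1
                return new_value
        # another thread committed in between: our work is thrown away, retry
    raise RuntimeError("too much contention, giving up")

# Four threads race on the same cell; aborted attempts are simply retried.
cell = VersionedCell(0)
def worker():
    for _ in range(1000):
        atomic_increment(cell)
threads = [threading.Thread(target=worker) for _ in range(4)]
for t in threads: t.start()
for t in threads: t.join()
assert cell.value == 4000
```

Note that a retry only ever happens because some other thread committed, so the system as a whole always makes progress even when an individual attempt is discarded.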
Since both are attempting to modify the same shared memory, both will
need to be restarted and the issue continue ad infinitum unless the
system steps in and tell one thread to cool it (sort of like a lock =)
The livelock you mention is avoided by taking locks on the master values at commit time. Take all the locks in ascending order of the master values' memory addresses and you will not get deadlock: one of the two threads will get all the locks it needs first, and the other will block. The first thread then does its read-set consistency checks, flushes its write-set (incrementing the master values' versions), and releases the locks. The second thread finally gets all its locks, does its read-set consistency checks, and aborts the transaction on seeing that thread one has incremented the version of a master value it used in its application logic.
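The ascending-address ordering can be sketched in Python like this, using `id()` as a stand-in for a memory address (`Slot` and the helper names are invented for illustration; real TL2 orders the actual lock addresses):

```python
import threading

class Slot:
    """A master value's commit-time write lock (value/version omitted for brevity)."""
    def __init__(self):
        self.lock = threading.Lock()

def commit_locks(write_set):
    """Acquire every write lock in ascending id() order, a stand-in for address order."""
    ordered = sorted(write_set, key=id)
    for slot in ordered:
        slot.lock.acquire()
    return ordered

def release_locks(ordered):
    for slot in reversed(ordered):
        slot.lock.release()

# Two threads name the same slots in opposite orders; sorting by id() means
# they still lock in one global order, so neither can deadlock the other.
a, b = Slot(), Slot()
finished = []

def worker(write_set):
    for _ in range(10_000):
        held = commit_locks(write_set)
        release_locks(held)
    finished.append(True)

t1 = threading.Thread(target=worker, args=([a, b],))
t2 = threading.Thread(target=worker, args=([b, a],))
t1.start(); t2.start(); t1.join(); t2.join()
assert len(finished) == 2
```

Without the sort, the two workers above would be a textbook lock-ordering deadlock: each could hold one lock while waiting for the other's.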
Clearly this commit-time locking strategy blocks threads, but locks are not held while arbitrary application logic is running, and this critical section can be aggressively tuned. The papers even mention processor instructions which request that a thread be left to run uninterrupted while it is holding those locks.