There are basically two strategies:
- Static move ordering
- Dynamic move ordering
Dynamic move ordering uses information from previous searches, either because you transpose into the same position again, or you have already reached the position in a previous less thorough search. That is the idea of iterative deepening that you mentioned, which continuously increases the search distance.
Dynamic move ordering is very powerful. There are many ways to do it but the two most common ones are transposition tables and killer moves:
Transposition tables cache information about previous searches, especially the best move that was found. When the same position is reached again, you can immediately search the best move from the previous search. Very often, it be confirmed as the best move by the deeper search.
Killer moves use a similar approach and have the additional advantage that they can use knowledge from similar but not identical positions. However, the quality of the killer moves for move ordering is generally worse than for moves from the transposition tables. That is why they are generally searched after the transposition move.
But what to do if there are no information from previous searches? Often you have some domain specific knowledge that you can use for static move ordering. For example, in chess there are many rules of thumb. One is that capture moves are more likely to be the best move than non-captures. There are more sophisticated strategies (e.g., static recapture analysis) but you have to be careful as more complicated computations will also slow down the search.
By combining both static and dynamic move ordering, chess engines can often guess the best move in the position with hit rates of more than 90%.