Excuse me for answering a two-year-old question, especially with something unique to my own language, Felix (http://felix-lang.org), but here goes anyhow :)
In Felix, functions and procedures are fundamentally different, and it isn't just that procedures have side effects and are called in statements, whereas functions don't have side effects and are used in expressions. That can't be the whole story, because Felix also has generators, which are functions with side effects :)
No, the execution model is fundamentally different, primarily for performance reasons, but not entirely. The model is:
- Functions put their return address on the machine stack, and the return value too.
- Procedures use a linked list on the heap. Procedural code is flat: it does not use the machine stack.
This is typically inefficient, so why do it? The answer is: Felix procedures are all potentially co-routines (fibres). They can switch control to another procedure by accessing a channel. This causes an exchange of control.
- For performance reasons copying the machine stack on control exchange is not an option.
- For memory management reasons swapping stack pointers is not an option either.
The OS typically swaps stack pointers for threads, which is reasonably fast, but has a fundamental problem on linear address machines: you either have to limit the maximum size of the stack to a ridiculously small value, or limit the number of threads to a ridiculously small value. On a 32-bit machine, there is not enough address space to even contemplate this solution. On a 64-bit machine, stack swapping has more potential, but of course user demands always grow to outstrip hardware 3 days after it is released .. :)
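To put rough numbers on the 32-bit case, here is a back-of-the-envelope sketch in Python. The stack sizes are assumptions picked for illustration, not figures from any particular OS, and the helper `max_stacks` is a made-up name. It also pretends the whole address space is available for stacks, which it never is, so real limits are lower.

```python
def max_stacks(address_bits: int, stack_bytes: int) -> int:
    """Upper bound on how many contiguous stacks fit in the address space,
    if each one reserves stack_bytes of addresses (hypothetical helper)."""
    return (1 << address_bits) // stack_bytes


MIB = 1 << 20
KIB = 1 << 10

# Assumed reservation of 1 MiB per stack on a 32-bit machine.
print(max_stacks(32, 1 * MIB))   # 4096

# Shrinking each stack to an assumed 64 KiB buys more stacks,
# but each one is now tiny.
print(max_stacks(32, 64 * KIB))  # 65536
```

Either the stack count or the stack size ends up small, which is the trade-off described above.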
Felix just swaps a single pointer to the heap-based stacks, so context switches are blindingly fast and very little address space is wasted. Of course the cost is heap allocations on procedure calls.
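The idea can be modelled in a few lines of Python. This is only a toy sketch of the theoretical model, not how the Felix runtime is actually written: the names `Frame`, `Fibre`, `Scheduler` and `demo` are invented for illustration. Each procedure call allocates a frame on the heap linked to its caller, a fibre is nothing more than a pointer to its top frame, and a context switch replaces one pointer.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Frame:
    """One procedure activation, allocated on the heap (hypothetical model)."""
    name: str
    caller: Optional["Frame"]  # link back to the caller: the "return address"


@dataclass
class Fibre:
    """A fibre is just a pointer to the top frame of its heap-based stack."""
    top: Optional[Frame] = None

    def call(self, name: str) -> None:
        # Procedure call: one heap allocation, linked to the current frame.
        self.top = Frame(name, self.top)

    def ret(self) -> None:
        # Procedure return: follow the link back to the caller's frame.
        assert self.top is not None, "return from an empty fibre"
        self.top = self.top.caller

    def depth(self) -> int:
        # Walk the linked list to count live frames.
        n, frame = 0, self.top
        while frame is not None:
            n, frame = n + 1, frame.caller
        return n


class Scheduler:
    """Holds the single 'current fibre' pointer."""

    def __init__(self, fibre: Fibre) -> None:
        self.current = fibre

    def switch(self, other: Fibre) -> Fibre:
        # Context switch: replace one pointer. No stack is copied and
        # no machine stack pointer is touched.
        previous = self.current
        self.current = other
        return previous


def demo():
    a, b = Fibre(), Fibre()
    a.call("main")
    a.call("producer")
    b.call("consumer")
    sched = Scheduler(a)
    suspended = sched.switch(b)  # a's frames stay on the heap, untouched
    return a, b, sched, suspended
```

After `demo()`, fibre `a` still has its two frames intact and can be resumed later by switching the pointer back. The cost shows up in `call`, which allocates on every procedure call.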
In the compiler, a lot of the architecture of the theoretical model is optimised away on an "as-if" basis, so actual performance and implementation can be quite different to the theoretical model, provided the compiler can prove that you can't tell the difference .. other than being denied the opportunity to make a cup of coffee at leisure :)
So here, you have a different answer as to why functions and procedures might be treated differently.