Merge sort indeed needs some working space:
- In naive implementations, each recursive call allocates two temporary subarrays and copies the left and right halves into them, because their elements may be overwritten during the merge phase. For the final merge, this working space amounts to the full size of the dataset.
- In more advanced implementations, a single working array is allocated once up front and passed down to the recursive calls, or reused across passes in non-recursive implementations. Depending on the implementation details, this working array can be reduced to half the size of the dataset plus one element (only the left run needs to be saved before merging), or to the next power of 2 for bottom-up iterative implementations.
- If the data elements are large, such as structures with many fields, an alternative approach requires less working space: allocate an array of pointers to the elements, sort that array, then use it to permute the original dataset in place. The pointer array is smaller than a copy of the data, but the in-place permutation phase is tricky.
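As a sketch of the single-working-array variant (all names here are my own, and this assumes plain `int` elements): the buffer is allocated once and holds only the left run during each merge, so `n / 2 + 1` elements suffice.

```c
#include <stdlib.h>
#include <string.h>

/* Merge a[lo..mid-1] and a[mid..hi-1]; tmp only needs to hold
   the left run, at most (hi - lo) / 2 elements. */
static void merge(int *a, int lo, int mid, int hi, int *tmp)
{
    int nleft = mid - lo;
    memcpy(tmp, a + lo, nleft * sizeof *a);   /* save the left run */
    int i = 0, j = mid, k = lo;
    while (i < nleft && j < hi)
        a[k++] = (tmp[i] <= a[j]) ? tmp[i++] : a[j++];
    while (i < nleft)          /* leftovers of the right run are already in place */
        a[k++] = tmp[i++];
}

static void sort_range(int *a, int lo, int hi, int *tmp)
{
    if (hi - lo < 2) return;
    int mid = lo + (hi - lo) / 2;
    sort_range(a, lo, mid, tmp);
    sort_range(a, mid, hi, tmp);
    merge(a, lo, mid, hi, tmp);
}

/* Returns 0 on success, -1 if the working buffer cannot be allocated. */
int merge_sort(int *a, int n)
{
    /* one working buffer of n/2 + 1 ints, allocated once for the whole sort */
    int *tmp = malloc((n / 2 + 1) * sizeof *tmp);
    if (!tmp) return -1;
    sort_range(a, 0, n, tmp);
    free(tmp);
    return 0;
}
```

The left run is always the one copied out, because the merge writes left to right into `a[lo..]` and can only overrun into positions the left run has already vacated.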
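For the pointer-indirection approach, the tricky part is the in-place permutation, which can be done by following permutation cycles. A minimal sketch, with a hypothetical `Record` type of my own; `qsort` stands in for the pointer-array sort here purely for brevity (the approach in the text would merge-sort that array instead):

```c
#include <stdlib.h>

/* Hypothetical large element type, for illustration only. */
typedef struct { int key; char payload[64]; } Record;

static int cmp_indirect(const void *pa, const void *pb)
{
    const Record *a = *(const Record *const *)pa;
    const Record *b = *(const Record *const *)pb;
    return (a->key > b->key) - (a->key < b->key);
}

/* Sort an array of pointers to the records, then permute the
   records themselves in place by walking each permutation cycle. */
void sort_records(Record *rec, size_t n)
{
    Record **p = malloc(n * sizeof *p);   /* pointer array: the only O(N) extra space */
    for (size_t i = 0; i < n; i++) p[i] = &rec[i];
    qsort(p, n, sizeof *p, cmp_indirect);

    for (size_t i = 0; i < n; i++) {
        size_t j = (size_t)(p[i] - rec);  /* index of the record that belongs at i */
        if (j == i) continue;
        Record tmp = rec[i];              /* open the cycle starting at i */
        size_t k = i;
        while (j != i) {
            rec[k] = rec[j];              /* pull the right record into slot k */
            p[k] = &rec[k];               /* mark slot k as settled */
            k = j;
            j = (size_t)(p[k] - rec);
        }
        rec[k] = tmp;                     /* close the cycle */
        p[k] = &rec[k];
    }
    free(p);
}
```

Each record is moved exactly once per cycle, so the permutation is O(N) moves with only one spare `Record` of temporary storage; the pointer array costs N pointers instead of N full records.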
All of the above require working space proportional to N, hence a space complexity of O(N), which is worse than insertion sort, heap sort and shell sort (O(1) extra space) and quick sort (O(log N) stack space on average).
Note that merge sort applied to linked lists does not require O(N) extra space, only O(log N): for the recursion stack in top-down implementations, or for an array of pointers to sublists in iterative bottom-up implementations. However, O(N) extra space is already present in the data structure itself to store the next pointers.
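The top-down list variant can be sketched as follows (names are my own): splitting with slow/fast pointers needs no copying at all, and the only extra space is the recursion stack, whose depth is O(log N).

```c
#include <stddef.h>

typedef struct Node { int val; struct Node *next; } Node;

/* Splice two sorted lists together; no allocation, O(1) extra space. */
static Node *merge_lists(Node *a, Node *b)
{
    Node head;                /* dummy head node simplifies the splicing */
    Node *tail = &head;
    while (a && b) {
        if (a->val <= b->val) { tail->next = a; a = a->next; }
        else                  { tail->next = b; b = b->next; }
        tail = tail->next;
    }
    tail->next = a ? a : b;   /* append whichever list is non-empty */
    return head.next;
}

Node *merge_sort_list(Node *head)
{
    if (!head || !head->next) return head;
    /* find the midpoint with slow/fast pointers, then cut the list */
    Node *slow = head, *fast = head->next;
    while (fast && fast->next) { slow = slow->next; fast = fast->next->next; }
    Node *right = slow->next;
    slow->next = NULL;
    return merge_lists(merge_sort_list(head), merge_sort_list(right));
}
```

Because nodes are relinked rather than copied, the merge itself uses O(1) space; only the recursion depth contributes, giving the O(log N) bound mentioned above.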