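If you want a performance improvement, this is the fastest way I know to swap two arrays, faster than std::swap on standard C++ compilers. It copies whole blocks with memcpy, so it is only valid for trivially copyable element types such as int: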
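```cpp
#include <cstddef>      // std::size_t
#include <cstring>      // std::memcpy
#include <type_traits>  // std::is_trivially_copyable

// Swaps two non-overlapping blocks of `size` elements with three memcpy calls.
template<typename T = int>
void swap(T* p, T* q, std::size_t size)
{
    static_assert(std::is_trivially_copyable<T>::value,
                  "memcpy swap requires a trivially copyable T");
    T* ptmp = new T[size];                    // temporary buffer on the heap
    std::memcpy(ptmp, p, sizeof(T) * size);   // tmp <- p
    std::memcpy(p, q, sizeof(T) * size);      // p   <- q
    std::memcpy(q, ptmp, sizeof(T) * size);   // q   <- tmp
    delete[] ptmp;
}
```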
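To see it in context, here is a minimal self-contained sketch that swaps two rows of a 2D int array. The function is renamed `memswap` (a hypothetical name) so it cannot collide with std::swap when `using namespace std;` is in effect; the matrix shape `kRows` x `kCols` and the helper `swap_rows` are also assumptions made for illustration. Each row of a built-in 2D array is one contiguous block, which is what makes a single block swap per row possible.

```cpp
#include <cstddef>      // std::size_t
#include <cstring>      // std::memcpy
#include <type_traits>  // std::is_trivially_copyable

// memswap: the three-memcpy swap under a hypothetical name.
template<typename T>
void memswap(T* p, T* q, std::size_t size)
{
    static_assert(std::is_trivially_copyable<T>::value,
                  "memcpy swap requires a trivially copyable T");
    T* ptmp = new T[size];
    std::memcpy(ptmp, p, sizeof(T) * size);
    std::memcpy(p, q, sizeof(T) * size);
    std::memcpy(q, ptmp, sizeof(T) * size);
    delete[] ptmp;
}

constexpr std::size_t kRows = 3, kCols = 4;  // assumed matrix shape

// Swaps rows i and j; each row is kCols contiguous ints, so one call moves it.
void swap_rows(int (&A)[kRows][kCols], std::size_t i, std::size_t j)
{
    if (i != j)  // memcpy must not be given overlapping (here identical) blocks
        memswap(A[j], A[i], kCols);
}
```

The `i != j` guard matters: memcpy on overlapping source and destination is undefined behavior, and swapping a row with itself is exactly that case.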
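You can make it even faster by replacing the call to new with `(T*)alloca(sizeof(T)*size)` and removing the `delete[]`, since alloca memory is released automatically when the function returns. But alloca is limited: it allocates from the function's stack frame, so a large size can overflow the stack, and it is not part of standard C++. Okay, so you would call it like this: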
```cpp
//line 5
swap(A[j], A[i], n); // n: number of elements in each row (name assumed)
//int t1 = A[j][0];
// ...
//line 18
```
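The alloca variant described above might look like the sketch below. `swap_alloca` and the `SWAP_ALLOCA` macro are hypothetical names; the header choice assumes `<malloc.h>`/`_alloca` on MSVC and `<alloca.h>` on glibc/BSD-style systems, since alloca is not standard C++.

```cpp
#include <cstddef>      // std::size_t
#include <cstring>      // std::memcpy
#include <type_traits>  // std::is_trivially_copyable
#if defined(_MSC_VER)
#include <malloc.h>     // _alloca on MSVC
#define SWAP_ALLOCA _alloca
#else
#include <alloca.h>     // alloca on glibc/BSD-style systems
#define SWAP_ALLOCA alloca
#endif

// Same three-memcpy swap, but the temporary buffer lives in this function's
// stack frame and is released when the function returns (no delete[]).
// Keep `size` small: a large block can overflow the stack.
template<typename T = int>
void swap_alloca(T* p, T* q, std::size_t size)
{
    static_assert(std::is_trivially_copyable<T>::value,
                  "memcpy swap requires a trivially copyable T");
    T* ptmp = static_cast<T*>(SWAP_ALLOCA(sizeof(T) * size));
    std::memcpy(ptmp, p, sizeof(T) * size);
    std::memcpy(p, q, sizeof(T) * size);
    std::memcpy(q, ptmp, sizeof(T) * size);
}
```

If the row length is bounded and known at compile time, a fixed-size local array would give the same stack allocation portably.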
This is from the documentation of std::swap():
Non-array: Constant: Performs exactly one construction and two assignments (although each of these operations works on its own complexity).
Array: Linear in N: performs a swap operation per element.
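As a rough illustration of what that documentation describes (not the library's actual source), the non-array case is one construction plus two assignments, and the array case repeats that once per element. `swap_one` and `swap_array` are hypothetical names.

```cpp
#include <cstddef>  // std::size_t
#include <utility>  // std::move

// Non-array case: exactly one construction and two assignments.
template<typename T>
void swap_one(T& a, T& b)
{
    T tmp(std::move(a));  // the one construction
    a = std::move(b);     // assignment 1
    b = std::move(tmp);   // assignment 2
}

// Array case: one element swap per element, so linear in N.
template<typename T, std::size_t N>
void swap_array(T (&a)[N], T (&b)[N])
{
    for (std::size_t k = 0; k < N; ++k)
        swap_one(a[k], b[k]);
}
```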
Since this swap() works on whole blocks of memory rather than element by element, it is faster than std::swap(). Both are still linear in N; the gain comes from memcpy's bulk copying. I have confirmed the results by profiling with AQtime.
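If you don't have AQtime, a rough std::chrono harness like the sketch below lets you repeat the comparison on your own compiler. `time_us`, `memswap_ints`, `SwapTimings` and `compare_swaps` are hypothetical names, and std::swap_ranges stands in for the element-by-element swap that std::swap performs on arrays. Timings from a loop like this are noisy; treat them as indicative only.

```cpp
#include <algorithm>  // std::swap_ranges
#include <chrono>     // std::chrono::steady_clock
#include <cstddef>    // std::size_t
#include <cstring>    // std::memcpy
#include <vector>

// Runs f() `reps` times and returns the elapsed wall-clock time in microseconds.
template<typename F>
long long time_us(F f, int reps)
{
    auto start = std::chrono::steady_clock::now();
    for (int r = 0; r < reps; ++r)
        f();
    auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration_cast<std::chrono::microseconds>(stop - start).count();
}

// The memcpy block swap, specialized to int.
void memswap_ints(int* p, int* q, std::size_t n)
{
    int* tmp = new int[n];
    std::memcpy(tmp, p, sizeof(int) * n);
    std::memcpy(p, q, sizeof(int) * n);
    std::memcpy(q, tmp, sizeof(int) * n);
    delete[] tmp;
}

struct SwapTimings {
    long long memcpy_us;       // three memcpy calls per swap
    long long elementwise_us;  // std::swap_ranges, one element at a time
    int checksum;              // data read back afterwards so the work is observable
};

SwapTimings compare_swaps(std::size_t n, int reps)
{
    std::vector<int> a(n, 1), b(n, 2);
    SwapTimings t;
    t.memcpy_us = time_us([&] { memswap_ints(a.data(), b.data(), n); }, reps);
    t.elementwise_us = time_us([&] { std::swap_ranges(a.begin(), a.end(), b.begin()); }, reps);
    t.checksum = n ? a[0] + b[n - 1] : 0;
    return t;
}
```

With an even `reps`, both arrays end up back in their starting state, which makes the checksum easy to sanity-check.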
For anyone thinking about spatial locality, cache misses, cache alignment, cache-friendliness and so on, here is the point:
memcpy implementations are often written with SIMD instructions, which can move 128 bits at a time. SIMD instructions are assembly instructions that perform the same operation on every element of a vector register (16 bytes wide with SSE, wider with AVX), and that includes load and store instructions.
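A minimal sketch of the idea, assuming an x86 target with SSE2: copy 16 bytes per load/store pair, then finish the tail with plain memcpy. `copy_bytes_simd` and `HAVE_SSE2_COPY` are hypothetical names, and real library memcpy implementations are far more elaborate (alignment handling, wider registers, non-temporal stores). On targets without SSE2 the sketch simply falls back to memcpy.

```cpp
#include <cstddef>  // std::size_t
#include <cstring>  // std::memcpy (tail bytes and non-SSE2 fallback)
#if defined(__SSE2__) || defined(_M_X64)
#include <emmintrin.h>  // SSE2: __m128i, _mm_loadu_si128, _mm_storeu_si128
#define HAVE_SSE2_COPY 1
#endif

// Copies n bytes from src to dst (non-overlapping), 16 bytes at a time with SSE2.
void copy_bytes_simd(void* dst, const void* src, std::size_t n)
{
    unsigned char* d = static_cast<unsigned char*>(dst);
    const unsigned char* s = static_cast<const unsigned char*>(src);
    std::size_t i = 0;
#ifdef HAVE_SSE2_COPY
    for (; i + 16 <= n; i += 16) {
        // One unaligned 128-bit load and one unaligned 128-bit store.
        __m128i chunk = _mm_loadu_si128(reinterpret_cast<const __m128i*>(s + i));
        _mm_storeu_si128(reinterpret_cast<__m128i*>(d + i), chunk);
    }
#endif
    std::memcpy(d + i, s + i, n - i);  // remaining bytes (or all of them without SSE2)
}
```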
For anyone who is confused, here is how std::swap() for arrays is implemented in the <utility> header of Microsoft Visual C++ 2012:
```cpp
// TEMPLATE FUNCTION swap
template<class _Ty,
    size_t _Size> inline
void swap(_Ty (&_Left)[_Size], _Ty (&_Right)[_Size])
{   // exchange arrays stored at _Left and _Right
    if (&_Left != &_Right)
    {   // worth swapping, swap ranges
        _Ty *_First1 = _Left;
        _Ty *_Last1 = _First1 + _Size;
        _Ty *_First2 = _Right;
        for (; _First1 != _Last1; ++_First1, ++_First2)
            _STD iter_swap(_First1, _First2);
    }
}
```
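For comparison, the same behavior can be written portably with std::swap_ranges, which swaps each pair of corresponding elements in two ranges. This is a sketch, not Microsoft's code; `swap_arrays` is a hypothetical name.

```cpp
#include <algorithm>  // std::swap_ranges
#include <cstddef>    // std::size_t

// Skip the self-swap, then swap element by element, like the VC 2012 loop.
template<typename T, std::size_t N>
void swap_arrays(T (&left)[N], T (&right)[N])
{
    if (&left != &right)
        std::swap_ranges(left, left + N, right);
}
```

Whether your compiler vectorizes such a loop for a simple element type like int is worth checking in its optimization report before drawing conclusions from a benchmark.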