2

I have my initial array ( random or duplicates allowed) which needs to be sorted first and then needs to be unsorted the sorted array back to the original array with the order preserved.

Ex:
    orig_array: char[512] = { 1, 2, 4, 1,3,5,a,b,c,........}

    Sorted array: char[512] = { 1, 1, 2,2,3.....)  ( using sorting algorithms )

    Unsorting back to original order: { 1, 2,4,1,3,......}

I am stuck on the way on how to implement unsorting back to the original order in which they are created.

Some background on why I'm asking:

Unsorting the array is done completely on a different system. The sorted array is sent out to the network and is received by the host. The original data must be preserved.

I have a compression module on both ends to compress and decompress data. Compression module is CPU expensive. My idea was compression on data sequences sorted shouldn't consume much CPU and should be pretty quick to encrypt and decrypt on both ends which decreases the overall latency

dbush
  • 205,898
  • 23
  • 218
  • 273
vip
  • 53
  • 1
  • 6
  • 9
    sort a copy of the array – r3mainer Jun 29 '17 at 20:41
  • 1
    A funny question. – Eugene Sh. Jun 29 '17 at 20:49
  • A fun way to do it would be, during the sorting process, keep track of every element move/swap you make. Then when you're done, follow the process backwards. :) – lurker Jun 29 '17 at 21:03
  • Thanks. The issue here is unsorting the array is done completely on a different system . Sorted array is sent out to the network and is received by the host.The original data must be preserved – vip Jun 29 '17 at 21:14
  • 3
    It is getting even funnier :) Are you trying to encrypt the network traffic? It looks like a crazy XY-problem. – Eugene Sh. Jun 29 '17 at 21:16
  • i see practical application is not your focus here. – meowgoesthedog Jun 29 '17 at 21:59
  • Why don't you just send the data unencrypted? What are you **really** trying to do? – dbush Jun 29 '17 at 22:19
  • I have a compression module on both ends to compress and decompress data.Compression module is CPU expensive. My idea was compression on data sequences sorted shouldn't consume much CPU and should be pretty quick to encrypt and decrypt on both ends which decreases the overall latency – vip Jun 29 '17 at 22:48
  • 2
    Measure it. The chances of the speed of compression being significantly improved are small, and the cost of undoing the sort will be large unless the data sets are small. Compress the unsorted data and send that, and you save the time for sorting and unsorting. – Jonathan Leffler Jun 29 '17 at 23:23

4 Answers4

4

If you sort "in-place" - that is, reorder elements in the actual array, you cannot unsort. Sorry.

You could, instead, sort "out-of-place", or sort an array of pointers to elements in the array (using the array data for comparison) - keeping the original array as-is.

See also: In-place algorithms on Wikipedia

einpoklum
  • 118,144
  • 57
  • 340
  • 684
  • 3
    You can provide for reversing an in-place sort by tracking the sort permutation. Of course, that requires a sort algorithm that will cooperate. For example, in addition to the data to be sorted, you also create a parallel array of indices, initially in order from 0 through *n*. Every time the sort moves an element of the data, it performs a corresponding move in the array of indices. Since the final array of indices tracks where each element went, you can use it to reverse the sort. – John Bollinger Jun 29 '17 at 21:02
  • @JohnBollinger: Fair enough and +1, but it doesn't look like OP was tracking the sort permutation. – einpoklum Jun 29 '17 at 21:28
3

You cannot reverse a sort without retaining additional data. A trivial way to do that is to retain a copy of the data in their original order, for instance by performing an out-of-place sort. That's fast and simple, but it's not the only alternative.

Another way you could provide for reversing the process (per @lurker) is to make a list of the changes (swaps / permutations) performed on the array, in the order they are performed, and to reverse by going backward through that list and reversing each individual step. That's fairly simple, and approximately as fast as the original sort, but it requires more memory in the average-case asymptotic limit than just keeping a copy does.

Yet another way is to track the element permutation performed by the sort. You can do this by creating an auxiliary array of element indices, one per element to be sorted, initially in order. As the sort proceeds, it mirrors each element move with a corresponding move in the auxiliary array. When the sort is finished, the auxiliary array tells you what the original index of each element was, and you can use that information to return the elements to the original order. This is no less memory efficient than keeping a copy of the original data, and the order restoration can be performed in linear time.

Alternatively, you can avoid sorting the original data at all. If you instead sort an array of pointers to the data or, equivalently, indices of the data then you can find the sorted order without losing the original order in the first place.

John Bollinger
  • 160,171
  • 8
  • 81
  • 157
  • ahh i am reviving this question after a long time. , anyways , i tried ur approach of keeping a track of every sort ... but this works only for small data ...i want something that can work for large data that is a array having 8,00,000+ elements :) – Daksh Rawal Mar 07 '23 at 06:47
  • @daksh, if you want to be able to restore the original order of an array after sorting it, then you need to have sufficient information to do so. If the original order is arbitrary, then the minimum size of the needed information is proportional to the number of elements in the array. Of the alternatives I presented, retaining a copy in the original order is easiest, and it might also require the least memory. – John Bollinger Mar 07 '23 at 13:29
  • Of the other two, I'd probably choose the auxiliary array of original element indices if I planned to or was willing to reverse the sort by making a [possibly temporary] copy in sorted order. I'd probably choose the list of swaps if I needed to reverse the sort in place. – John Bollinger Mar 07 '23 at 13:31
  • Note that not all sorts operate in terms of binary swaps, so that alternative might need to be adapted to your particular sort. Note also that the more efficient sorts tend to trade off many fewer element comparisons for somewhat more elements swaps / moves. – John Bollinger Mar 07 '23 at 13:36
1

You say you want to sort and unsort because the data needs to be compressed (which takes time) before being sent over a network, and compressing sorted data "should" be faster than compressing unsorted data.

This sounds like a case of premature optimization. "Should" be faster sounds like a guess. You should test compressing sorted data vs. compressing unsorted data to see if there's any difference. You would also need to take the sorting and unsorted time into account. There's a good chance that the cost of sorting would be much more any any possible gain from compressing a sorted list.

Even if you find that compressing sorted data is faster, it may not matter.

Here's the problem with unsorting.

Forget about software for a moment. Suppose I give you the following sorted list:

2 7 9 10 15 18 19 20 26 29

Now unsort it.

How would you do it? Without any additional information, there's no way to know what the "proper" order is. It may very well have been sorted to begin with, in which case it's already "unsorted".

At a minimum, you would need to know what position each value was in when the list is unsorted. So if the unsorted list looked like this:

10 9 26 29 18 2 20 7 19 15

You would need another list giving the original index of each value in the sorted list:

5 7 1 0 9 4 8 6 2 3

But now you have two lists of numbers, so you've doubled the size of the data you need to compress and send. On top of that, one of those lists is unsorted. So you're back to where you started.

The only possible way you could gain anything is if the items in the original list are not just numbers but a large aggregate data type, so that the list of indexes is small compared to the original list. In that case, you could add the original list index to the datatype to simplify keeping track of it. Then you would again have to test if compressing a sorted list is faster than an unsorted list. But again, the chance that compressing a sorted list is faster is small, even more so if the list contains complex data types.

My recommendation: just compress and send the unsorted list.

dbush
  • 205,898
  • 23
  • 218
  • 273
0

You can create a copy of your original array before you sort it or you can create another array of pointers and sort pointers... The memory is not a problem, the time yes, don't lose more time to find magic function that not exist...

gfdevelop
  • 187
  • 2
  • 7