How to efficiently rearrange characters in a string so that there are no pairs?

Question

There is a string of up to 10^5 A..Z characters. The task is to rearrange it so that none of the characters forms a row. For multiple solutions the one that comes first alphabetically is correct. And it has to be done in reasonable time, of course.

So I tried two approaches and one of them rearranges correctly but very inefficiently, and the other one has broken logic, I guess.

Slow approach.

Sort the array of chars
Rearrange in place, similar to the insertion sort (that's why it is so slow)

Incorrect approach (doesn't always give the alphabetically first answer)

Sort
Move the elements starting from the end to another array, maintaining the invariant
Reverse

I am really stuck and can't think of anything else here.

Examples:

"HHABC" -> "ABHCH";
"CBAXXXX" -> "XAXBXCX";
"AAABBBCCCCCCCDDDDD" -> "ABABACBCDCDCDCDCDC";

Here is the model solution algorithm in details. I think I am not allowed to post the actual code, so it's like this:

Build the histogram. It can be stored in an array of int, indexes being the ascii-codes, so no mapping is needed.
Build the new string:
- for each character go alphabetically, from A to Z
- if there is no such symbol according to the histogram, move to the next
- if the previous written symbol was the same, move to the next (if it is not the first iteration)
- write the first found suitable character
- decrease the number in the histogram
- if there's (length-i+2)/2 of the current symbol (half of what's left), break from this cycle and go to the next symbol in the string.

It is much shorter and simpler than the mess I wrote eventually (although it worked fine, and thank you very much for the help).

You really don't need to do any sorting or rearranging of the input data. Since you only have 26 possible input values just generate a histogram in O(n) and then use the histogram bin counts to generate a suitably interleaved output sequence, again in O(n). — Paul R, Sep 08 '14 at 14:47
I knew that my thinking is overcomplicated. But now I can't quite grab this idea. How do I do the second part, if in simple words? — Simoroshka, Sep 08 '14 at 15:08

score 1 · Accepted Answer · answered Sep 08 '14 at 15:13

1

Good point Paul R. If you have your histogram with the number of occurrences of each element, you could then sort those buckets by number of occurrences from high to low. As long as the number of occurrences in one bucket is not greater than the number of buckets, you should be able to form a viable string. Starting with the largest buckets, loop through each occurrence in the next largest bucket down to the smaller buckets.

For example, AAAAABBCRRSTT

AAAAA BB C RR S TT -> AAAAA BB RR TT C S

ABARATACASBRT

answered Sep 08 '14 at 15:13

misthunter22

26
2

Thank you! I just started to get the idea a moment before I read your answer. I will go and try it now. Sounds solid and simple. – Simoroshka Sep 08 '14 at 15:17
Your welcome! You might also need to look at taking those buckets and moving a few characters around to making them more equal size. That way you can avoid collisions with those input characters that are quite frequent. – misthunter22 Sep 08 '14 at 15:20
But actually the answer is incorrect. It should be ABABACARARTST – Simoroshka Sep 08 '14 at 15:22
Ah, now i see what you mean by the one that comes first alphabetically is correct. You can definitely sort the buckets: AAAAA BB C RR S TT, but there is still some potential for collisions. You would have to say, if the length of bucket[i] < bucket[i+1], swap the bucket positions. – misthunter22 Sep 08 '14 at 15:30

How to efficiently rearrange characters in a string so that there are no pairs?

1 Answers1