10

It's a bonus school task for which we didn't receive any teaching yet and I'm not looking for a complete code, but some tips to get going would be pretty cool. Going to post what I've done so far in Java when I get home, but here's something I've done already.

So, we have to do a sorting algorithm, which for example sorts "AAABBB" to the ABABAB. Max input size is 10^6, and it all has to happen under 1 second. If there's more than one answer, the first one in alphabetical order is the right one. I started to test different algorithms to even sort them without that alphabetical order requirement in mind, just to see how the things work out.

First version:

Save the ascii codes to the Integer array where index is the ascii code, and the value is amount which that character occurs in the char array. Then I picked 2 highest numbers, and started to spam them to the new character array after each other, until some number was higher, and I swapped to it. It worked well, but of course the order wasn't right.

Second version:

Followed the same idea, but stopped picking the most occurring number and just picked the indexes in the order they were in my array. Works well until the input is something like CBAYYY. Algorithm sorts it to the ABCYYY instead of AYBYCY. Of course I could try to find some free spots for those Y's, but at that point it starts to take too long.

Peter O.
  • 32,158
  • 14
  • 82
  • 96
JjyKs
  • 101
  • 1
  • 3
  • 1
    I would say you definitely must start by counting each character first. (If you've got more than length/2 from one character, you can stop there too because no valid solution exists.) – biziclop Sep 04 '14 at 08:37
  • 1
    Please formally define the task better, your example is not sufficient. Also - what is the size of your alphabet? – amit Sep 04 '14 at 08:39
  • I think you should start by sort your array then modify it – Eliott Roynette Sep 04 '14 at 08:44
  • 5
    "Sorting" a character array, so there isn't the same characters next to each other - this is **not** sorting but **rearranging**. Misleading title. Sorting would put same characters next to each other. – icza Sep 04 '14 at 08:46
  • 1
    @icza Since tie breaker is alphabetical order, there is also sorting involved. – amit Sep 04 '14 at 08:47
  • 3
    possible duplicate of [Generate all permutations of a list without adjacent equal elements](http://stackoverflow.com/questions/25285792/generate-all-permutations-of-a-list-without-adjacent-equal-elements) – Florent Bayle Sep 04 '14 at 08:50
  • @FlorentBayle This is not a dupe. This question asks for the first one (in alphabetical order) and fast - not all of the permutations, which is much slower to find. – amit Sep 04 '14 at 08:53
  • 1
    Can you assume that it's even possible? What happens with 'AAAAA' ? – soulcheck Sep 04 '14 at 09:01
  • 1
    @amit I had to think of the same problem anyway. Of course, there was no constraint on alphabetical order there but I am pretty sure that solutions to that post can be used here. – Vincent van der Weele Sep 04 '14 at 09:15
  • 1
    Firstly, if one letter occurs with frequency more than `ceil(n/2)`, then it's impossible, by the pigeonhole principle. Now, since you want the lexicographically smallest solution, you first thought should be the greedy algorithm that assigns the minimal letter at each step (eg. aabc -> abac). Is this correct? Yes, *while it works*. It may happen at the end you have a surplus of the last letter (eg. aabcc -> abac—one c left over). At this point, you have a *lexicographically smallest* solution to the subproblem. Think how you can fit the last letter(s) keeping minimality. – Colonel Panic Sep 04 '14 at 09:54

6 Answers6

7

An interesting problem, with an interesting tweak. Yes, this is a permutation or rearranging rather than a sort. No, the quoted question is not a duplicate.

Algorithm.

  1. Count the character frequencies.
  2. Output alternating characters from the two lowest in alphabetical order.
  3. As each is exhausted, move to the next.
  4. At some point the highest frequency char will be exactly half the remaining chars. At that point switch to outputting all of that char alternating in turn with the other remaining chars in alphabetical order.

Some care required to avoid off-by-one errors (odd vs even number of input characters). Otherwise, just writing the code and getting it to work right is the challenge.


Note that there is one special case, where the number of characters is odd and the frequency of one character starts at (half plus 1). In this case you need to start with step 4 in the algorithm, outputting all one character alternating with each of the others in turn.

Note also that if one character comprises more than half the input then apart for this special case, no solution is possible. This situation may be detected in advance by inspecting the frequencies, or during execution when the tail consists of all one character. Detecting this case was not part of the spec.


Since no sort is required the complexity is O(n). Each character is examined twice: once when it is counted and once when it is added to the output. Everything else is amortised.

david.pfx
  • 10,520
  • 3
  • 30
  • 63
  • what about the case cccba? the correct answer is cacbc, but I believe in yours is abccc – Pham Trung Sep 04 '14 at 09:10
  • @PhamTrung No, step 2 is skipped, because step 4 applies directly (it's not exactly half btw, in case of odd length it can be half rounded up) – Vincent van der Weele Sep 04 '14 at 09:34
  • 1
    It might improve the Answer (+1 btw) to point out the condition when no permutation satisfies the no adjacent same character requirement, i.e. the highest frequency is more than the ceiling of n/2. – hardmath Sep 04 '14 at 15:22
  • @hardmath: Thanks, you're right, I should have covered that. See edit. – david.pfx Sep 05 '14 at 06:00
0

You start by count each number of letter you have in your array:

For example you have 3 - A, 2 - B, 1 - C, 4 - Y, 1 - Z.

1) Then you put each time the lowest one (it is A), you can put.

so you start by :

A

then you can not put A any more so you put B:

AB

then:

ABABACYZ

These works if you have still at least 2 kind of characters. But here you will have still 3 Y.

2) To put the last characters, you just go from your first Y and insert one on 2 in direction of beginning.(I don't know if these is the good way to say that in english).

So ABAYBYAYCYZ.

3) Then you take the subsequence between your Y so YBYAYCY and you sort the letter between the Y :

BAC => ABC

And you arrive at

ABAYAYBYCYZ

which should be the solution of your problem.

To do all this stuff, I think a LinkedList is the best way

I hope it help :)

dzimiks
  • 329
  • 1
  • 3
  • 14
Eliott Roynette
  • 716
  • 8
  • 21
0

My idea is the following. With the right implementation it can be almost linear.

First establish a function to check if the solution is even possible. It should be very fast. Something like most frequent letter > 1/2 all letters and take into cosideration if it can be first.

Then while there are still letters remaining take the alphabetically first letter that is not the same as previous, and makes further solution possible.

Kuba
  • 2,986
  • 2
  • 26
  • 32
  • you start with A. but find out that the remaining part BB cannot be legally written. So instead you start with B. then you write A, then B. Resulting in BAB. – Kuba Sep 04 '14 at 10:37
  • This method fails for AAABBBXXXXX. – david.pfx Sep 05 '14 at 06:02
  • no it doesn't. you start with A, and check that the 2A3B5X can be done. then you cannot start with A so you try B, you find out that 2A2B5X can still be done. then you try A, you realize that a2B5X cannot be done, so you try B, the same conclusion, so you try X. and so on... ABXAXAXBXBX – Kuba Sep 05 '14 at 07:22
0

The correct algorithm would be the following:

  1. Build a histogram of the characters in the input string.
  2. Put the CharacterOccurrences in a PriorityQueue / TreeSet where they're ordered on highest occurrence, lowest alphabetical order
  3. Have an auxiliary variable of type CharacterOccurrence
  4. Loop while the PQ is not empty

    1. Take the head of the PQ and keep it
    2. Add the character of the head to the output
    3. If the auxiliary variable is set => Re-add it to the PQ
    4. Store the kept head in the auxiliary variable with 1 occurrence less unless the occurrence ends up being 0 (then unset it)
  5. if the size of the output == size of the input, it was possible and you have your answer. Else it was impossible.

Complexity is O(N * log(N))

Hiery Nomus
  • 17,429
  • 2
  • 41
  • 37
  • This is incorrect. Alpha sort should take priority over frequency, it does not alternate and it does not handle AABBBZZZZ. – david.pfx Sep 04 '14 at 13:44
0

Make a bi directional table of character frequencies: character->count and count->character. Record an optional<Character> which stores the last character (or none of there is none). Store the total number of characters.

If (total number of characters-1)<2*(highest count character count), use the highest count character count character. (otherwise there would be no solution). Fail if this it the last character output.

Otherwise, use the earliest alphabetically that isn't the last character output.

Record the last character output, decrease both the total and used character count.

Loop while we still have characters.

Yakk - Adam Nevraumont
  • 262,606
  • 27
  • 330
  • 524
0

While this question is not quite a duplicate, the part of my answer giving the algorithm for enumerating all permutations with as few adjacent equal letters as possible readily can be adapted to return only the minimum, as its proof of optimality requires that every recursive call yield at least one permutation. The extent of the changes outside of the test code are to try keys in sorted order and to break after the first hit is found. The running time of the code below is polynomial (O(n) if I bothered with better data structures), since unlike its ancestor it does not enumerate all possibilities.

david.pfx's answer hints at the logic: greedily take the least letter that doesn't eliminate all possibilities, but, as he notes, the details are subtle.

from collections import Counter
from itertools import permutations
from operator import itemgetter
from random import randrange


def get_mode(count):
    return max(count.items(), key=itemgetter(1))[0]


def enum2(prefix, x, count, total, mode):
    prefix.append(x)
    count_x = count[x]
    if count_x == 1:
        del count[x]
    else:
        count[x] = count_x - 1
    yield from enum1(prefix, count, total - 1, mode)
    count[x] = count_x
    del prefix[-1]


def enum1(prefix, count, total, mode):
    if total == 0:
        yield tuple(prefix)
        return
    if count[mode] * 2 - 1 >= total and [mode] != prefix[-1:]:
        yield from enum2(prefix, mode, count, total, mode)
    else:
        defect_okay = not prefix or count[prefix[-1]] * 2 > total
        mode = get_mode(count)
        for x in sorted(count.keys()):
            if defect_okay or [x] != prefix[-1:]:
                yield from enum2(prefix, x, count, total, mode)
                break


def enum(seq):
    count = Counter(seq)
    if count:
        yield from enum1([], count, sum(count.values()), get_mode(count))
    else:
        yield ()


def defects(lst):
    return sum(lst[i - 1] == lst[i] for i in range(1, len(lst)))


def test(lst):
    perms = set(permutations(lst))
    opt = min(map(defects, perms))
    slow = min(perm for perm in perms if defects(perm) == opt)
    fast = list(enum(lst))
    assert len(fast) == 1
    fast = min(fast)
    print(lst, fast, slow)
    assert slow == fast


for r in range(10000):
    test([randrange(3) for i in range(randrange(6))])
Community
  • 1
  • 1
David Eisenstat
  • 64,237
  • 7
  • 60
  • 120