13

I've been trying to solve this interview problem which asks to shuffle a string so that no two adjacent letters are identical For example,

ABCC -> ACBC

The approach I'm thinking of is to

1) Iterate over the input string and store the (letter, frequency) pairs in some collection

2) Now build a result string by pulling the highest frequency (that is > 0) letter that we didn't just pull

3) Update (decrement) the frequency whenever we pull a letter

4) return the result string if all letters have zero frequency

5) return error if we're left with only one letter with frequency greater than 1

With this approach we can save the more precious (less frequent) letters for last. But for this to work, we need a collection that lets us efficiently query a key and at the same time efficiently sort it by values. Something like this would work except we need to keep the collection sorted after every letter retrieval.

I'm assuming Unicode characters.

Any ideas on what collection to use? Or an alternative approach?

Community
  • 1
  • 1
NGambit
  • 1,141
  • 13
  • 27
  • Looks like a duplicate of http://stackoverflow.com/questions/37040164/how-to-shuffle-a-character-array-with-no-two-duplicates-next-to-each-other. – Vijay Aug 26 '16 at 17:36
  • Maybe but I'm asking if there is C# collection/data structure one could use for this purpose – NGambit Aug 26 '16 at 17:44
  • 1
    Bogosort until criteria is met! – itsme86 Aug 26 '16 at 18:08
  • This is related to the problem of evenly distributing items. See http://stackoverflow.com/q/37452547/56778. Could potentially be considered a duplicate? – Jim Mischel Aug 26 '16 at 18:30

7 Answers7

14

You can sort the letters by frequency, split the sorted list in half, and construct the output by taking letters from the two halves in turn. This takes a single sort.

Example:

  • Initial string: ACABBACAB
  • Sort: AAAABBBCC
  • Split: AAAA+BBBCC
  • Combine: ABABABCAC

If the number of letters of highest frequency exceeds half the length of the string, the problem has no solution.

Manish Kumar
  • 397
  • 3
  • 11
Sergey Kalinichenko
  • 714,442
  • 84
  • 1,110
  • 1,523
  • You don't need to sort by frequency; just sort (and then interleave starting with the second half). I'm not sure how you sort the letters by frequency with a single sort; you'd need to first sort and then sort the frequencies, no? – rici Aug 26 '16 at 20:56
  • @rici You are right, just sorting may actually work with some care (say, `ABBBC` would require splitting `ABB`+`BC` and starting the merge with `BC`). Sorting by frequency requires counting frequencies, collecting and sorting them together with the letter, and finally reconstructing the array from the counters. – Sergey Kalinichenko Aug 26 '16 at 22:01
  • Indeed, the case where you have an odd length vector and one letter's frequency is just over half that length is complicated, but it's easy to detect that case and special case it if necessary. – rici Aug 26 '16 at 22:33
  • @AndrewHanlon I'd be curious to know how an algorithm that can be implemented as O(n) is not the most efficient, especially in comparison to an O(n^2) one. – Sergey Kalinichenko Aug 31 '16 at 16:25
  • @dasblinkenlight Not all O(n) algorithms are made equal. Combining requires an additional loop. Sorting by magnitude is a lot of additional work that is not needed. Therefore modifying the Sort algorithm in the first place can certainly create a more efficient algorithm. – Andrew Hanlon Aug 31 '16 at 16:32
  • @rici: I'm not sure that just sorting works. For example, if we have `abbbbcd`, and we split into `abbb` and `bcd` then we'll get `abbcbd`. (Or if we split into `abb` and `bbcd` then we'll get `abbbbcd`.) But we need to construct `babcbdb`... – John Kurlak Jul 02 '19 at 14:58
  • @johnkurlak: this problem was discussed in the comment thread above :-) However, I didn't provide the precise fix. Here it is. First sort the string. If the string has odd length, put the middle letter first. Either one of the two remaining halves will start with a different letter, or there is no solution. Based on that, either report failure or continue by interleaving the remaining halves, starting with the one whose first letter differs from the letter just placed. – rici Jul 02 '19 at 15:26
  • @rici: The issue I'm describing doesn't seem to be solved by the algorithm you've proposed (as I understand it, please correct me if I'm wrong). If I have the string `abbbbcd` (sorted), we can see that it's an odd-length string. So your proposal is to append the middle character (`b`) to the output string first? Then that leaves me with `abb` and `bcd` to be merged. So now I start interleaving starting with the half that doesn't start with `b`, which is `abb`. So I append `a` from `abb` then `b` from `bcd` and so on. That leaves me with the final result of: `babbcbd` (wrong answer). – John Kurlak Jul 02 '19 at 20:35
  • @rici: Also, the problem I'm describing wasn't discussed above. The problem I'm describing is present even if we don't have an odd-length string. The discussion above pertains to odd-length strings. For example, the problem is exhibited by this even-length string: `abbbcd`. If we interweave `abb` and `bcd` we get: `abbcbd`. But the correct answer is `babcbd`. – John Kurlak Jul 02 '19 at 20:40
  • 1
    @john: for even-length strings,my original comment says "start with the second half", and that certainly will avoid repeats unless there is a strict majority character. For odd-length strings, you're right: the algo I just proposed doesn't cut it, as stated. But there is an interleave which works; it's just a bit complex for a comment. But I'll give it a whirl (although I bet you can see it, too). – rici Jul 03 '19 at 01:49
  • 1
    @john: We have a sorted string of length `2n+1`. No character appears more then `n+1` times. (Otherwise there is no solution.) We start by placing character index `n` (0-based). There are now two substrings: `0..n-1` and `n+1..2n`. I assert: for any `i` in the range `0..n-1`, the characters at offset `i` in the two substrings are distinct. Those two characters were at indices `i` and `n+1+i` in the original sorted string; if they are the same, there are `n+2` identical characters in the original string, contradicting the precondition... – rici Jul 03 '19 at 01:57
  • 1
    ... Since they are different, at least one is not the same as the last character placed. Place that character next, then the other character from this pair (which is different), and then advance `i` to the next pair, until done. – rici Jul 03 '19 at 01:59
  • @rici: For even length strings, what do you do if there is a strict majority character? – John Kurlak Jul 03 '19 at 05:09
  • @johnkurlak: fail. No solution is possible. – rici Jul 03 '19 at 05:11
  • @rici: Thanks for the great explanations for the even-length and odd-length scenarios. This makes sense! Great algorithm! – John Kurlak Jul 03 '19 at 05:20
  • @rici: Depending on the alphabet size, it might be better to go with dasblinkenlight's algorithm over your algorithm. We know frequency is bounded by the range 1...n, so we can apply a linear sort like counting sort. We don't necessarily have a bound for alphabet size, so a generic sort may be needed, which could be slower if alphabet size is big enough. – John Kurlak Jul 03 '19 at 05:25
  • 1
    @JohnKurlak: Now that I went to all that work :-), I figured I might as well put it in an answer. – rici Jul 03 '19 at 21:45
1

Why not use two Data Structures: One for sorting (Like a Heap) and one for key retrieval, like a Dictionary?

1

The accepted answer may produce a correct result, but is likely not the 'correct' answer to this interview brain teaser, nor the most efficient algorithm.

The simple answer is to take the premise of a basic sorting algorithm and alter the looping predicate to check for adjacency rather than magnitude. This ensures that the 'sorting' operation is the only step required, and (like all good sorting algorithms) does the least amount of work possible.

Below is a c# example akin to insertion sort for simplicity (though many sorting algorithm could be similarly adjusted):

string NonAdjacencySort(string stringInput)
{
    var input = stringInput.ToCharArray();

    for(var i = 0; i < input.Length; i++)
    {
        var j = i;

        while(j > 0 && j < input.Length - 1 && 
              (input[j+1] == input[j] || input[j-1] == input[j]))
        {
            var tmp = input[j];
            input[j] = input[j-1];
            input[j-1] = tmp;           
            j--;
        }

        if(input[1] == input[0])
        {
            var tmp = input[0];
            input[0] = input[input.Length-1];
            input[input.Length-1] = tmp;
        }
    }

    return new string(input);
}

The major change to standard insertion sort is that the function has to both look ahead and behind, and therefore needs to wrap around to the last index.

A final point is that this type of algorithm fails gracefully, providing a result with the fewest consecutive characters (grouped at the front).

Andrew Hanlon
  • 7,271
  • 4
  • 33
  • 53
1

Since I somehow got convinced to expand an off-hand comment into a full algorithm, I'll write it out as an answer, which must be more readable than a series of uneditable comments.

The algorithm is pretty simple, actually. It's based on the observation that if we sort the string and then divide it into two equal-length halves, plus the middle character if the string has odd length, then corresponding positions in the two halves must differ from each other, unless there is no solution. That's easy to see: if the two characters are the same, then so are all the characters between them, which totals ⌈n/2⌉&plus;1 characters. But a solution is only possible if there are no more than ⌈n/2⌉ instances of any single character.

So we can proceed as follows:

  1. Sort the string.
  2. If the string's length is odd, output the middle character.
  3. Divide the string (minus its middle character if the length is odd) into two equal-length halves, and interleave the two halves.
  4. At each point in the interleaving, since the pair of characters differ from each other (see above), at least one of them must differ from the last character output. So we first output that character and then the corresponding one from the other half.

The sample code below is in C++, since I don't have a C# environment handy to test with. It's also simplified in two ways, both of which would be easy enough to fix at the cost of obscuring the algorithm:

  • If at some point in the interleaving, the algorithm encounters a pair of identical characters, it should stop and report failure. But in the sample implementation below, which has an overly simple interface, there's no way to report failure. If there is no solution, the function below returns an incorrect solution.

  • The OP suggests that the algorithm should work with Unicode characters, but the complexity of correctly handling multibyte encodings didn't seem to add anything useful to explain the algorithm. So I just used single-byte characters. (In C# and certain implementations of C++, there is no character type wide enough to hold a Unicode code point, so astral plane characters must be represented with a surrogate pair.)

#include <algorithm>
#include <iostream>
#include <string>

// If possible, rearranges 'in' so that there are no two consecutive
// instances of the same character. 
std::string rearrange(std::string in) {
  // Sort the input. The function is call-by-value,
  // so the argument itself isn't changed.
  std::string out;
  size_t len = in.size();
  if (in.size()) {
    out.reserve(len);
    std::sort(in.begin(), in.end());
    size_t mid = len / 2;
    size_t tail = len - mid;
    char prev = in[mid]; 
    // For odd-length strings, start with the middle character.
    if (len & 1) out.push_back(prev);
    for (size_t head = 0; head < mid; ++head, ++tail) 
      // See explanatory text
      if (in[tail] != prev) {
        out.push_back(in[tail]);
        out.push_back(prev = in[head]);
      }
      else {
        out.push_back(in[head]);
        out.push_back(prev = in[tail]);
      }
    }
  }
  return out;
}
rici
  • 234,347
  • 28
  • 237
  • 341
1

you can do that by using a priority queue. Please find the below explanation. https://iq.opengenus.org/rearrange-string-no-same-adjacent-characters/

  • 3
    Please consider using the content of the link instead of just the link. Links can be invalidated and as long as you're giving the credit to the original author there is no problem of doing so. – Gnqz Mar 10 '20 at 13:43
0

Here is a probabilistic approach. The algorithm is:

10) Select a random char from the input string.
20) Try to insert the selected char in a random position in the output string.
30) If it can't be inserted because of proximity with the same char, go to 10.
40) Remove the selected char from the input string and go to 10.
50) Continue until there are no more chars in the input string, or the failed attempts are too many.

public static string ShuffleNoSameAdjacent(string input, Random random = null)
{
    if (input == null) return null;
    if (random == null) random = new Random();
    string output = "";
    int maxAttempts = input.Length * input.Length * 2;
    int attempts = 0;
    while (input.Length > 0)
    {
        while (attempts < maxAttempts)
        {
            int inputPos = random.Next(0, input.Length);
            var outputPos = random.Next(0, output.Length + 1);
            var c = input[inputPos];
            if (outputPos > 0 && output[outputPos - 1] == c)
            {
                attempts++; continue;
            }
            if (outputPos < output.Length && output[outputPos] == c)
            {
                attempts++; continue;
            }
            input = input.Remove(inputPos, 1);
            output = output.Insert(outputPos, c.ToString());
            break;
        }
        if (attempts >= maxAttempts) throw new InvalidOperationException(
            $"Shuffle failed to complete after {attempts} attempts.");
    }
    return output;
}

Not suitable for strings longer than 1,000 chars!


Update: And here is a more complicated deterministic approach. The algorithm is:

  1. Group the elements and sort the groups by length.
  2. Create three empty piles of elements.
  3. Insert each group to a separate pile, inserting always the largest group to the smallest pile, so that the piles differ in length as little as possible.
  4. Check that there is no pile with more than half the total elements, in which case satisfying the condition of not having same adjacent elements is impossible.
  5. Shuffle the piles.
  6. Start yielding elements from the piles, selecting a different pile each time.
  7. When the piles that are eligible for selection are more than one, select randomly, weighting by the size of each pile. Piles containing near half of the remaining elements should be much preferred. For example if the remaining elements are 100 and the two eligible piles have 49 and 40 elements respectively, then the first pile should be 10 times more preferable than the second (because 50 - 49 = 1 and 50 - 40 = 10).
    public static IEnumerable<T> ShuffleNoSameAdjacent<T>(IEnumerable<T> source,
        Random random = null, IEqualityComparer<T> comparer = null)
    {
        if (source == null) yield break;
        if (random == null) random = new Random();
        if (comparer == null) comparer = EqualityComparer<T>.Default;
        var grouped = source
            .GroupBy(i => i, comparer)
            .OrderByDescending(g => g.Count());
        var piles = Enumerable.Range(0, 3).Select(i => new Pile<T>()).ToArray();
        foreach (var group in grouped)
        {
            GetSmallestPile().AddRange(group);
        }
        int totalCount = piles.Select(e => e.Count).Sum();
        if (piles.Any(pile => pile.Count > (totalCount + 1) / 2))
        {
            throw new InvalidOperationException("Shuffle is impossible.");
        }

        piles.ForEach(pile => Shuffle(pile));
        Pile<T> previouslySelectedPile = null;
        while (totalCount > 0)
        {
            var selectedPile = GetRandomPile_WeightedByLength();
            yield return selectedPile[selectedPile.Count - 1];
            selectedPile.RemoveAt(selectedPile.Count - 1);
            totalCount--;
            previouslySelectedPile = selectedPile;
        }

        List<T> GetSmallestPile()
        {
            List<T> smallestPile = null;
            int smallestCount = Int32.MaxValue;
            foreach (var pile in piles)
            {
                if (pile.Count < smallestCount)
                {
                    smallestPile = pile;
                    smallestCount = pile.Count;
                }
            }
            return smallestPile;
        }

        void Shuffle(List<T> pile)
        {
            for (int i = 0; i < pile.Count; i++)
            {
                int j = random.Next(i, pile.Count);
                if (i == j) continue;
                var temp = pile[i];
                pile[i] = pile[j];
                pile[j] = temp;
            }
        }

        Pile<T> GetRandomPile_WeightedByLength()
        {
            var eligiblePiles = piles
                .Where(pile => pile.Count > 0 && pile != previouslySelectedPile)
                .ToArray();
            Debug.Assert(eligiblePiles.Length > 0, "No eligible pile.");
            eligiblePiles.ForEach(pile =>
            {
                pile.Proximity = ((totalCount + 1) / 2) - pile.Count;
                pile.Score = 1;
            });
            Debug.Assert(eligiblePiles.All(pile => pile.Proximity >= 0),
                "A pile has negative proximity.");
            foreach (var pile in eligiblePiles)
            {
                foreach (var otherPile in eligiblePiles)
                {
                    if (otherPile == pile) continue;
                    pile.Score *= otherPile.Proximity;
                }
            }
            var sumScore = eligiblePiles.Select(p => p.Score).Sum();
            while (sumScore > Int32.MaxValue)
            {
                eligiblePiles.ForEach(pile => pile.Score /= 100);
                sumScore = eligiblePiles.Select(p => p.Score).Sum();
            }
            if (sumScore == 0)
            {
                return eligiblePiles[random.Next(0, eligiblePiles.Length)];
            }
            var randomScore = random.Next(0, (int)sumScore);
            int accumulatedScore = 0;
            foreach (var pile in eligiblePiles)
            {
                accumulatedScore += (int)pile.Score;
                if (randomScore < accumulatedScore) return pile;
            }
            Debug.Fail("Could not select a pile randomly by weight.");
            return null;
        }
    }

    private class Pile<T> : List<T>
    {
        public int Proximity { get; set; }
        public long Score { get; set; }
    }

This implementation can suffle millions of elements. I am not completely convinced that the quality of the suffling is as perfect as the previous probabilistic implementation, but should be close.

Theodor Zoulias
  • 34,835
  • 7
  • 69
  • 104
0
func shuffle(str:String)-> String{

    var shuffleArray = [Character](str)
    
    //Sorting
    shuffleArray.sort() 
    
    var shuffle1 = [Character]()
    var shuffle2 = [Character]()
    var adjacentStr = ""

    //Split
    for i in 0..<shuffleArray.count{ 
        
        if i > shuffleArray.count/2  {
            shuffle2.append(shuffleArray[i])
        }else{
            shuffle1.append(shuffleArray[i])
        }
    }
    
    let count = shuffle1.count > shuffle2.count ? shuffle1.count:shuffle2.count

    //Merge with adjacent element

    for i in 0..<count {
        
        if i < shuffle1.count{
            adjacentStr.append(shuffle1[i])
        }
        
        if i < shuffle2.count{
            adjacentStr.append(shuffle2[i])
        }
    }
    
    return adjacentStr
}

let s = shuffle(str: "AABC")

print(s)
Markus Meyer
  • 3,327
  • 10
  • 22
  • 35
shashi
  • 1