3

I'm looking for an efficient way of sorting an array of email addresses to avoid items with the same domain to be consecutive, in C#.

Email addresses inside the array are already distinct and all of them are lower case.

Example:

Given an array with the following entries:

john.doe@domain1.com
jane_doe@domain1.com
patricksmith@domain2.com
erick.brown@domain3.com

I would like to obtain something similar to the following:

john.doe@domain1.com    
patricksmith@domain2.com
jane_doe@domain1.com
erick.brown@domain3.com
radarbob
  • 4,964
  • 2
  • 23
  • 36
Dbdj82
  • 58
  • 7
  • Your question isn't clear. I don't understand why *patrick* came before *jane*. – Erik Philips Nov 21 '15 at 22:02
  • 3
    @ErikPhilips i got what he means. OP dont want sort actually. OP wants to spread items by domain. so as you see `domain1` should not be behind another `domain1` – M.kazem Akhgary Nov 21 '15 at 22:04
  • 2
    I really like when seeing different people come up with different solutions on this. – TLJ Nov 21 '15 at 22:36
  • 2
    I think you should rephrase your question and use "rearrange" instead of "sort". – Mårten Wikström Nov 21 '15 at 23:17
  • 1
    Your problem is similar to the one discussed in my blog post, http://blog.mischel.com/2015/03/26/evenly-distributing-items-in-a-list/, with the additional restriction that you can't use the same domain twice in a row. Note that if more than n/2 emails are from the same domain, you can't prevent two in a row of the same domain. – Jim Mischel Nov 22 '15 at 05:37
  • Hi @Dbdj82, please be aware of the fact, that the professionals giving answers on SO are hungry for reputation points. It would be very kind of you to vote helpful answers up and check the best answer as the accepted answer. This will show to others that this question is solved. If you could not solve your issue, please leave a comment. You might want to check your earlier questions too... Thx – Shnugo Nov 24 '15 at 13:27

5 Answers5

2

With the help of an extension method (stolen from https://stackoverflow.com/a/27533369/172769), you can go like this:

List<string> emails = new List<string>();
emails.Add("john.doe@domain1.com");
emails.Add("jane_doe@domain1.com");
emails.Add("patricksmith@domain2.com");
emails.Add("erick.brown@domain3.com");

var q = emails.GroupBy(m => m.Split('@')[1]).Select(g => new List<string>(g)).Interleave();

The Interleave method is defined as:

public static IEnumerable<T> Interleave<T>(this IEnumerable<IEnumerable<T>> source )
{
    var queues = source.Select(x => new Queue<T>(x)).ToList();
    while (queues.Any(x => x.Any())) {
        foreach (var queue in queues.Where(x => x.Any())) {
            yield return queue.Dequeue();
        }
    }
}

So basically, we create groups based on the domain part of the email adresses, project (or Select) each group into a List<string>, and then "Interleave" those lists.

I have tested against your sample data, but more thorough testing might be needed to find edge cases.

DotNetFiddle snippet

Cheers

Community
  • 1
  • 1
Luc Morin
  • 5,302
  • 20
  • 39
1

Something like this will spread them equally, but you will have the problems (=consecutive elements) at the end of the new list...

        var list = new List<string>();
        list.Add("john.doe@domain1.com");
        list.Add("jane_doe@domain1.com");
        list.Add("patricksmith@domain2.com");
        list.Add("erick.brown@domain3.com");

        var x = list.GroupBy(content => content.Split('@')[1]);

        var newlist = new List<string>();
        bool addedSomething=true;
        int i = 0;
        while (addedSomething) {
            addedSomething = false;
            foreach (var grp in x) {
                if (grp.Count() > i) {
                    newlist.Add(grp.ElementAt(i));
                    addedSomething = true;
                }
            }
            i++;
        }
Shnugo
  • 66,100
  • 9
  • 53
  • 114
1

This will distribute them semi-evenly and attempt to avoid matching domains next to each other (although in certain lists that may be impossible). This answer will use OOP and Linq.

DotNetFiddle.Net Example

using System;
using System.Linq;
using System.Collections.Generic;
                    
public class Program
{
    public static void Main()
    {
        var seed = new List<string>()
        {
            "1@a.com",
            "2@a.com",
            "3@a.com",
            "4@a.com",
            "5@a.com",
            "6@a.com",
            "7@a.com",
            "8@a.com",
            "9@a.com",
            "10@a.com",
            
            "1@b.com",
            "2@b.com",
            "3@b.com",

            "1@c.com",
            "4@b.com",
            "2@c.com",
            "3@c.com",
            "4@c.com"
        };
        
        var work = seed
            // Create a list of EmailAddress objects
            .Select(s => new EmailAddress(s)) // s.ToLowerCase() ?
            // Group the list by Domain
            .GroupBy(s => s.Domain)
            // Create a List<EmailAddressGroup>
            .Select(g => new EmailAddressGroup(g))
            .ToList();
        
        var currentDomain = string.Empty;
        while(work.Count > 0)
        {
            // this list should not be the same domain we just used
            var noDups = work.Where(w => w.Domain != currentDomain);
            // if none exist we are done, or it can't be solved
            if (noDups.Count() == 0)
            {
                break;
            }
            // find the first group with the most items
            var workGroup = noDups.First(w => w.Count() == noDups.Max(g => g.Count()));
            // get the email address and remove it from the group list
            var workItem = workGroup.Remove();
            
            // if the group is empty remove it from *work*
            if (workGroup.Count() == 0)
            {
                work.Remove(workGroup);
                Console.WriteLine("removed: " + workGroup.Domain);
            }
            
            Console.WriteLine(workItem.FullEmail);
            
            // last domain looked at.
            currentDomain = workItem.Domain;
        }
        
        Console.WriteLine("Cannot disperse email addresses affectively, left overs:");
        
        foreach(var workGroup in work)
        {
            while(workGroup.Count() > 0)
            {
                var item = workGroup.Remove();
                Console.WriteLine(item.FullEmail);
            }
        }
            
            
    }
    
    public class EmailAddress
    {
        public EmailAddress(string emailAddress)
        {
            // Additional Email Address Validation
            
            var result = emailAddress.Split(new char[] {'@'}, StringSplitOptions.RemoveEmptyEntries)
                .ToList();
            
            if (result.Count() != 2)
            {
                new ArgumentException("emailAddress");
            }
            
            this.FullEmail = emailAddress;
            this.Name = result[0];
            this.Domain = result[1];
        }
        
        public string Name { get; private set; }
        public string Domain { get; private set; }
        public string FullEmail { get; private set; }
    }
    
    public class EmailAddressGroup
    {
        private List<EmailAddress> _emails;
        
        public EmailAddressGroup(IEnumerable<EmailAddress> emails)
        {
            this._emails = emails.ToList();
            this.Domain = emails.First().Domain;
        }
        
        public int Count()
        {
            return _emails.Count();
        }
        
        public string Domain { get; private set; }
        
        public EmailAddress Remove()
        {
            var result = _emails.First();
            _emails.Remove(result);
            return result;
        }
    }
}

Output:

1@a.com

1@b.com

2@a.com

1@c.com

3@a.com

2@b.com

4@a.com

2@c.com

5@a.com

3@b.com

6@a.com

3@c.com

7@a.com

removed: b.com

4@b.com

8@a.com

removed: c.com

4@c.com

9@a.com

Cannot disperse email addresses affectively, left overs:

10@a.com

Community
  • 1
  • 1
Erik Philips
  • 53,428
  • 11
  • 128
  • 150
1

Edit: Added a high level description :)

What this code does is group each element by the domain, sort the groups by size in descending order (largest group first), project the elements of each group into a stack, and pop them off of each stack (always pop the next element off the largest stack with a different domain). If there is only a single stack left, then its contents are yielded.

This should make sure that all domains distributed as evenly as possible.

MaxBy extension method from: https://stackoverflow.com/a/31560586/969962

private IEnumerable<string> GetNonConsecutiveEmails(List<string> list)
{
    var emailAddresses = list.Distinct().Select(email => new EmailAddress { Email = email, Domain = email.Split('@')[1]}).ToArray();
    var groups = emailAddresses
                .GroupBy(addr => addr.Domain)
                .Select (group => new { Domain = group.Key, EmailAddresses = new Stack<EmailAddress>(group)})
                .ToList();

    EmailAddress lastEmail = null;
    while(groups.Any(g => g.EmailAddresses.Any()))
    {
        // Try and pick from the largest stack.
        var stack = groups
            .Where(g => (g.EmailAddresses.Any()) && (lastEmail == null ? true : lastEmail.Domain != g.Domain))
            .MaxBy(g => g.EmailAddresses.Count);

        // Null check to account for only 1 stack being left.
        // If so, pop the elements off the remaining stack.   
        lastEmail = (stack ?? groups.First(g => g.EmailAddresses.Any())).EmailAddresses.Pop();
        yield return lastEmail.Email;
    }
}

class EmailAddress
{
    public string Domain;
    public string Email;
}

public static class Extensions
{
    public static T MaxBy<T,U>(this IEnumerable<T> data, Func<T,U> f) where U:IComparable
    {
        return data.Aggregate((i1, i2) => f(i1).CompareTo(f(i2))>0 ? i1 : i2);
    }
}
Community
  • 1
  • 1
Sef
  • 346
  • 3
  • 10
  • This would work well only when number of emails evenly distribute between domains I think. – TLJ Nov 21 '15 at 23:56
  • Yikes, you're right. Last edit should address that. – Sef Nov 22 '15 at 00:47
  • Would be nice if you provided a high level description of your code. – Luc Morin Nov 22 '15 at 01:05
  • @LucMorin agreed. I added a description to clarify what is going on. – Sef Nov 22 '15 at 04:03
  • If you used a priority queue to keep the domain lists in order, this would be much more efficient. See http://blog.mischel.com/2015/03/26/evenly-distributing-items-in-a-list/ for a similar problem. – Jim Mischel Nov 22 '15 at 05:38
0

What I am trying to do here is to sort them first. Then I re-arrange from a different end. I'm sure there're more efficient ways to do this but this is one easy way to do it.

using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;

namespace ConsoleApplication4
{
    class Program
    {
        static void Main(string[] args)
        {
            String[] emails = { "john.doe@domain1.com", "jane_doe@domain1.com", "patricksmith@domain2.com", "erick.brown@domain3.com" };
            var result = process(emails);
        }

        static String[] process(String[] emails)
        {
            String[] result = new String[emails.Length];
            var comparer = new DomainComparer();
            Array.Sort(emails, comparer);
            for (int i = 0, j = emails.Length - 1, k = 0; i < j; i++, j--, k += 2)
            {
                if (i == j)
                    result[k] = emails[i];
                else
                {
                    result[k] = emails[i];
                    result[k + 1] = emails[j];
                }
            }

            return result;
        }
    }

    public class DomainComparer : IComparer<string>
    {
        public int Compare(string left, string right)
        {
            int at_pos = left.IndexOf('@');
            var left_domain = left.Substring(at_pos, left.Length - at_pos);
            at_pos = right.IndexOf('@');
            var right_domain = right.Substring(at_pos, right.Length - at_pos);
            return String.Compare(left_domain, right_domain);
        }
    }
}
TLJ
  • 4,525
  • 2
  • 31
  • 46