0

I have a list of objects which can have one or more relationships to one another. I'd like to run through this list and compare each object to all other objects in the list, setting the relationships as I compare the objects. Because this comparison in real life is fairly complex and time consuming I'm trying to do this asynchronously.

I've quickly put together some sample code which illustrates the issue at hand in a fairly simple fashion.

class Program
    {
        private static readonly Word[] _words =
        {
            new Word("Beef"),
            new Word("Bull"),
            new Word("Space")
        };

        static void Main()
        {
            var tasks = new List<Task>();

            foreach (var word in _words)
            {
                tasks.Add(CheckRelationShipsAsnc(word));
            }

            Task.WhenAll(tasks);
        }

        static async Task CheckRelationShipsAsnc(Word leftWord)
        {
            await Task.Run(() =>
            {
                foreach (var rightWord in _words)
                {
                    if(leftWord.Text.First() == rightWord.Text.First())
                    {
                        leftWord.RelationShips.Add(rightWord);
                    }
                }
            });
        }
    }

    class Word
    {
        public string Text { get; }
        public List<Word> RelationShips { get; } = new List<Word>();

        public Word(string text)
        {
            if(string.IsNullOrEmpty(text)) throw new ArgumentException();
            Text = text;
        }

        public override string ToString()
        {
            return $"{Text} ({RelationShips.Count} relationships)";
        }
    }

The expected result would be that "Space" has no relationships whereas the words "Bull" and "Beef" have one relationship to one another. What I get is all words have no relationsships at all. I'm having trouble understanding what exactly the problem is.

Mats
  • 14,902
  • 33
  • 78
  • 110
  • A small pedantic correction: the term *"asynchronously"* is not appropriate in this case. The correct term is *"in parallel"* (although *"concurrently"* would be a good fit too). If you are interested to know exactly what these terms mean, you could take a look [here](https://stackoverflow.com/questions/4844637/what-is-the-difference-between-concurrency-parallelism-and-asynchronous-methods). – Theodor Zoulias Jun 01 '20 at 12:13
  • 2
    Also when dealing with parallel code, you must be constantly on alert for possible thread-safety issues. For example what is the type of the property `Word.RelationShips`? If it's a `List`, then modifying it from multiple threads without synchronization may easily result to corruption of the internal state of the list. – Theodor Zoulias Jun 01 '20 at 12:23

2 Answers2

3

You should make Main method async as well and await Task.WhenAll. Otherwise the result task for WhenAll will not run its execution. You can also simplify tasks creation using Linq

static async Task Main()
{
    var tasks = _words.Select(CheckRelationShipsAsync);
    await Task.WhenAll(tasks);
}

You can also use Wait() or WaitAll method, which runs synchronously and blocks the current thread (so, it isn't a recommend approach). But it doesn't require to make Main method async

var tasks = _words.Select(CheckRelationShipsAsync);
var task = Task.WhenAll(tasks);
task.Wait();

or

static void  Main()
{
    var tasks = _words.Select(CheckRelationShipsAsync);
    Task.WaitAll(tasks.ToArray());
}

Second point is that when you check the relationships you haven't skip the word itself, and every word at the end has relationship with itself. You should add leftWord != rightWord condition inside foreach loop to get an expected result

The expected result would be that "Space" has no relationships whereas the words "Bull" and "Beef" have one relationship to one another.

Pavel Anikhouski
  • 21,776
  • 12
  • 51
  • 66
3

Your algorithm has an O(n^2) time complexity. This is a problem if you have a great number of items to compare against each other. E.g., if you have 1000 items, this gives you 1000 * 1000 = 1000000 (one million) comparisons.

Consider using another approach. I don't know if this is applicable to your real problem, but for this example, assuming that each word starts with a capital letter A..Z, you could store the related words by first letter in an array of length 26 of word lists.

var a = new List<Word>[26];

// Initialize array with empty lists
for (int i = 0; i < a.Length; i++) {
    a[i] = new List<Word>();
}

// Fill array with related words
foreach (var word in _words) {
    a[word.Text[0] - 'A'].Add(word); // Subtracting 'A' yields a zero-based index.
}

Note that your original solution has two nested loops (where one is hidden inside the call of CheckRelationShipsAsnc). This solution has only one level of loops and has a time complexity of O(n) up to here.

Now you find all related words in one list at the corresponding array positions. Taking this information, you can now wire up the words being in the same list. This part is still O(n^2); however, here n is much smaller, because is refers only to words being in the lists that are related, but not the length of the initial _words array.

Depending on how your real problem is formulated, it may be better to use a Dictionary<char, List<Word>> in place of the my array a. The array solution requires an index. In a real world problem, the relation condition may not be formulateable as an index. The dictionary requires a key and any kind of object can be used as key. See: Remarks section of Dictionary<TKey,TValue> Class.

An algorithm optimized in this way may be even faster than a multitaskting solution.

Olivier Jacot-Descombes
  • 104,806
  • 13
  • 138
  • 188