
When working with the PriorityQueue<TElement, TPriority> collection, I frequently need to preserve the insertion order of elements that have the same priority (FIFO order), and AFAIK the only way to do it is by coupling the TPriority with an extra incremented int index: PriorityQueue<TElement, (TPriority, int)>. This works well, but it makes me uncomfortable knowing that if the queue is used for long enough, the int indices will eventually wrap around the Int32.MaxValue limit, breaking the FIFO behavior. I can patch this problem by switching from int to long, making it practically impossible for the index to wrap, but it still feels dirty.

I wonder if it's possible to do better, by rebuilding the indices of the TElement+TPriority pairs when the index is about to wrap on the next Enqueue operation. So I wrote this extension method for indexed FIFO priority queues:

public static void Reindex<TElement, TPriority>(
    this PriorityQueue<TElement, (TPriority, int)> source)
{
    ArgumentNullException.ThrowIfNull(source);
    // Snapshot the entries in priority order (according to the queue's comparer),
    // then re-enqueue them with fresh indices 0..Count-1.
    (TElement, (TPriority, int))[] entries = source.UnorderedItems
        .OrderBy(e => e.Priority, source.Comparer)
        .ToArray();
    source.Clear();
    for (int i = 0; i < entries.Length; i++)
        source.Enqueue(entries[i].Item1, (entries[i].Item2.Item1, i));
}
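
For illustration, here is a minimal sketch of how Reindex could be wired up so that it runs only when the index is about to wrap (the StableEnqueuer wrapper below is just a hypothetical sketch, not part of my actual code):

public class StableEnqueuer<TElement, TPriority>
{
    // Hypothetical wrapper: owns the incrementing index and rebuilds it
    // (via the Reindex extension above) just before it would overflow.
    private readonly PriorityQueue<TElement, (TPriority, int)> _queue;
    private int _nextIndex = 0;

    public StableEnqueuer(PriorityQueue<TElement, (TPriority, int)> queue)
        => _queue = queue;

    public void Enqueue(TElement element, TPriority priority)
    {
        if (_nextIndex == Int32.MaxValue)
        {
            _queue.Reindex();          // indices become 0..Count-1
            _nextIndex = _queue.Count; // continue from the new maximum
        }
        _queue.Enqueue(element, (priority, _nextIndex++));
    }
}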

My problem with the Reindex implementation above is that it allocates a disproportionate amount of memory during the re-indexing. For example, 61,456 bytes are allocated for rebuilding the indices of a PriorityQueue<object, (char, int)> with 1,000 elements:

PriorityQueue<object, (char, int)> queue = new(Comparer<(char, int)>.Create((x, y) =>
{
    int result = x.Item1.CompareTo(y.Item1);
    if (result == 0) result = x.Item2.CompareTo(y.Item2);
    return result;
}));

Random random = new(0);
for (int i = 0; i < 100_000; i++)
{
    char key = (char)random.Next(65, 91);
    queue.Enqueue(new object(), (key, i));
}
while (queue.Count > 1_000) queue.Dequeue();

long mem0 = GC.GetTotalAllocatedBytes(true);
queue.Reindex();
long mem1 = GC.GetTotalAllocatedBytes(true);
Console.WriteLine($"Count: {queue.Count:#,0}, Allocated: {mem1 - mem0:#,0} bytes");

Output:

Count: 1,000, Allocated: 61,456 bytes

Live demo.

I would like to ask if it's possible to rebuild the indices with zero allocations (in-place), or at least with no more than System.Runtime.CompilerServices.Unsafe.SizeOf<(TElement, TPriority)>() * queue.Count allocated bytes (16,000 bytes in the above example).
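
For reference, the 16,000-byte bound corresponds to 16 bytes per entry for the example above on a 64-bit runtime; a quick sanity check (not part of the original measurement):

// Assumes a 64-bit runtime: 8 bytes for the object reference plus
// 8 bytes for the padded (char, int) tuple = 16 bytes per entry.
int entrySize = System.Runtime.CompilerServices.Unsafe.SizeOf<(object, (char, int))>(); // 16
Console.WriteLine(entrySize * 1_000); // 16000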

Theodor Zoulias
  • Most implementations do just use the ever-incrementing "timestamp" as part of the priority calculation, as per your code. You might find [this discussion](https://cstheory.stackexchange.com/questions/593/is-there-a-stable-heap) interesting. – Matthew Watson Apr 05 '23 at 14:49
  • @MatthewWatson thanks for the interesting [link](https://cstheory.stackexchange.com/questions/593/is-there-a-stable-heap). Regarding the statement that most implementations just use an ever-incrementing timestamp, I would say that, assuming it's true, most implementations are broken! – Theodor Zoulias Apr 05 '23 at 15:01
  • You mean broken because the timestamp will wrap around eventually? Yeah, I guess they use `long` like you did. – Matthew Watson Apr 05 '23 at 15:29
  • That sounds like a pretty strict definition of "broken". There are ~3x as many positive `long` values as it would take to represent every "Tick" from now until the year 9999, and on my machine it consistently takes at least a tick to put an item into a PriorityQueue. I'd say you're fine just using a `long`, and if that "feels dirty" then you can detect an overflow and only re-index when that (never) happens. – StriplingWarrior Apr 05 '23 at 16:30
  • @MatthewWatson assuming an extreme frequency of 1,000,000,000 `Enqueue` operations per second, it would take almost 300 years to wrap a `long` value, which makes it practically safe. But in order to get this statistical safety you have to increase the per element state by 4 bytes, which is not ideal. And this question transcends practicality. It's about ideals! :-) – Theodor Zoulias Apr 05 '23 at 16:41

2 Answers


I was just playing around with this puzzle. I'm not an expert; actually I have never used a PriorityQueue before. But using TryDequeue reduces the allocations to somewhere around 16K bytes,

Count: 1.000, Allocated: 16.152 bytes

because the items are dequeued one at a time into a single pre-sized array. I can't say anything about the performance, though.

public static void Reindex2<TElement, TPriority>(
    this PriorityQueue<TElement, (TPriority, int)> source)
{
    ArgumentNullException.ThrowIfNull(source);

    int count = source.Count;
    var orderedItems = new (TElement, (TPriority, int))[count];

    // Drain the queue one item at a time into a single pre-sized array.
    int i = 0;
    while (source.TryDequeue(out var item, out var priority))
    {
        orderedItems[i] = (item, (priority.Item1, priority.Item2));
        i++;
    }

    // Order by the original (TPriority, int) priority, then re-enqueue
    // with fresh indices 0..count-1.
    Array.Sort(orderedItems, (a, b)
        => source.Comparer.Compare(a.Item2, b.Item2));

    for (i = 0; i < count; i++)
    {
        source.Enqueue(orderedItems[i].Item1, (orderedItems[i].Item2.Item1, i));
    }
}
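
The figure above can be measured with the same harness as in the question, for example (assuming the queue from the question's setup is in scope):

long mem0 = GC.GetTotalAllocatedBytes(true);
queue.Reindex2();
long mem1 = GC.GetTotalAllocatedBytes(true);
Console.WriteLine($"Count: {queue.Count:#,0}, Allocated: {mem1 - mem0:#,0} bytes");
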
jeb

Based on my tests, the biggest memory consumer was the LINQ ordering (with or without materialization), so just copying the values into an array and sorting with Array.Sort reduced the memory consumption drastically:

public static void ReindexNew<TElement, TPriority>(this PriorityQueue<TElement, (TPriority, int)> source)
{
    ArgumentNullException.ThrowIfNull(source);
    (TElement, (TPriority, int))[] entries = new (TElement, (TPriority, int))[source.UnorderedItems.Count];
    int counter = 0;
    foreach (var item in source.UnorderedItems)
    {
        entries[counter++] = item;
    }
    // hopefully I got the comparisons right
    Array.Sort(entries, (left, right) => source.Comparer.Compare(left.Item2, right.Item2)); 
    source.Clear();
    for (int i = 0; i < entries.Length; i++)
    {
        entries[i].Item2.Item2 = i;
    }
    source.EnqueueRange(entries);

    // or something like this, requires time benchmarking 
    // for (int i = 0; i < entries.Length; i++)
    // {
    //     source.Enqueue(entries[i].Item1, (entries[i].Item2.Item1, i));
    // }
}

And "tests":

void Test()
{
    long mem0 = GC.GetTotalAllocatedBytes(true);
    queue.ReindexNew();
    long mem1 = GC.GetTotalAllocatedBytes(true);
    Console.WriteLine($"Count: {queue.Count:#,0}, Allocated: {mem1 - mem0:#,0} bytes");
}

Test(); // Count: 1,000, Allocated: 16,136 bytes
Test(); // Count: 1,000, Allocated: 16,112 bytes
Guru Stron
  • This is also a great answer! I think you could optimize the line `entries[i] = (entries[i].Item1, (entries[i].Item2.Item1, i));` with `entries[i].Item2.Item2 = i;`. Value-tuples are mutable. – Theodor Zoulias Apr 05 '23 at 18:10
  • @TheodorZoulias Updated! Was thinking about that but originally went with "safe" approach and then forgot to check it out. – Guru Stron Apr 05 '23 at 18:12
  • I am accepting this answer, although [Jeb's](https://stackoverflow.com/a/75941471/11178549) answer is also meeting the requirements, because this one is more performant. – Theodor Zoulias Apr 05 '23 at 18:19