I have a collection of objects where every object has a unique string Id, and any other object can contain entirely arbitrary (many-to-one) "Links" to another object. I also want to be able to generate a "usage map" which is the reverse index -- given any one object, which other objects are linked either directly to it or to a child of it? (Here a "child" is defined as any object with a matching prefix Id, as the Id is a dotted-path notation.)
So Baz.Boz
might be one object that links to Foo.Bar
-- the usage map should then reflect that both Foo
and Foo.Bar
(but not Foo.Bob
) are used by Baz.Boz
.
This is the code used to calculate the usage map:
// builds Id => links that link to Id or one of Id's children (by prefix)
public IDictionary<string, IList<Link>> CalculateUsageMap()
{
var all = All();
var links = all.Values
.SelectMany(o => o.Links ?? Enumerable.Empty<Link>())
.ToList();
return all.Keys.ToDictionary(i => i, i => links.Where(k => IsLinkedTo(k, i)).ToList());
// this last line is very slow
}
private static bool IsLinkedTo(Link link, string candidateId)
{
return !string.IsNullOrEmpty(link.TargetId)
&& !string.IsNullOrEmpty(candidateId)
&& link.TargetId.StartsWith(candidateId, StringComparison.Ordinal);
}
This is the supporting structure behind it:
public interface ILinkable
{
string Id { get; }
IEnumerable<ILinkable> Children { get; }
IEnumerable<Link> Links { get; }
}
public class Link
{
public string Name { get; }
public ILinkable Source { get; } // our immediate owner
public string TargetId { get; }
// plus constructors etc that's irrelevant at present
}
public ILinkable Root { get; }
public IDictionary<string, ILinkable> All()
{
var tree = new Dictionary<string, ILinkable>();
AddWithDescendants(tree, Root);
return tree;
}
private static void AddWithDescendants(IDictionary<string, ILinkable> tree, ILinkable obj)
{
tree.Add(obj.Id, obj);
foreach (var child in obj.Children ?? Enumerable.Empty<ILinkable>())
{
AddWithDescendants(tree, child);
}
}
This works, but in a tree with ~14k objects and ~3k links (producing ~20k usages) this takes ~5s to generate, which is longer than I'd like. (I've checked and All()
and calculating links
takes basically no time; it's all being spent inside ToDictionary
.)
Is there some way to improve performance of this line? My first thought was using something like GroupJoin
, but since we're "joining" on prefix-equality rather than actual equality that doesn't really work. I would prefer to keep this in pure code, not involving a database.
(I did attempt to write a custom equality comparer for GroupJoin
but this ended up being both slower and producing the wrong results, with only ~7k of usage output. And it's dubious anyway since this is an asymmetric match, while equality comparers assume symmetry.)