The following code fails under multiple enumeration because the existingNames hash set still contains the results of the previous enumeration, so the numeric suffixes advance further than they should. What's an elegant way to soup up this method so that it works correctly when enumerated more than once?

    public static IEnumerable<TOutput> UniquifyNames<TSource, TOutput>(
        this IEnumerable<TSource> source,
        Func<TSource, string> nameSelector,
        Func<TSource, string, TOutput> resultProjection
    ) {
        // Created once per call, so it is shared by every enumeration of the result.
        HashSet<string> existingNames = new HashSet<string>();
        return source
            .Select(item => {
                string name = nameSelector(item);
                return resultProjection(
                    item,
                    Enumerable.Range(1, int.MaxValue)
                        .Select(i => {
                            string suffix = i == 1
                                ? ""
                                : (name.EndsWithDigit() ? "-" : "") + i.ToString();
                            return $"{name}{suffix}";
                        })
                        .First(candidateName => existingNames.Add(candidateName))
                );
            });
    }

    private static bool EndsWithDigit(this string value)
        => !string.IsNullOrEmpty(value) && "0123456789".Contains(value[value.Length - 1]);

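To make the failure concrete, here is a minimal repro sketch. It uses a simplified stand-in (the hypothetical Uniquify below, which works on plain strings and omits the "-" rule) rather than the full method, but it shares the same flaw: the set is created once per call and then reused by every pass over the result.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class UniquifyRepro
{
    // Hypothetical, simplified stand-in for UniquifyNames.
    // The set is created once per call, not once per enumeration.
    public static IEnumerable<string> Uniquify(IEnumerable<string> names)
    {
        var existingNames = new HashSet<string>();
        return names.Select(name =>
            Enumerable.Range(1, int.MaxValue)
                .Select(i => i == 1 ? name : name + i)
                .First(candidate => existingNames.Add(candidate)));
    }

    public static void Main()
    {
        IEnumerable<string> result = Uniquify(new[] { "a", "a", "b" });

        // First pass starts with an empty set.
        Console.WriteLine(string.Join(", ", result)); // a, a2, b

        // Second pass sees the names left behind by the first pass.
        Console.WriteLine(string.Join(", ", result)); // a3, a4, b2
    }
}
```

The second line of output is the problem: the suffixes keep climbing because the names from the first pass are still in the set.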
I thought about creating an extension method such as UponEnumeration to wrap the outer enumerable, which would take a callback Action to run whenever enumeration began again (and which could be used to reset the HashSet). Is that a good idea?
I just realized it's not a good idea as stated, because the same resulting IEnumerable could be enumerated by different callers at the same time: one caller could begin enumerating while another was still partway through, and the second caller would break on resuming because the HashSet had been cleared. It sounds like the simplest thing to do is call ToList(), but I would really like to preserve lazy evaluation if possible.
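For comparison, one possible direction that stays lazy is to write the method as an iterator (yield return) instead of returning a Select query. This is only a sketch, not a definitive answer. It relies on the fact that the body of a C# iterator method does not run until enumeration starts, and that each enumerator gets its own copy of the locals, so every enumeration (including interleaved ones) gets a fresh HashSet. The class names UniquifyExtensions and Program are placeholders, and EndsWithDigit is rewritten as a plain helper here.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class UniquifyExtensions
{
    public static IEnumerable<TOutput> UniquifyNames<TSource, TOutput>(
        this IEnumerable<TSource> source,
        Func<TSource, string> nameSelector,
        Func<TSource, string, TOutput> resultProjection)
    {
        // Runs at the start of each enumeration, so the set is never shared.
        var existingNames = new HashSet<string>();

        foreach (TSource item in source)
        {
            string name = nameSelector(item);
            string unique = name;

            // Try name, then name2 (or name-2 if name ends in a digit), name3, ...
            for (int i = 2; !existingNames.Add(unique); i++)
            {
                unique = name + (EndsWithDigit(name) ? "-" : "") + i;
            }

            yield return resultProjection(item, unique);
        }
    }

    private static bool EndsWithDigit(string value)
    {
        if (string.IsNullOrEmpty(value)) return false;
        char last = value[value.Length - 1];
        return last >= '0' && last <= '9';
    }
}

public static class Program
{
    public static void Main()
    {
        IEnumerable<string> names = new[] { "a", "a", "x1", "x1" }
            .UniquifyNames(s => s, (s, unique) => unique);

        Console.WriteLine(string.Join(", ", names)); // a, a2, x1, x1-2
        Console.WriteLine(string.Join(", ", names)); // a, a2, x1, x1-2
    }
}
```

One trade-off with this shape: because nothing in an iterator method runs until the first MoveNext(), any argument checks placed at the top of the method would also be deferred until enumeration begins.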