I am struggling with this problem and would really appreciate any help.
I'm working on an existing project. I have added logic that counts values combinations, and makes sure we do not pass some limit. For example, given this data table columns:
Name|Age|description
The code makes sure we do not have more than K combinations of Name, Age. I have data that contains something like million pairs like this. At some point, the program just crashes, or get stuck, though I do not see any memory problem or CPU issue.
I implemented this limit using ConcurrentDictionary
of tuples (Name, Age) as keys, and I'm using C# .NET 6 ..
I can see that the time it takes to try and add an element to the DS becomes really long at some point.
Edit: adding some code pieces, though it is a lot of inner implementation, I do believe these are the main code parts to understand the problem:
here is the component that's responsible for limiting the keys:
protected override Result Process(Row row)
{
var valueToLimit = GetValueToLimit(row);
var result = _values.TryAdd(valueToLimit);
}
// some logic related to the case of crossing the limit
return Result.Success;
}
protected abstract T GetValueToLimit(Row row);
}
The function GetValueToLimit is implemented for my case :
protected override string[] GetValueToLimit(Row row)
{ // takes the relevant values from an input record, according to the requested columns.
return _columnIndices.Select(x => row.GetValue(x)).ToArray();
}
and finally, here is some parts of the concurrent HashSet implementation:
public class BoundedConcurrentHashSet<K> : ConcurrentHashSet<K>
{
..
public override Result TryAdd(K element)
{
if (Dictionary.Count() < _maxCapacity)
{
return base.TryAdd(element);
}
else
{
return Contains(element) ? Result.AlreadyInHash : Result.ExceedsCapacity;
}
}
where concurrentHashSet is implemented with C# concurrentDictionary:
public class ConcurrentHashSet<K>
{
public ConcurrentHashSet(IEqualityComparer<K> equalityComparer)
{
Dictionary = new ConcurrentDictionary<K, object>(equalityComparer);
}
protected ConcurrentDictionary<K, object> Dictionary { get; }
public int Count => Dictionary.Count;
public IEnumerable<K> Elements => Dictionary.Keys;
public virtual Result TryAdd(K element)
{
return Dictionary.TryAdd(element, null) ? dResult.Added : Result.AlreadyInHash;
}
public bool Contains(K element)
{
return Dictionary.ContainsKey(element);
}
Please share any idea that can help.
Thanks