
I have a project in ASP.NET Core. This project has an ICacheService as below:

public interface ICacheService
{
    T Get<T>(string key);
    T Get<T>(string key, Func<T> getdata);
    Task<T> Get<T>(string key, Func<Task<T>> getdata); 
    void AddOrUpdate(string key, object value);
} 

The implementation is simply based on ConcurrentDictionary<string, object>, so it's not that complicated: just storing and retrieving data from this dictionary. In one of my services I have a method as below:

public async Task<List<LanguageInfoModel>> GetLanguagesAsync(string frontendId, string languageId, string accessId) 
{
    async Task<List<LanguageInfoModel>> GetLanguageInfoModel()
    {
        var data = await _commonServiceProxy.GetLanguages(frontendId, languageId, accessId);
        return data;
    }

    _scheduler.ScheduleAsync($"{CacheKeys.Jobs.LanguagesJob}_{frontendId}_{languageId}_{accessId}", async () =>
    {
        _cacheService.AddOrUpdate($"{CacheKeys.Languages}_{frontendId}_{languageId}_{accessId}", await GetLanguageInfoModel());
        return JobStatus.Success;
    }, TimeSpan.FromMinutes(5.0));

    return await _cacheService.Get($"{CacheKeys.Languages}_{frontendId}_{languageId}_{accessId}", async () => await GetLanguageInfoModel());
}
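For reference, the cache implementation itself is not shown here; a minimal ConcurrentDictionary-backed sketch of the interface above might look like this (hypothetical, not the actual implementation):

```csharp
using System;
using System.Collections.Concurrent;
using System.Threading.Tasks;

public interface ICacheService
{
    T Get<T>(string key);
    T Get<T>(string key, Func<T> getdata);
    Task<T> Get<T>(string key, Func<Task<T>> getdata);
    void AddOrUpdate(string key, object value);
}

// Hypothetical sketch of a ConcurrentDictionary-backed implementation.
public class InMemoryCacheService : ICacheService
{
    private readonly ConcurrentDictionary<string, object> _store =
        new ConcurrentDictionary<string, object>();

    public T Get<T>(string key) =>
        _store.TryGetValue(key, out var value) ? (T)value : default;

    public T Get<T>(string key, Func<T> getdata) =>
        (T)_store.GetOrAdd(key, _ => getdata());

    public async Task<T> Get<T>(string key, Func<Task<T>> getdata)
    {
        if (_store.TryGetValue(key, out var value))
            return (T)value;
        var data = await getdata();
        // GetOrAdd keeps whichever value won the race for this key.
        return (T)_store.GetOrAdd(key, data);
    }

    public void AddOrUpdate(string key, object value) => _store[key] = value;
}
```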

The problem is that I have three params in this method that I use as a cache key. This works fine, but the number of combinations of the three params is pretty high, so there will be many duplicated objects in the cache. I was thinking of creating a cache without duplication, like below:

To have a cache with a list as a key, where I can store more than one key for a single object. Then, when I get new elements, I check each of them: if it is already in the cache I only add a key to the key list; otherwise I insert a new element into the cache. The problem here is that testing whether an object is in the cache is expensive. I think it would consume a lot of resources and would need some serialization into a specific form to make the comparison possible, which would again be costly. The cache might look something like this: CustomDictionary<List<string>, object>

Does anybody know a good approach to solving this issue so that objects are not duplicated in the cache?

EDIT 1:

My main concern is when I retrieve List<MyModel> from my webservices, because 80% of the objects might have the same data, which will drastically increase the memory footprint. But this would be relevant for simple cases as well. Let's suppose I have something like this:

MyClass o1 = new MyClass();
_cache.Set("key1", o1);
_cache.Set("key2", o1);

In this case, when trying to add the same object twice, I would like to not duplicate it but to have key2 somehow point to the same object as key1. If this is achieved, invalidating them becomes a problem, but I expect to have something like this:

_cache.Invalidate("key2");

This will check whether there is another key pointing to the same object. If so, it will only remove the key; otherwise it will destroy the object itself.
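Since .NET stores object references rather than copies, these semantics can already be sketched over a plain dictionary (names hypothetical):

```csharp
using System;
using System.Collections.Generic;

// Hypothetical sketch: multiple keys share one object reference, and
// Invalidate only removes the key; the object lives until no key uses it.
public class SharedObjectCache
{
    private readonly Dictionary<string, object> _byKey = new Dictionary<string, object>();

    // Stores a reference to the object, never a copy.
    public void Set(string key, object value) => _byKey[key] = value;

    public object Get(string key) => _byKey.TryGetValue(key, out var v) ? v : null;

    public void Invalidate(string key)
    {
        // Removing the key is enough: if another key still references the
        // same object it stays alive; otherwise the GC reclaims it.
        _byKey.Remove(key);
    }
}
```

Note that `_cache.Set("key2", o1)` in C# already stores only a reference, so no explicit reference counting is needed for the object itself; the garbage collector plays that role.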

vasil oreshenski
Rey
  • Why don't you use the built-in in-memory cache? – Alex Riabov Jul 12 '18 at 14:29
  • Because for my needs ConcurrentDictionary is enough. Another reason is that I use jobs to update it, and as far as I can see it was a good solution for me. If the built-in memory cache has some advantages for my problem I will easily switch to it – Rey Jul 12 '18 at 14:31
  • If I understand correctly, you want to store only one object for every combination of `$"{CacheKeys.Languages}_{frontendId}_{languageId}_{accessId}"` in your cache, so all of them have the same data. So why don't you just use a constant string for your cache key? Why does it need to be different if the value is the same? – Kahbazi Jul 20 '18 at 14:03
  • You mean that GetLanguageInfoModel() for frontendId=1, languageId=2 and accessId=3 can produce the same model as for completely different values of the three params? If so, how is GetLanguageInfoModel() related to the three params? A cache key represents an object's state in a unique manner... probably the key generation is not quite right for this case? – vasil oreshenski Jul 20 '18 at 14:05
  • @vasiloreshenski this method calls a web service which is not controlled by me in any way, and I don't have much info about its logic. What I want to do is cache the data retrieved by this service so I won't have to call it very often, and for performance reasons as well. This service returns a `List` which might have duplicated `MyModel` values, and I don't want to store lists with duplicated values inside. I thought of creating a list which will contain all elements; if duplicated, I will only have two keys for the same object rather than duplicating the object under different keys – Rey Jul 20 '18 at 14:11
  • If I were in the C++ world I could store pointers to objects rather than duplicating the objects, but in C# I don't know the best approach without using pointers – Rey Jul 20 '18 at 14:12
  • To make the result of the webservice (List) distinct, check this for more info: http://blog.alex-turok.com/2013/03/c-linq-and-iequalitycomparer.html. You need to call the .Distinct(..) overload of the list with a custom implementation of a comparer class. It is a lot easier and it is the correct way to eliminate the duplicates. – vasil oreshenski Jul 20 '18 at 14:15
  • @Kahbazi the values are not meant to be the same. My main concern is when a `List<>` is retrieved, because the list might contain the same object multiple times, which I don't want – Rey Jul 20 '18 at 14:16
  • This can also be helpful for comparers - https://stackoverflow.com/questions/6694508/how-to-use-the-iequalitycomparer – vasil oreshenski Jul 20 '18 at 14:23
  • @RajmondBurgaj storing a reference in C# is exactly the same as pointing to the same object in C++. IMemoryCache in .NET Core does everything you want. While storing, you can store a reference to an existing object as well; IMemoryCache might just wrongly calculate its memory footprint, but you will not have duplicate objects. When you are storing o1 for key1 and key2, who told you that they are duplicated? In this case, it is only storing a reference to the original o1; it is not duplicating. – Akash Kava Jul 24 '18 at 07:19
  • What size are the objects returned by the remote service? Is the time taken to hash these objects for later comparison an issue for you? – Dan Nguyen Jul 26 '18 at 07:10
  • @DanNguyen yes, it is an issue because the webservice will be called very often and it might return many items. Furthermore, there are jobs running in the background all the time to update the cache – Rey Jul 26 '18 at 07:24
  • In the first call to the remote service, say the parameters are 1,2,3, and you get a list of objects; one of them is A with Id = 1 and some data. You add these to the cache with key 1_2_3. In a second call with 1,2,4, you get another list that contains A' with Id = 1, but some of A's data has changed (same Id, but some other properties have been changed by the remote party). How do you want the cache to handle this? Remove A and keep A', or keep both of them? – Dan Nguyen Jul 26 '18 at 07:42
  • @DanNguyen in that case I want to remove A and keep only A', because I want to stay synced with remote server updates – Rey Jul 26 '18 at 07:48

6 Answers


Maybe we could reformulate this problem into two separate issues:

  1. executing the call for each combination, and
  2. storing the identical result n times, wasting tons of memory

For 1, I don't have any idea how we could prevent it, as we do not know prior to execution whether we will fetch a duplicate in this setup. We would need more information, based on when these values vary, which may or may not be possible.

For 2, one solution would be to override GetHashCode (and Equals) so that they are based on the actual returned values. A good solution would be generic and walk through the object tree (which can probably get expensive). I would like to know if there are any pre-made solutions for this, actually.
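For example, value-based equality on a hypothetical model with two properties might look like this sketch:

```csharp
using System;

// Hypothetical model: equality and hash code are based on the returned
// values, so two instances with the same data compare equal.
public class LanguageInfo : IEquatable<LanguageInfo>
{
    public string Name { get; set; }
    public string IsoCode { get; set; }

    public bool Equals(LanguageInfo other) =>
        other != null && Name == other.Name && IsoCode == other.IsoCode;

    public override bool Equals(object obj) => Equals(obj as LanguageInfo);

    public override int GetHashCode()
    {
        unchecked
        {
            // Combine the property hashes; 397 is a conventional odd prime.
            return ((Name?.GetHashCode() ?? 0) * 397) ^ (IsoCode?.GetHashCode() ?? 0);
        }
    }
}
```

With this in place, a `HashSet<LanguageInfo>` or a `Distinct()` call collapses duplicates regardless of reference identity.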

FrankyBoy

This answer is specifically for returning List<TItem>s, rather than just individual TItems, and it avoids duplication of any TItem as well as any List<T>. It uses arrays because you're trying to save memory, and an array uses less of it than a List.

Note that for this (and really any) solution to work, you MUST override Equals and GetHashCode on TItem so that it knows what a duplicate item is. (Unless the data provider returns the same object each time, which is unlikely.) If you don't have control of TItem, but you can determine yourself whether two TItems are equal, you can use an IEqualityComparer, though the solution below would need to be modified slightly to do that.

View the solution with a basic test at: https://dotnetfiddle.net/pKHLQP

public class DuplicateFreeCache<TKey, TItem> where TItem : class
{
    private ConcurrentDictionary<TKey, int> Primary { get; } = new ConcurrentDictionary<TKey, int>();
    private List<TItem> ItemList { get; } = new List<TItem>();
    private List<TItem[]> ListList { get; } = new List<TItem[]>();
    private Dictionary<TItem, int> ItemDict { get; } = new Dictionary<TItem, int>();
    private Dictionary<IntArray, int> ListDict { get; } = new Dictionary<IntArray, int>();

    public IReadOnlyList<TItem> GetOrAdd(TKey key, Func<TKey, IEnumerable<TItem>> getFunc)
    {
        int index = Primary.GetOrAdd(key, k =>
        {
            var rawList = getFunc(k);

            lock (Primary)
            {
                int[] itemListByIndex = rawList.Select(item =>
                {
                    if (!ItemDict.TryGetValue(item, out int itemIndex))
                    {
                        itemIndex = ItemList.Count;
                        ItemList.Add(item);
                        ItemDict[item] = itemIndex;
                    }
                    return itemIndex;
                }).ToArray();

                var intArray = new IntArray(itemListByIndex);

                if (!ListDict.TryGetValue(intArray, out int listIndex))
                {
                    lock (ListList)
                    {
                        listIndex = ListList.Count;
                        ListList.Add(itemListByIndex.Select(ii => ItemList[ii]).ToArray());
                    }
                    ListDict[intArray] = listIndex;
                }

                return listIndex;
            }
        });

        lock (ListList)
        {
            return ListList[index];
        }
    }


    public override string ToString()
    {
        StringBuilder sb = new StringBuilder();
        sb.AppendLine($"A cache with:");
        sb.AppendLine($"{ItemList.Count} unique Items;");
        sb.AppendLine($"{ListList.Count} unique lists of Items;");
        sb.AppendLine($"{Primary.Count} primary dictionary items;");
        sb.AppendLine($"{ItemDict.Count} item dictionary items;");
        sb.AppendLine($"{ListDict.Count} list dictionary items;");
        return sb.ToString();
    }

    //We have this to make Dictionary lookups on int[] find identical arrays.
    //One could also just make an IEqualityComparer, but I felt like doing it this way.
    public class IntArray
    {
        private readonly int _hashCode;
        public int[] Array { get; }
        public IntArray(int[] arr)
        {
            Array = arr;
            unchecked
            {
                _hashCode = 0;
                for (int i = 0; i < arr.Length; i++)
                    _hashCode = (_hashCode * 397) ^ arr[i];
            }
        }

        protected bool Equals(IntArray other)
        {
            return Array.SequenceEqual(other.Array);
        }

        public override bool Equals(object obj)
        {
            if (ReferenceEquals(null, obj)) return false;
            if (ReferenceEquals(this, obj)) return true;
            if (obj.GetType() != this.GetType()) return false;
            return Equals((IntArray)obj);
        }

        public override int GetHashCode() => _hashCode;
    }
}

It occurred to me that a ReaderWriterLockSlim would be better than the lock(ListList) if the lock causes performance to lag, but it's very slightly more complicated.

MineR

Similar to @MineR's, this solution performs a 'double caching' operation: it caches the keyed lists (lookups) as well as the individual objects, performing automatic deduplication.

It is a fairly simple solution using two ConcurrentDictionaries - one acting as a HashSet and one as a keyed lookup. This allows most of the threading concerns to be handled by the framework.

You can also pass in and share the hash set between multiple CachedLookups, allowing lookups with different keys.

Note that object equality or an IEqualityComparer is required to make any such solution function.

Class:

public class CachedLookup<T, TKey>
{        
    private readonly ConcurrentDictionary<T, T> _hashSet;
    private readonly ConcurrentDictionary<TKey, List<T>> _lookup = new ConcurrentDictionary<TKey, List<T>>();

    public CachedLookup(ConcurrentDictionary<T, T> hashSet)
    {
        _hashSet = hashSet;
    }   

    public CachedLookup(IEqualityComparer<T> equalityComparer = default)
    {
        _hashSet = equalityComparer is null ? new ConcurrentDictionary<T, T>() : new ConcurrentDictionary<T, T>(equalityComparer);
    }

    public List<T> Get(TKey key) => _lookup.TryGetValue(key, out var list) ? list : null;

    public List<T> Get(TKey key, Func<TKey, List<T>> getData)
    {
        if (_lookup.TryGetValue(key, out var existing))
            return existing;

        var result = DedupeAndCache(getData(key));

        _lookup.TryAdd(key, result);

        return result;
    }
    public async ValueTask<List<T>> GetAsync(TKey key, Func<TKey, Task<List<T>>> getData)
    {
        if (_lookup.TryGetValue(key, out var existing))
            return existing;

        var result = DedupeAndCache(await getData(key));

        _lookup.TryAdd(key, result);

        return result;
    }

    public void Add(T value) => _hashSet.TryAdd(value, value);

    public List<T> AddOrUpdate(TKey key, List<T> data)
    {            
        var deduped = DedupeAndCache(data);

        _lookup.AddOrUpdate(key, deduped, (k,l)=>deduped);

        return deduped;
    }

    private List<T> DedupeAndCache(IEnumerable<T> input) => input.Select(v => _hashSet.GetOrAdd(v,v)).ToList();
}

Example Usage:

public class ExampleUsage
{
    private readonly CachedLookup<LanguageInfoModel, (string frontendId, string languageId, string accessId)> _lookup 
        = new CachedLookup<LanguageInfoModel, (string frontendId, string languageId, string accessId)>(new LanguageInfoModelComparer());

    public ValueTask<List<LanguageInfoModel>> GetLanguagesAsync(string frontendId, string languageId, string accessId)
    {
        return _lookup.GetAsync((frontendId, languageId, accessId), GetLanguagesFromDB);
    }

    private async Task<List<LanguageInfoModel>> GetLanguagesFromDB((string frontendId, string languageId, string accessId) key) => throw new NotImplementedException();
}

public class LanguageInfoModel
{
    public string FrontendId { get; set; }
    public string LanguageId { get; set; }
    public string AccessId { get; set; }
    public string SomeOtherUniqueValue { get; set; }
}

public class LanguageInfoModelComparer : IEqualityComparer<LanguageInfoModel>
{
    public bool Equals(LanguageInfoModel x, LanguageInfoModel y)
    {
        return (x?.FrontendId, x?.AccessId, x?.LanguageId, x?.SomeOtherUniqueValue)
            .Equals((y?.FrontendId, y?.AccessId, y?.LanguageId, y?.SomeOtherUniqueValue));
    }

    public int GetHashCode(LanguageInfoModel obj) => 
        (obj.FrontendId, obj.LanguageId, obj.AccessId, obj.SomeOtherUniqueValue).GetHashCode();
}

Notes:

The CachedLookup class is generic on both the value and key. The example use of ValueTuple makes it easy to have compound keys. I have also used ValueTuples to simplify the equality comparisons.

This usage of ValueTask fits nicely with its intended purpose: returning the cached list synchronously when it is already present.

If you have access to the lower level data access layer, one optimization would be to move the deduplication to happen before the objects are instantiated (based on property value equality). This would reduce the allocations and load on the GC.
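That interning idea could be sketched like this (hypothetical InternPool type; only value-based equality via a comparer like the LanguageInfoModelComparer above is assumed):

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;

// Sketch: an interning pool that returns one canonical instance for each
// distinct value, so value-equal models share a single reference.
public class InternPool<T> where T : class
{
    private readonly ConcurrentDictionary<T, T> _pool;

    public InternPool(IEqualityComparer<T> comparer)
    {
        _pool = new ConcurrentDictionary<T, T>(comparer);
    }

    // Returns the first-seen instance that is value-equal to 'candidate'.
    public T Intern(T candidate) => _pool.GetOrAdd(candidate, candidate);
}
```

Running freshly materialized objects through `Intern` before they enter the cache means duplicates are dropped at the earliest possible point.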

Andrew Hanlon

If you have control over your complete solution then you can do something like this.

  1. Identify every object that is capable of being stored in the cache. All such objects implement a common interface:

    public interface ICacheable 
    {
        string ObjectId(); // Implements the logic to compute each object's identity. You could use a hash code, but you would have to add some other value too.
    }
    
  2. Now, when you store an object in the cache, you do two things:

    • Store the mapping both ways: one cache stores ObjectId-to-key entries.
    • Another contains ObjectId-to-object entries.

    • The overall idea is that when you get an object, you look in the first cache to see whether the key you want is already there against an ObjectId. If yes, no further action is needed; otherwise you create a new ObjectId-to-key entry in the first cache.

    • If the object is not present at all, you create entries in both caches.

Note: You have to overcome a performance issue, because your keys are a kind of list, which makes searching a problem.
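A rough sketch of this two-cache layout (hypothetical names, reusing the ICacheable interface from step 1):

```csharp
using System;
using System.Collections.Concurrent;

// As defined in step 1.
public interface ICacheable
{
    string ObjectId();
}

// Sketch of the two maps: cache key -> ObjectId, and ObjectId -> object.
public class TwoLevelCache
{
    private readonly ConcurrentDictionary<string, string> _keyToId =
        new ConcurrentDictionary<string, string>();
    private readonly ConcurrentDictionary<string, ICacheable> _idToObject =
        new ConcurrentDictionary<string, ICacheable>();

    public void Set(string key, ICacheable value)
    {
        var id = value.ObjectId();
        _idToObject.TryAdd(id, value); // the object itself is stored once
        _keyToId[key] = id;            // any number of keys can reference it
    }

    public ICacheable Get(string key) =>
        _keyToId.TryGetValue(key, out var id) &&
        _idToObject.TryGetValue(id, out var obj) ? obj : null;
}
```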

dotnetstep
  • Thanks for your info. What about if I wanted to store a List? – Rey Jul 20 '18 at 13:56
  • @RajmondBurgaj - then you would build a wrapper over this list and again implement the ICacheable interface. Note that if you still need atomicity for set and get, with two caches, you'll probably have to wrap these methods using `lock` or some other sync mechanism. – Simon Mourier Jul 21 '18 at 05:51

It sounds to me as though you need to implement some sort of index. Assuming that your model is fairly large (which is why you want to save memory), you could do this with two concurrent dictionaries.

The first would be a ConcurrentDictionary<string, int> (or whatever unique id applies to your model object) and would contain your key values. Each key is obviously different, as per all your combinations, but you are only duplicating the int unique key for each of your objects, not the entire object.

The second dictionary would be a ConcurrentDictionary<int, object> or ConcurrentDictionary<int, T> and would contain your unique large objects, indexed by their unique key.

When building the cache you would need to populate both dictionaries; the exact method would depend upon how you are doing it at the moment.

To retrieve an object you would build the key as you do at the moment, retrieve the unique id from the first dictionary, and then use that to locate the actual object in the second dictionary.

It is also possible to invalidate one key without invalidating the main object while another key is still using it, although that does require you to iterate over the index dictionary to check whether any other key points to the same object.
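A minimal sketch of this two-dictionary index (hypothetical names), including the invalidation scan just described:

```csharp
using System;
using System.Collections.Concurrent;
using System.Linq;

// Sketch: composite key -> unique model id, plus id -> model. Each large
// object is stored once no matter how many keys point at it.
public class IndexedCache<T>
{
    private readonly ConcurrentDictionary<string, int> _index =
        new ConcurrentDictionary<string, int>();
    private readonly ConcurrentDictionary<int, T> _objects =
        new ConcurrentDictionary<int, T>();

    public void Set(string key, int id, T value)
    {
        _objects.TryAdd(id, value); // stored once per unique id
        _index[key] = id;           // many keys may share the same id
    }

    public T Get(string key) =>
        _index.TryGetValue(key, out var id) &&
        _objects.TryGetValue(id, out var value) ? value : default;

    public void Invalidate(string key)
    {
        if (!_index.TryRemove(key, out var id)) return;
        // Drop the object only if no other key still points at it.
        if (!_index.Values.Contains(id))
            _objects.TryRemove(id, out _);
    }
}
```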

ste-fu
  • You cannot use the hash code as a unique identifier for any old object. You're going to have hash collisions, and then this all falls apart. – MineR Jul 25 '18 at 10:52
  • @MineR Fair point: if there are lots and lots of them, they are not guaranteed unique. I will edit to make it clear there should be a unique id for the model. – ste-fu Jul 25 '18 at 12:44

I think this is not a caching concern where one key maps to one and only one piece of data; yours is not that case. You are trying to maintain a local in-memory data repository that works as cached data, creating mappings between keys and objects loaded from the remote service. One key can map to many objects, and one object can be mapped by many keys, so the relationship is n-to-n.

I have created a sample model as follows:

(diagram: Key, KeyMyModel and MyModel entities and their relationships)

Key, KeyMyModel and MyModel are classes for the caching handler; RemoteModel is the class you get from the remote service.

With these models you can meet the requirements. The cache uses the entity Id to identify an object, so there is no need to hash objects to detect duplicates. The Set method below is a very basic implementation; invalidating a key is very similar. You must also write code to ensure thread safety.

    public class MyModel
    {
        public RemoteModel RemoteModel { get; set; }
        public List<KeyMyModel> KeyMyModels { get; set; } = new List<KeyMyModel>();
    }
    public class RemoteModel
    {
        public string Id { get; set; } // Identity property returned by the remote service
        public string DummyProperty { get; set; } // Other properties returned by the remote service
    }
    public class KeyMyModel
    {
        public string Key { get; set; }
        public string MyModelId { get; set; }
    }
    public class Key
    {
        public string KeyStr { get; set; }
        public List<KeyMyModel> KeyMyModels { get; set; } = new List<KeyMyModel>();
    }

    public interface ICacheService
    {
        List<RemoteModel> Get(string key);
        List<RemoteModel> Get(string key, Func<List<RemoteModel>> getdata);
        Task<List<RemoteModel>> Get(string key, Func<Task<List<RemoteModel>>> getdata);
        void AddOrUpdate(string key, object value);
    }

    public class CacheService : ICacheService
    {
        public List<MyModel> MyModels { get; private set; }
        public List<Key> Keys { get; private set; }
        public List<KeyMyModel> KeyMyModels { get; private set; }

        public CacheService()
        {
            MyModels = new List<MyModel>();
            Keys = new List<Key>();
            KeyMyModels = new List<KeyMyModel>();
        }
        public List<RemoteModel> Get(string key)
        {
            return MyModels.Where(s => s.KeyMyModels.Any(t => t.Key == key)).Select(s => s.RemoteModel).ToList();
        }

        public List<RemoteModel> Get(string key, Func<List<RemoteModel>> getdata)
        {
            var remoteData = getdata();
            Set(key, remoteData);

            return MyModels.Where(s => s.KeyMyModels.Any(t => t.Key == key)).Select(t => t.RemoteModel).ToList();
        }

        public Task<List<RemoteModel>> Get(string key, Func<Task<List<RemoteModel>>> getdata)
        {
            throw new NotImplementedException();
        }

        public void AddOrUpdate(string key, object value)
        {
            throw new NotImplementedException();
        }

        public void Invalidate(string key)
        {

        }

        public void Set(string key, List<RemoteModel> data)
        {
            var cacheKey = Keys.FirstOrDefault(s => s.KeyStr == key);
            if (cacheKey == null)
            {
                cacheKey = new Key { KeyStr = key };
                Keys.Add(cacheKey); // add the key once, not once per item
            }

            foreach (var remoteModel in data)
            {
                var exist = MyModels.FirstOrDefault(s => s.RemoteModel.Id == remoteModel.Id);
                if (exist == null)
                {
                    // New model: add it to the cache and link it to the key
                    var myModel = new MyModel
                    {
                        RemoteModel = remoteModel
                    };
                    var keyMyModel = new KeyMyModel
                    {
                        Key = key,
                        MyModelId = remoteModel.Id
                    };
                    myModel.KeyMyModels.Add(keyMyModel);
                    cacheKey.KeyMyModels.Add(keyMyModel);
                    KeyMyModels.Add(keyMyModel);
                    MyModels.Add(myModel);
                }
                else
                {
                    // Existing model: take the newest remote state (remove A, keep A')
                    exist.RemoteModel = remoteModel;
                    var existKeyMyModel =
                        KeyMyModels.FirstOrDefault(s => s.Key == key && s.MyModelId == exist.RemoteModel.Id);
                    if (existKeyMyModel == null)
                    {
                        existKeyMyModel = new KeyMyModel
                        {
                            Key = key,
                            MyModelId = exist.RemoteModel.Id
                        };
                        cacheKey.KeyMyModels.Add(existKeyMyModel);
                        exist.KeyMyModels.Add(existKeyMyModel);
                        KeyMyModels.Add(existKeyMyModel);
                    }
                }
            }

            // Remove mappings for models the remote service no longer returns for this key
            var remoteIds = data.Select(s => s.Id).ToList();
            var removingKeyMyModels = KeyMyModels
                .Where(s => s.Key == key && !remoteIds.Contains(s.MyModelId))
                .ToList();
            removingKeyMyModels.ForEach(s =>
            {
                KeyMyModels.Remove(s);
                cacheKey.KeyMyModels.Remove(s);
            });
        }
    }

    class CacheConsumer
    {
        private readonly CacheService _cacheService = new CacheService();

        public List<RemoteModel> GetMyModels(string frontendId, string languageId, string accessId)
        {
            var key = $"{frontendId}_{languageId}_{accessId}";
            return _cacheService.Get(key, () =>
            {
                // call to remote service here
                return new List<RemoteModel>();
            });
        }
    }
Dan Nguyen