
I've got a little problem here and I'd like some help.

My code runs an endless search over web pages for patterns and, whenever it finds something new, writes it to a file.

However, sometimes the info I'm scavenging is already in the file because the page hasn't been updated, and I don't want repeated entries in my file.

Therefore, I simply created a List of strings and add each entry to it. Every time the code finds what it's looking for, it checks whether the string is already in that list before writing to the file.

You can clearly see why this is a bad idea... Since it runs 24/7, this list will grow without bound. But there is a catch: I'm 100% sure that the info I'm looking for will never repeat once 15 minutes have passed.

So, what I really want is to eliminate items that have been in this list for 15 minutes. I just can't think of something simple and/or elegant to do this, and I don't know if there is some data structure or library that can solve this for me.

That's why I'm asking here: what is the best way to create some kind of "timed list", where items that have been there for a while get removed at the end of the iteration?

Thanks in advance.

Rann Lifshitz

3 Answers

Have you tried .NET's built-in MemoryCache?

You can set a cache policy that includes an absolute timeout, which I think is what you want.
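As a rough sketch of that idea, the helper below wraps `System.Runtime.Caching.MemoryCache` so that each entry is remembered for 15 minutes. The class name `RecentEntries` and the method `TryRemember` are made up for illustration; only `MemoryCache`, `CacheItemPolicy`, and `Add` come from the library. On .NET Framework this assumes a reference to the System.Runtime.Caching assembly, and on newer runtimes the NuGet package of the same name.

```csharp
using System;
using System.Runtime.Caching;

// Hypothetical helper: remembers strings for a fixed time using MemoryCache.
static class RecentEntries
{
    private static readonly MemoryCache Cache = new MemoryCache("recentEntries");
    private static readonly TimeSpan Lifetime = TimeSpan.FromMinutes(15);

    // Returns true if the entry was not cached yet (so it should be written
    // to the file), false if it is already present and not yet expired.
    public static bool TryRemember(string entry)
    {
        var policy = new CacheItemPolicy
        {
            // The entry is evicted 15 minutes after it was first added.
            AbsoluteExpiration = DateTimeOffset.UtcNow.Add(Lifetime)
        };

        // Add inserts only when the key is absent and reports whether it did.
        return Cache.Add(entry, true, policy);
    }
}
```

In the search loop this would replace the list lookup: write to the file only when `RecentEntries.TryRemember(found)` returns true. The cache handles eviction itself, so no pruning code is needed.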

Zer0

You'll need something running that periodically prunes the list.

What I've done in the past is:

  1. Use a ConcurrentBag<Tuple<DateTime, T>> instead of List<T>
  2. With the bag of Tuples, store the object and the time it was added: theBag.Add(Tuple.Create(DateTime.Now, myObject));
  3. Run a secondary thread that periodically enumerates the bag, and removes any entries that have "expired".

This is a more active approach, but it's pretty simple. However, since you're now working with two threads, you have to be careful about thread safety. That's why I used something like ConcurrentBag. There are other concurrent collections you can look at as well. You mentioned a queue, so you could try a ConcurrentQueue.
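The steps above could be sketched roughly as follows. Note that this swaps in a `ConcurrentDictionary<string, DateTime>` instead of a `ConcurrentBag`, because a bag only offers `TryTake` and cannot remove a specific entry. The class name `ExpiringSet` and its members are hypothetical, and a `System.Threading.Timer` stands in for the secondary thread.

```csharp
using System;
using System.Collections.Concurrent;
using System.Threading;

// Hypothetical sketch: a thread-safe set of strings whose entries expire.
sealed class ExpiringSet : IDisposable
{
    // Maps each entry to the UTC time at which it was added.
    private readonly ConcurrentDictionary<string, DateTime> _seen =
        new ConcurrentDictionary<string, DateTime>();
    private readonly TimeSpan _lifetime;
    private readonly Timer _pruneTimer;

    public ExpiringSet(TimeSpan lifetime, TimeSpan pruneInterval)
    {
        _lifetime = lifetime;
        // The timer callback runs on a thread-pool thread at each interval.
        _pruneTimer = new Timer(_ => Prune(DateTime.UtcNow), null,
                                pruneInterval, pruneInterval);
    }

    public int Count => _seen.Count;

    // Returns true if the entry is new, false if it is already stored.
    public bool TryAdd(string entry) => _seen.TryAdd(entry, DateTime.UtcNow);

    // Removes every entry added more than _lifetime before nowUtc.
    // Enumerating a ConcurrentDictionary while removing from it is safe.
    public void Prune(DateTime nowUtc)
    {
        DateTime cutoff = nowUtc - _lifetime;
        foreach (var pair in _seen)
        {
            if (pair.Value < cutoff)
                _seen.TryRemove(pair.Key, out _);
        }
    }

    public void Dispose() => _pruneTimer.Dispose();
}
```

A possible usage would be `new ExpiringSet(TimeSpan.FromMinutes(15), TimeSpan.FromMinutes(1))`, writing to the file only when `TryAdd` returns true. Storing the entry as the dictionary key also makes the duplicate check a hash lookup rather than a scan.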

Take a good look at a caching library like others have suggested and weigh your options. A full caching library may be overkill.

Adam Schiavone
  • `ConcurrentBag` uses thread local storage and has poor performance in your scenario. It's best used when each thread is both producing and consuming so to avoid performance hits when stealing items. – Zer0 May 02 '18 at 02:12
  • Nice catch. I didn't know about the thread local copy. In this case, it will still work, but performance will take a hit. However, I think the core principle here of storing a `Tuple` is a good one. You could even use a plain old Queue and check the date whenever you dequeue. If it's expired, simply continue. – Adam Schiavone May 02 '18 at 02:19
  • Thanks, reference found [here](https://learn.microsoft.com/en-us/dotnet/api/system.collections.concurrent.concurrentbag-1?view=netframework-4.7.1). Don't disagree with your overall simple design. – Zer0 May 02 '18 at 02:24
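One way to read the plain-queue suggestion from the comments is sketched below, assuming everything runs on a single thread. Entries are enqueued in arrival order, so the oldest is always at the front and expired ones can be dropped before each lookup. The class name `RecentWindow` is hypothetical, the `HashSet` is an addition for fast duplicate checks, and the current time is passed in as a parameter only to keep the sketch easy to test.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical single-threaded sketch: remembers strings for a sliding window.
sealed class RecentWindow
{
    // Entries in arrival order, each paired with the time it was added.
    private readonly Queue<Tuple<DateTime, string>> _order =
        new Queue<Tuple<DateTime, string>>();
    // The same entries again, for O(1) duplicate checks.
    private readonly HashSet<string> _seen = new HashSet<string>();
    private readonly TimeSpan _lifetime;

    public RecentWindow(TimeSpan lifetime)
    {
        _lifetime = lifetime;
    }

    public int Count => _seen.Count;

    // Returns true if the entry is new within the window, false otherwise.
    public bool TryAdd(string entry, DateTime now)
    {
        // Drop expired entries from the front; the queue is time-ordered,
        // so the loop stops at the first entry that is still fresh.
        while (_order.Count > 0 && now - _order.Peek().Item1 >= _lifetime)
            _seen.Remove(_order.Dequeue().Item2);

        if (!_seen.Add(entry))
            return false;

        _order.Enqueue(Tuple.Create(now, entry));
        return true;
    }
}
```

In the search loop this might be called as `window.TryAdd(found, DateTime.UtcNow)`, writing to the file only when it returns true. No second thread is involved, so no concurrent collection is required.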

Instead of a list of strings, create a class that has a string property and a timestamp property. When you create an instance of the class, automatically populate the timestamp property with DateTime.Now.

Each time you iterate the list to see if a string exists, check the timestamp property as well and discard any item older than 15 minutes.

Example:

class TimeStampedSearchResult
{
    public string SearchResult { get; set; }

    public DateTime TimeStamp { get; private set; }

    public TimeStampedSearchResult(string searchResult)
    {
        SearchResult = searchResult;
        TimeStamp = DateTime.Now;
    }

    public void UpdateTimeStamp()
    {
        TimeStamp = DateTime.Now;
    }
}

Then you could use it like this:

public void SearchForever()
{
    //the results list
    List<TimeStampedSearchResult> results = new List<TimeStampedSearchResult>();
    //a list of expired results to remove from results list
    List<TimeStampedSearchResult> expiredResults = new List<TimeStampedSearchResult>();
    while (true)
    {
        //search for a result
        var searchResult = new TimeStampedSearchResult(SearchForStuff());
        bool found = false;
        //iterate our list
        foreach (var result in results)
        {
            if (result.SearchResult == searchResult.SearchResult)
            {
                result.UpdateTimeStamp();
                found = true;
            }
            else
            {
                if (result.TimeStamp < DateTime.Now.AddMinutes(-15))
                {
                    expiredResults.Add(result);
                }
            }
        }
        if (!found)
        {
            //add to our results list
            results.Add(searchResult);
            //write result to file
            WriteResult(searchResult.SearchResult, "myfile.txt");
        }

        //remove expired results
        foreach (var oldResult in expiredResults)
            results.Remove(oldResult);

        //make sure you clear the expired results list too.
        expiredResults.Clear();
    }
}
cjmurph