2

I am processing zip files using a queue. Whenever a new zip file arrives at the specified path, its file info is added to the queue using
fileQueue.Enqueue(IFile). I want to add a file to the queue only if it is not already in it. I tried implementing the IEqualityComparer<T> interface with the following methods:

public bool Equals(T x, T y)
{
   object xValue = _propertyInfo.GetValue(x, null);
   object yValue = _propertyInfo.GetValue(y, null);
   return object.Equals(xValue, yValue); // null-safe: avoids a NullReferenceException when xValue is null
} 

and

public int GetHashCode(T obj)
{
   object propertyValue = _propertyInfo.GetValue(obj, null);
   if (propertyValue == null)
      return 0;
   else
      return propertyValue.GetHashCode();
} 

I have an interface that exposes the file info:

public interface IFile
{        
   string FilePath { get; set; }
}

And the queue object:

public Queue<IFile> fileQueue = new Queue<IFile>();

Can anyone please suggest how to check whether a file already exists in the queue before adding it again? Thank you very much in advance.

croxy
Jyoti

2 Answers

3

If you want a fast solution (one that avoids iterating over the whole queue each time you add something), you'll need to implement your own queue.

Something like this:

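public class MyQueue
{
    private readonly Queue<IFile> _queue = new Queue<IFile>();
    // "Table of contents": hashes of the files that have been added.
    private readonly HashSet<int> _hashes = new HashSet<int>();

    // Enqueues the file only if no file with the same hash was added before.
    public void Add(IFile file)
    {
        var hash = GetHash(file);
        if (_hashes.Add(hash)) // Add returns false if the hash is already present
        {
            _queue.Enqueue(file);
        }
    }

    // Placeholder (an assumption, not part of the original snippet):
    // any value that identifies the file will do; here it is derived from the path.
    private static int GetHash(IFile file)
    {
        return file.FilePath.GetHashCode();
    }
}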

Basically, keep a "table of contents" implemented as a HashSet, which holds a collection of unique values.
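The same idea can be sketched with the path string itself as the key instead of an int hash, so two different files can never collide. This is a minimal sketch, not a definitive implementation: UniqueFileQueue, TryEnqueue and ZipFile are hypothetical names, and the case-insensitive path comparison is an assumption that suits Windows-style paths. It also assumes FilePath is not changed while the file is queued.

```csharp
using System;
using System.Collections.Generic;

public interface IFile
{
    string FilePath { get; set; }
}

// Hypothetical IFile implementation, used only for illustration.
public class ZipFile : IFile
{
    public string FilePath { get; set; }
}

// Hypothetical wrapper: a FIFO queue plus a set of the paths currently queued.
public class UniqueFileQueue
{
    private readonly Queue<IFile> _queue = new Queue<IFile>();

    // Assumption: paths are compared case-insensitively.
    private readonly HashSet<string> _paths =
        new HashSet<string>(StringComparer.OrdinalIgnoreCase);

    public int Count
    {
        get { return _queue.Count; }
    }

    // Returns true if the file was enqueued, false if its path is already queued.
    public bool TryEnqueue(IFile file)
    {
        if (!_paths.Add(file.FilePath))
        {
            return false;
        }
        _queue.Enqueue(file);
        return true;
    }

    // Removes the path from the set so the same file can be queued again later.
    public IFile Dequeue()
    {
        IFile file = _queue.Dequeue();
        _paths.Remove(file.FilePath);
        return file;
    }
}
```

Both TryEnqueue and Dequeue stay O(1) on average. Like Queue<T> itself, this sketch is not thread-safe, so a file watcher that enqueues from several threads would need a lock around both methods.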

NOTE: once you start using queues and messages, the whole idea behind such an architecture is that messages should be idempotent. That means it doesn't matter if you add the same message to the queue multiple times.

Community
Marty
  • Actually, almost every queue or ESB has a mechanism called "duplicate detection", which prevents a message from being consumed more than once. It is not always necessary and is turned off by default, but it is still a pretty nice feature. So there is nothing wrong with the design here. – Yura Jan 21 '16 at 10:29
  • There's no guarantee that two different files wouldn't have the same hash. It's highly improbable, but not impossible. This could result in a file being lost and nobody knowing why it happened. – Andrew Shepherd Jan 21 '16 at 10:32
  • Well, one could use a Guid instead of the hash, and the approach would still be the same. Or build an ID as a string from FileName+Hash+Date or something like that. – Marty Jan 21 '16 at 10:35
  • Yeah, it all depends on the system's unique requirements. But it is clear that the hash here is just to convey the idea, not a concrete implementation. – Yura Jan 21 '16 at 10:38
1
// Requires "using System.Linq;". Scans the whole queue, so each check is O(n).
if (!fileQueue.Any(x => x.FilePath == file.FilePath))
   fileQueue.Enqueue(file);
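If an IEqualityComparer is already available, as in the question, the same linear check can be written with the LINQ Contains overload that accepts a comparer. The following is only a sketch: FilePathComparer and EnqueueIfAbsent are hypothetical names, and the comparer is a hand-written stand-in for the reflection-based one in the question.

```csharp
using System.Collections.Generic;
using System.Linq;

public interface IFile
{
    string FilePath { get; set; }
}

// Hypothetical comparer: two files are equal when their paths are equal.
public class FilePathComparer : IEqualityComparer<IFile>
{
    public bool Equals(IFile x, IFile y)
    {
        if (ReferenceEquals(x, y)) return true;
        if (x == null || y == null) return false;
        return string.Equals(x.FilePath, y.FilePath);
    }

    public int GetHashCode(IFile obj)
    {
        return (obj == null || obj.FilePath == null) ? 0 : obj.FilePath.GetHashCode();
    }
}

public static class FileQueueExtensions
{
    // Hypothetical helper: enqueues the file unless an equal one is already queued.
    // Enumerable.Contains walks the queue, so this is O(n) per call.
    public static bool EnqueueIfAbsent(
        this Queue<IFile> queue, IFile file, IEqualityComparer<IFile> comparer)
    {
        if (queue.Contains(file, comparer))
        {
            return false;
        }
        queue.Enqueue(file);
        return true;
    }
}
```

Usage would look like fileQueue.EnqueueIfAbsent(file, new FilePathComparer()). For short queues the linear scan is usually fine; for long ones, a set-backed wrapper avoids it.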
Magnus