1

I'm trying to add to a hash set from multiple threads. If the item already exists I want to update it, if it doesn't exist I want to add it to the list.

With the code I'm using I am ending up with lots of duplicates, I presume because multiple items are suddenly pointing to the same reference. I can't see where or why this is happening though.

Below is the code I'm using followed by the "Log" string ending when I first see the issue. You can see that suddenly all the items already added have the same value.

lock (_remoteDevicesLock)
{
    RemoteDevice rDevice = new RemoteDevice(notifyMessage.UUID, notifyMessage.Location);
    log += notifyMessage.UUID + " " + rDevice.UUID;
    if (!_remoteDevices.Add(rDevice))
    {
        log += " Not Added \r\n";
        rDevice = (from d in _remoteDevices
                   where d.UUID.Trim().Equals(notifyMessage.UUID.Trim(), StringComparison.OrdinalIgnoreCase)
                   select d).FirstOrDefault();
        if (rDevice != null)
        {
            //Update Device Expire Time
        }
    }                            
    else
    {
        log += " Added \r\n Current HashSet: \r\n";

        foreach (RemoteDevice rd in _remoteDevices)
        {
            log += rd.UUID + " \r\n";
        }
    }
}


00000000-0000-0001-1000-001cdf885737 00000000-0000-0001-1000-001cdf885737 Added 

Current HashSet: 
00000000-0000-0001-1000-001cdf885737 

00000000-0000-0001-1000-001cdf885737 00000000-0000-0001-1000-001cdf885737 Not Added 
00000000-0000-0001-1000-001cdf885737 00000000-0000-0001-1000-001cdf885737 Not Added 
00000000-0000-0001-0002-001cdf885737 00000000-0000-0001-0002-001cdf885737 Added 

Current HashSet: 
00000000-0000-0001-1000-001cdf885737 
00000000-0000-0001-0002-001cdf885737 

00000000-0000-0001-0002-001cdf885737 00000000-0000-0001-0002-001cdf885737 Not Added 
00000000-0000-0001-0002-001cdf885737 00000000-0000-0001-0002-001cdf885737 Not Added 
00000000-0000-0001-0002-001cdf885737 00000000-0000-0001-0002-001cdf885737 Not Added 
00000000-0000-0001-0002-001cdf885737 00000000-0000-0001-0002-001cdf885737 Not Added 
00000000-0000-0001-0001-001cdf885737 00000000-0000-0001-0001-001cdf885737 Added 

Current HashSet: 
00000000-0000-0001-1000-001cdf885737 
00000000-0000-0001-0002-001cdf885737 
00000000-0000-0001-0001-001cdf885737 

00000000-0000-0001-0001-001cdf885737 00000000-0000-0001-0001-001cdf885737 Not Added 
00000000-0000-0001-0001-001cdf885737 00000000-0000-0001-0001-001cdf885737 Not Added 
00000000-0000-0001-0000-001cdf885737 00000000-0000-0001-0000-001cdf885737 Added 

Current HashSet: 
00000000-0000-0001-1000-001cdf885737 
00000000-0000-0001-0002-001cdf885737 
00000000-0000-0001-0001-001cdf885737 
00000000-0000-0001-0000-001cdf885737 

00000000-0000-0001-0000-001cdf885737 00000000-0000-0001-0000-001cdf885737 Not Added 
00000000-0000-0001-0000-001cdf885737 00000000-0000-0001-0000-001cdf885737 Not Added 
00000000-0000-0001-0000-001cdf885737 00000000-0000-0001-0000-001cdf885737 Not Added 
00000000-0000-0001-0000-001cdf885737 00000000-0000-0001-0000-001cdf885737 Not Added 
00000000-0000-0001-1000-001cdf885737 00000000-0000-0001-1000-001cdf885737 Not Added 
00000000-0000-0001-1000-001cdf885737 00000000-0000-0001-1000-001cdf885737 Not Added 
00000000-0000-0001-1000-001cdf885737 00000000-0000-0001-1000-001cdf885737 Not Added 
00000000-0000-0001-0002-001cdf885737 00000000-0000-0001-0002-001cdf885737 Not Added 
00000000-0000-0001-0002-001cdf885737 00000000-0000-0001-0002-001cdf885737 Not Added 
00000000-0000-0001-0002-001cdf885737 00000000-0000-0001-0002-001cdf885737 Not Added 
00000000-0000-0001-0002-001cdf885737 00000000-0000-0001-0002-001cdf885737 Not Added 
00000000-0000-0001-0002-001cdf885737 00000000-0000-0001-0002-001cdf885737 Not Added 
00000000-0000-0001-0001-001cdf885737 00000000-0000-0001-0001-001cdf885737 Not Added 
00000000-0000-0001-0001-001cdf885737 00000000-0000-0001-0001-001cdf885737 Not Added 
00000000-0000-0001-0001-001cdf885737 00000000-0000-0001-0001-001cdf885737 Not Added 
00000000-0000-0001-0000-001cdf885737 00000000-0000-0001-0000-001cdf885737 Not Added 
00000000-0000-0001-0000-001cdf885737 00000000-0000-0001-0000-001cdf885737 Not Added 
00000000-0000-0001-0000-001cdf885737 00000000-0000-0001-0000-001cdf885737 Not Added 
00000000-0000-0001-0000-001cdf885737 00000000-0000-0001-0000-001cdf885737 Not Added 
00000000-0000-0001-0000-001cdf885737 00000000-0000-0001-0000-001cdf885737 Not Added 
00000000-0000-0001-1000-001cdf885737 00000000-0000-0001-1000-001cdf885737 Not Added 
00000000-0000-0001-1000-001cdf885737 00000000-0000-0001-1000-001cdf885737 Not Added 
00000000-0000-0001-1000-001cdf885737 00000000-0000-0001-1000-001cdf885737 Not Added 
00000000-0000-0001-0002-001cdf885737 00000000-0000-0001-0002-001cdf885737 Not Added 
00000000-0000-0001-0002-001cdf885737 00000000-0000-0001-0002-001cdf885737 Not Added 
00000000-0000-0001-0002-001cdf885737 00000000-0000-0001-0002-001cdf885737 Not Added 
00000000-0000-0001-0002-001cdf885737 00000000-0000-0001-0002-001cdf885737 Not Added 
00000000-0000-0001-0002-001cdf885737 00000000-0000-0001-0002-001cdf885737 Not Added 
00000000-0000-0001-0001-001cdf885737 00000000-0000-0001-0001-001cdf885737 Not Added 
00000000-0000-0001-0001-001cdf885737 00000000-0000-0001-0001-001cdf885737 Not Added 
00000000-0000-0001-0001-001cdf885737 00000000-0000-0001-0001-001cdf885737 Not Added 
00000000-0000-0001-0000-001cdf885737 00000000-0000-0001-0000-001cdf885737 Not Added 
00000000-0000-0001-0000-001cdf885737 00000000-0000-0001-0000-001cdf885737 Not Added 
00000000-0000-0001-0000-001cdf885737 00000000-0000-0001-0000-001cdf885737 Not Added 
00000000-0000-0001-0000-001cdf885737 00000000-0000-0001-0000-001cdf885737 Not Added 
00000000-0000-0001-0000-001cdf885737 00000000-0000-0001-0000-001cdf885737 Not Added 
00000000-0000-0001-1000-001cdf885737 00000000-0000-0001-1000-001cdf885737 Added 

Current HashSet: 
00000000-0000-0001-0000-001cdf885737 
00000000-0000-0001-0000-001cdf885737 
00000000-0000-0001-0000-001cdf885737 
00000000-0000-0001-0000-001cdf885737 
00000000-0000-0001-1000-001cdf885737 

Update: Here is GetHashCode And Equals As Requested although I don't think the issue lies here as I was using a list with a manual check and also had issues.

public override bool Equals(object obj)
{
    var other = obj as RemoteDevice;
    if (other == null)
    {
        return false;
    }
    else
    {
        return UUID.Trim().Equals(other.UUID.Trim(), StringComparison.OrdinalIgnoreCase);
    }
}

public override int GetHashCode()
{
    return UUID.GetHashCode();
}
Oli
  • 2,996
  • 3
  • 28
  • 50
  • 1. Don't lock on objects exposed outside of your class; you should probably be creating a specific object just to lock on. 2. Don't use string concatenation to add to a log string, use a `StringBuilder`. – Servy Nov 21 '12 at 18:52
  • What do RemoteDevice.Equals and RemoteDevice.GetHashCode look like? – fsimonazzi Nov 21 '12 at 18:53
  • You can examine this thread http://stackoverflow.com/questions/4306936/how-to-implement-concurrenthashset-in-net – Hamlet Hakobyan Nov 21 '12 at 18:54
  • @Servy How would I protect the hashset from addition and removal. The hashset is read outside the class but not modified. 2) The "Log" is only there for this question. – Oli Nov 21 '12 at 19:01
  • @Oli Using `lock` on it won't prevent problem in that case, even with it as it is. It seems you mis-understand what lock, fundamentally, does. Having said that, if it only reads and doesn't write it's not the cause of your problem, but you should still fix it. – Servy Nov 21 '12 at 19:03
  • @Servy As it stands nothing is currently reading or writing to the hashset. The only access is via that code snippet in the question.Please however explain what needs fixing as I don't understand.I thought lock meant only one thread could access a code snippet at once and since that code snippet is the only thing modifying the Hashset it should be threadsafe and without the lock it wouldn't be? – Oli Nov 21 '12 at 19:06
  • 1
    Are you updating the UUID of the entries in the hash set after you've retrieved them? If not, how do you explain the change to the ids of existing entries in the log? – fsimonazzi Nov 21 '12 at 19:07
  • You should `Trim` the UUID when you are adding it to your object. Better still (if you can make the change), switch your UUID to actually be a `System.Guid` as opposed to a string containing a GUID. – pstrjds Nov 21 '12 at 19:11
  • @fsimonazzi That's the point exactly. The code you see is the only place the HashSet is used and I simply don't understand how or why those values would be changing. I can only presume the references are changing to all point at the same object but can't see why. – Oli Nov 21 '12 at 19:11
  • @pstrjds Unfortunately the device architecture I'm following says that I have to be able to accept any string as UUID for backwards compatibility. – Oli Nov 21 '12 at 19:12
  • @Oli can you debug this code? If so, you can tell whether the instances there are the same or not. And you can also set breakpoints wherever the UUID value is set. We only have a partial picture of what could be going on here... – fsimonazzi Nov 21 '12 at 19:17
  • What happens when you avoid `Trim` altogether? @Servy's implementation of equality & GetHashCode as well as `where d equals notifyMessage` – Austin Salonen Nov 21 '12 at 19:28

3 Answers3

2

The following two UUIDs will have different hash codes but compare as equal: "x", " x ". Reason: You are treating whitespace differently.

You need to make GetHashCode and Equals constent. If Equals returns true, the two hash codes must be identical. If you fail to adhere to this contract HashSet behaves in an undefined way (duplicates possible).

Solution: Either Trim in both places, or in none.

usr
  • 168,620
  • 35
  • 240
  • 369
1

By design, it's important that the hash code of an item in the set never chance while it's inside of the set. The set has no way of detecting that the internal state of the object changed, so it will be in the "bucket" for it's old hash value, so when you try to add another item with the new hash code it will see that it's "bucket" is empty and add the item. If you want to "change" an item while it's currently in the set you should remove it, change it, and then add it back in. Or, better yet (from a design perspective) remove the old value, and add a new object entirely (possibly with certain aspect copied from the removed one).

It appears that was your issue; I'll leave the rest of my suggestions below despite them not being the issue you encountered.

Your RemoteDevice class possibly doesn't override Equals and GetHashCode with meaningful implementations. The default implementations (defined in object are based just on the address in memory of the object, so two different instances of with all of the same values will be "not equal" by that definition. Since it seems you have a single GUID valid as a unique ID (GUID has sensible Equals and GetHashCode definitions) your implementation should just defer to that.

i.e.:

public class RemoteDevice
{
    public Guid UUID { get; set; }

    public override bool Equals(object obj)
    {
        RemoteDevice other = obj as RemoteDevice;
        if (other == null) return false;
        return UUID.Equals(other.UUID);
    }

    public override int GetHashCode()
    {
        return UUID.GetHashCode();
    }
}

It also appears that you misunderstand how lock works. Using lock(myObject) doesn't prevent any other object from ever using myObject. All it does is cause anyone else trying to lock on that same instance it wait until you exit your lock before they can enter theirs. This means that your code needs to lock on the same instance of an object before anyone accesses the HashSet (since HashSet wasn't designed to be used by multiple threads).

If doing that isn't an option, or would be undesirable, you'd need to look at making a collection that can be accessed from multiple threads. Many collections have an implementation in System.Collections.Concurrent, but unfortunately there is no ConcurrentSet. There are several options; we could make our own for one, but another option would be to use the ConcurrentDictionary and simply ignore the values and just use the keys. It would be a tad messy, but creating your own concurrent collection effectively is...hard. Personally, if I wanted to use one I'd just create a wrapper around ConcurrentDictionary that hide the fact that it stores pairs.

Servy
  • 202,030
  • 26
  • 332
  • 449
  • I've updated the question and the overrides look almost identical. – Oli Nov 21 '12 at 19:02
  • I'm still a bit confused about the issue with the lock. I'm locking _remoteDevicesLock not _remoteDevices | I'm locking that section of code so threads wait and execute one at a time thus making the hashset thread safe. If I'm not doing that then please explain why and what I'm doing wrong. – Oli Nov 21 '12 at 19:25
  • 1
    Servy-"HashCode of an item in the set never chance while it's inside of the set" is the issue. Turns out (I won't name the brand)but my wireless router is for some reason changing it's UUID against the spec of Upnp. I guess I'll just have to work out how to code around this. – Oli Nov 21 '12 at 19:35
  • @Oli Good luck with that. I'll move that section to the top to emphasize it. – Servy Nov 21 '12 at 19:37
0

Use a Dictionary, will make your code much simpler and robust and will not depend on GetHashCode overrides. Lookups will be much faster too, the LINQ query isn't really good performance wise. Should pay attention to simple things like not calculating same value more than once, especially if it's in a loop. I.e. notifyMessage.UUID.Trim() is invoked as many times in the LINQ query as many devices are in list. Id should be calculated once before loop and reused.

Here's a sample using Dictionary:

var _remoteDevices = new Dictionary<string, RemoteDevice>();

...

var deviceId = notifyMessage.UUID.Trim().ToLowerInvariant();

RemoteDevice remoteDevice;

if (_remoteDevices.TryGetValue(deviceId, out remoteDevice))
{
    UpdateDevice(remoteDevice);
}
else
{
    var newDevice = CreateDevice(notifyMessage);

    _remoteDevices.Add(deviceId, newDevice);
}

The above code does a single lookup instead of two in the code in your question, where both _remoteDevices.Add and the LINQ query perform a lookup. The LINQ query does actually a full iteration because using Where instead of FirstOrDefault with a predicate unless the compiler is clever enough to transform the expression to FirstOrDefault(d => d.UUID.Trim().Equals(notifyMessage.UUID.Trim(), StringComparison.OrdinalIgnoreCase).

user3285954
  • 4,499
  • 2
  • 27
  • 19