11

I have a list of ~9000 products, some of which may be duplicates.

I wanted to make a HashTable of these products with the product's serial number as the key so I can find duplicates easily.

How would one go about using a HashTable in C#/.NET? Would a HashSet be more appropriate?

Eventually I would like a list like:

Key-Serial: 11110 - Contains: Product1
Key-Serial: 11111 - Contains: Product3, Product6, Product7
Key-Serial: 11112 - Contains: Product4
Key-Serial: 11113 - Contains: Product8, Product9

So, I have a list of all products, and they are grouped by the ones that have duplicate serial numbers. What is the "correct" way to do this?

Biro

6 Answers

14

I think Dictionary is the recommended class for stuff like this.

It would be something like this in your case:

Dictionary<string, List<Product>>

(using serial string as key)
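To make the idea concrete, here is a minimal sketch of filling such a dictionary. The Product class with Serial and Name properties, and the ProductGrouping.GroupBySerial helper, are hypothetical names chosen for illustration; the question does not show the real product type.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical product type; the real class is not shown in the question.
public class Product
{
    public string Serial { get; }
    public string Name { get; }

    public Product(string serial, string name)
    {
        Serial = serial;
        Name = name;
    }
}

public static class ProductGrouping
{
    // Groups products by serial number. Any serial whose list holds
    // more than one product is a duplicate.
    public static Dictionary<string, List<Product>> GroupBySerial(IEnumerable<Product> products)
    {
        var bySerial = new Dictionary<string, List<Product>>();
        foreach (var product in products)
        {
            // TryGetValue avoids a second lookup compared to ContainsKey + indexer.
            if (!bySerial.TryGetValue(product.Serial, out var group))
            {
                group = new List<Product>();
                bySerial.Add(product.Serial, group);
            }
            group.Add(product);
        }
        return bySerial;
    }
}
```

Iterating over the result and checking which lists have a Count greater than 1 would then give the duplicated serials.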

peter p
  • That is a kludge, how could you choose the right product from the list? There's no substitute for a unique key. – Aviad P. Jan 03 '10 at 19:10
  • 8
    Why is this a kludge? The question was about grouping products by serial. This is a straightforward, simple and readable answer which meets the requirements, no? – peter p Jan 03 '10 at 19:29
8

A hashtable is a kind of dictionary, and a hashset is a kind of set. Neither dictionaries nor sets directly solve your problem - you need a data structure which holds multiple objects for one key.

Such data structures are often called multimaps. You can create one by using a hashtable whose keys are integers and whose values are sets of some kind (for example, hashsets).
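A rough sketch of such a multimap, assuming integer serial numbers as keys and product names as values. The MultiMap class and its Add and Get methods are made up for this example and are not an existing .NET type.

```csharp
using System.Collections.Generic;

// Hypothetical multimap: each integer key maps to a set of string values.
public class MultiMap
{
    private readonly Dictionary<int, HashSet<string>> _map =
        new Dictionary<int, HashSet<string>>();

    // Adds a value under the key; returns false if that exact value
    // was already stored under the key.
    public bool Add(int key, string value)
    {
        if (!_map.TryGetValue(key, out var values))
        {
            values = new HashSet<string>();
            _map[key] = values;
        }
        return values.Add(value);
    }

    // Returns all values stored under the key, or an empty sequence
    // when the key is unknown.
    public IEnumerable<string> Get(int key)
    {
        if (_map.TryGetValue(key, out var values))
            return values;
        return new string[0];
    }
}
```

Because the values are sets, adding the same product name twice under one serial keeps only one copy; use a List<string> instead if exact repeats should be preserved.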

Alternatively, you can look at existing multimap solutions, such as here: multimap in .NET.

For information on using hashtables, you can check it out on MSDN: http://msdn.microsoft.com/en-us/library/system.collections.hashtable.aspx, and there are plenty of other tutorials - search on using either "HashTable" or "Dictionary".

Oak
6

A generic Dictionary would suit this best, I think. Code might look something like this:

// Maps each serial number to the names of all products that share it.
var keyedProducts = new Dictionary<int, List<string>>();

foreach (var keyProductPair in keyProductPairs)
{
  // Dictionary exposes ContainsKey (not Contains) for key lookups.
  if (keyedProducts.ContainsKey(keyProductPair.Key))
    keyedProducts[keyProductPair.Key].Add(keyProductPair.Product);
  else
    keyedProducts.Add(keyProductPair.Key, new List<string> { keyProductPair.Product });
}
James Kolpack
3

A great option now available in .NET is the Lookup class. From the MSDN documentation:

A Lookup(Of TKey, TElement) resembles a Dictionary(Of TKey, TValue). The difference is that a Dictionary(Of TKey, TValue) maps keys to single values, whereas a Lookup(Of TKey, TElement) maps keys to collections of values.

There are some differences between a Lookup and Dictionary(Of List). Namely, the Lookup is immutable (can't add or remove elements or keys after it's created). Depending on how you plan to use your data, the Lookup may be advantageous compared to GroupBy().
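As a small sketch, a Lookup can be built with the LINQ ToLookup extension method. The LookupExample class and the use of KeyValuePair<string, string> for (serial, product name) pairs are assumptions made for this illustration.

```csharp
using System.Collections.Generic;
using System.Linq;

public static class LookupExample
{
    // Builds an immutable lookup from serial number to product names.
    // The input pairs are assumed to be (serial, product name).
    public static ILookup<string, string> BySerial(
        IEnumerable<KeyValuePair<string, string>> serialNamePairs)
    {
        return serialNamePairs.ToLookup(pair => pair.Key, pair => pair.Value);
    }

    // Returns only the serials that occur more than once.
    public static List<string> DuplicatedSerials(ILookup<string, string> lookup)
    {
        return lookup.Where(group => group.Count() > 1)
                     .Select(group => group.Key)
                     .ToList();
    }
}
```

Indexing a Lookup with a key it does not contain returns an empty sequence rather than throwing, which is another difference from Dictionary.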

Zairja
1

First you need to define your 'Primary Key' as it were, a set of fields that are unique to each object. I guess Key-Serial would be part of that set, but there must be others. Once you define that 'Primary Key' you can define a struct that represents a Key Value and use that as the key to a dictionary containing your products.

Example:

struct ProductPrimaryKey
{
    public string KeySerial;
    public string OtherDiscriminator;

    public ProductPrimaryKey(string keySerial, string otherDiscriminator)
    {
        KeySerial = keySerial;
        OtherDiscriminator = otherDiscriminator;
    }
}

class Product
{
    public string KeySerial { get; set; }
    public string OtherDiscriminator { get; set; }
    public int MoreData { get; set; }
}

class DataLayer
{
    public Dictionary<ProductPrimaryKey, Product> DataSet 
        = new Dictionary<ProductPrimaryKey, Product>();

    public Product GetProduct(string keySerial, string otherDiscriminator)
    {
        return DataSet[new ProductPrimaryKey(keySerial, otherDiscriminator)];
    }
}
Aviad P.
0

If you wanted to simply have a list of duplicates, you could:

  • create a Dictionary<TKey, T> from your table entries (let's call them an IEnumerable<T>), adding each entry only if its key is not already present (so duplicate keys are ignored)

  • create a HashSet<T> from the same IEnumerable<T> (which keeps entries with duplicate keys, as long as the entire row isn't the same)

  • and then iterate through dictionary.Values, calling hashset.Remove(value) for each value

What's left in the hashset is the duplicates.
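The steps above can be sketched roughly as follows. The Product class and the DuplicateFinder.FindDuplicates helper are hypothetical, and the sketch assumes Product uses default reference equality, so two separate rows are never treated as "the same" by the hashset.

```csharp
using System.Collections.Generic;

// Hypothetical product type with default (reference) equality.
public class Product
{
    public string Serial { get; }
    public string Name { get; }

    public Product(string serial, string name)
    {
        Serial = serial;
        Name = name;
    }
}

public static class DuplicateFinder
{
    public static HashSet<Product> FindDuplicates(IEnumerable<Product> products)
    {
        var all = new List<Product>(products);

        // Step 1: keep only the first product seen for each serial.
        var firstBySerial = new Dictionary<string, Product>();
        foreach (var product in all)
        {
            if (!firstBySerial.ContainsKey(product.Serial))
                firstBySerial.Add(product.Serial, product);
        }

        // Step 2: a set of every product, duplicated serials included.
        var remaining = new HashSet<Product>(all);

        // Step 3: remove the first occurrence of each serial.
        foreach (var first in firstBySerial.Values)
            remaining.Remove(first);

        return remaining;
    }
}
```

Note that the result holds only the second and later products for each repeated serial; the first product with that serial is removed along with the unique ones.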

Christopher Stevenson