C# HashSet: Everything you need to know

Collection data structures have a major impact on application performance.

A HashSet is a collection designed for fast lookups. A HashSet contains only unique elements, and its hash-based implementation makes it possible to find an element without scanning the whole collection.

The most important features of a HashSet are:

A HashSet collection is not sorted and cannot have duplicate elements.
A HashSet offers faster lookups than most other collections because it locates elements by hash code instead of scanning.
A HashSet collection is not thread safe, so code that accesses it from multiple threads must use synchronization techniques.

Key points

When working with HashSet, note that HashSet:

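Stores its values in no guaranteed order.
Can grow and shrink as elements are added or removed, but performs best when its size stays stable, because growth forces the internal storage to be resized.
Allows null values.
Is serializable because it implements the ISerializable interface.
Doesn't use indices, so there is no indexer. The LINQ ElementAt method called on a HashSet object with an integer as a parameter still works, but it enumerates the set and returns the element at that position in enumeration order, which is not guaranteed.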
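
The points above can be sketched in a few lines. This is a minimal example, assuming a console program with top-level statements (C# 9 or later); the variable names are illustrative:

```csharp
#nullable enable
using System;
using System.Collections.Generic;
using System.Linq;

var names = new HashSet<string?> { "b", "a", "b" };   // the duplicate "b" is ignored
Console.WriteLine(names.Count);                        // 2

names.Add(null);                                       // null is an allowed element
Console.WriteLine(names.Contains(null));               // True

// There is no indexer: names[0] does not compile.
// ElementAt is a LINQ extension that enumerates the set; the order is not guaranteed.
string? first = names.ElementAt(0);
Console.WriteLine(names.Contains(first));              // True
```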

HashSet usage

To use a HashSet, you must first have an instance of a HashSet.

var set = new HashSet<int>();

After you have created an instance of a HashSet, add items to it one at a time like this:

set.Add(3);

You can also add several items to the C# HashSet at once using this syntax:

var set = new HashSet<int> { 1, 2, 3 };

To remove specific items from a set, you would use the "Remove" method like this:

set.Remove(3);

Make sure to add using System.Collections.Generic; at the top of your program.
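
Put together, the snippets above might look like the following complete program. It is a sketch that assumes top-level statements (C# 9 or later):

```csharp
using System;
using System.Collections.Generic;

var set = new HashSet<int> { 1, 2, 3 };   // create and fill with a collection initializer
set.Add(4);                                // add a single element
set.Remove(3);                             // remove a single element

Console.WriteLine(set.Count);              // 3
Console.WriteLine(set.Contains(3));        // False
```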

HashSet performance

Data Structure	Insert	Remove	Search	Space Used
Array	N/A	N/A	O(n)	O(n)
List	O(n)	O(n)	O(n)	O(n)
Linked List	O(1)	O(1)	O(n)	O(n)
Hash Set	O(1)	O(1)	O(1)	O(n)

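The performance of a HashSet depends on the hash function of its elements. With a hash function that spreads elements evenly, the time required to find an element in a HashSet is constant on average, independent of the number of items added to it. If many elements share the same hash code, lookups degrade toward O(n).

Internally, the implementation stores elements in arrays and uses each element's integer hash code as the key that selects a bucket. The performance of a HashSet also depends on the load factor, that is, how full the underlying arrays are relative to the number of elements.

A HashSet outperforms other collections most clearly when lookups dominate and elements are not added or removed frequently.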
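
One rough way to see the difference in search cost is to time Contains on a List and on a HashSet holding the same data. The sketch below uses Stopwatch; the sizes are arbitrary assumptions, and the timings depend on the machine, so no expected numbers are shown:

```csharp
using System;
using System.Collections.Generic;
using System.Diagnostics;
using System.Linq;

const int size = 100_000;    // number of stored elements (arbitrary)
const int lookups = 1_000;   // number of searches (arbitrary)

List<int> list = Enumerable.Range(0, size).ToList();
HashSet<int> set = new HashSet<int>(list);

int hitsInList = 0;
var sw = Stopwatch.StartNew();
for (int i = 0; i < lookups; i++)
{
    // Linear scan: values near the end of the list are the slowest to find.
    if (list.Contains(size - 1 - i)) hitsInList++;
}
sw.Stop();
Console.WriteLine($"List:    {sw.ElapsedMilliseconds} ms");

int hitsInSet = 0;
sw.Restart();
for (int i = 0; i < lookups; i++)
{
    // Hash lookup: the cost does not depend on where the value was inserted.
    if (set.Contains(size - 1 - i)) hitsInSet++;
}
sw.Stop();
Console.WriteLine($"HashSet: {sw.ElapsedMilliseconds} ms");
```

For reliable measurements, a dedicated benchmarking tool is a better choice than a single Stopwatch run.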

HashSet methods

Important HashSet methods are:

Add: Adds the specified element to a HashSet if it is not already present.
Clear: Removes all elements from a HashSet.
Contains: Determines whether a HashSet contains an element with the specified value.

Add

The Add() method adds an item to the HashSet only if the item is not already present, and returns true when it does.

If the item is already present in the HashSet, this method leaves the set unchanged and returns false.
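
A small sketch of the return value, assuming top-level statements; the variable names are illustrative:

```csharp
using System;
using System.Collections.Generic;

var seen = new HashSet<string>();

bool firstAdd = seen.Add("apple");    // true: "apple" was not in the set yet
bool secondAdd = seen.Add("apple");   // false: already present, the set is unchanged

Console.WriteLine($"{firstAdd} {secondAdd} {seen.Count}");   // True False 1
```

The return value makes Add a convenient way to detect duplicates while processing a sequence.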

Clear

The Clear() method removes all items from the set. It removes the elements but keeps the capacity of the underlying buckets and slots arrays. To release that memory, call the TrimExcess method immediately afterwards.
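
The sequence described above could look like this. It is a sketch only; the internal capacity is not printed because how it can be inspected differs between .NET versions:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

var numbers = new HashSet<int>(Enumerable.Range(0, 10_000));

numbers.Clear();        // Count becomes 0, the internal storage keeps its capacity
numbers.TrimExcess();   // shrinks the internal storage to fit the now empty set
Console.WriteLine(numbers.Count);   // 0

numbers.Add(42);        // the set remains usable after trimming
Console.WriteLine(numbers.Count);   // 1
```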

Contains

The Contains() method checks whether a specified element is present in the HashSet. It returns true if an element is found in the set, otherwise false.
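
A short sketch of Contains, assuming top-level statements. The second set uses the HashSet constructor overload that accepts an equality comparer, here StringComparer.OrdinalIgnoreCase; the variable names are illustrative:

```csharp
using System;
using System.Collections.Generic;

var allowed = new HashSet<string> { "admin", "editor" };
Console.WriteLine(allowed.Contains("admin"));   // True
Console.WriteLine(allowed.Contains("guest"));   // False
Console.WriteLine(allowed.Contains("Admin"));   // False: the default string comparer is case-sensitive

// Passing a comparer to the constructor changes how elements are matched.
var relaxed = new HashSet<string>(StringComparer.OrdinalIgnoreCase) { "admin", "editor" };
Console.WriteLine(relaxed.Contains("ADMIN"));   // True
```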

HashSet enumeration

In C#, we can enumerate a HashSet. HashSet<T> implements the IEnumerable<T> interface, which allows us to use the foreach statement to iterate over its elements.

We can also use LINQ to look at the values of a HashSet, or to filter it.
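
Both approaches can be sketched as follows, assuming top-level statements. The LINQ query sorts its result because the enumeration order of a HashSet is not guaranteed:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

var numbers = new HashSet<int> { 5, 12, 8, 21 };

// foreach visits every element once, in no guaranteed order.
foreach (int n in numbers)
{
    Console.WriteLine(n);
}

// LINQ: keep the values greater than 10 and sort them for a stable result.
List<int> large = numbers.Where(n => n > 10).OrderBy(n => n).ToList();
Console.WriteLine(string.Join(", ", large));   // 12, 21
```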

Thread safety

HashSet is not thread safe, so code that accesses it from multiple threads must use synchronization techniques.

For example, you can use the lock statement to implement thread safety.

You can also use any .NET synchronization primitive for this purpose, such as SemaphoreSlim or ReaderWriterLockSlim.

The following example shows how to implement a thread-safe Add method by using ReaderWriterLockSlim:

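using System.Collections.Generic;
using System.Threading;

// The class name is illustrative; this excerpt shows only part of the full class.
public class ConcurrentHashSet<T>
{
    private readonly ReaderWriterLockSlim _lock = new ReaderWriterLockSlim(LockRecursionPolicy.SupportsRecursion);
    private readonly HashSet<T> _hashSet = new HashSet<T>();

    public bool Add(T item)
    {
        // Take the write lock before entering try, so that finally
        // only runs when the lock is actually held.
        _lock.EnterWriteLock();
        try
        {
            return _hashSet.Add(item);
        }
        finally
        {
            _lock.ExitWriteLock();
        }
    }

    // Remaining members are omitted in this excerpt.
}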

Full code here
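
For comparison, the lock statement mentioned earlier is the simplest option. The sketch below assumes a hypothetical wrapper class named LockedHashSet<T>, which is not part of .NET, and top-level statements for the usage part:

```csharp
using System;
using System.Collections.Generic;
using System.Threading.Tasks;

var set = new LockedHashSet<int>();

// Four parallel iterations each try to add the same values 0..999.
Parallel.For(0, 4, _ =>
{
    for (int i = 0; i < 1000; i++)
    {
        set.Add(i);
    }
});

Console.WriteLine(set.Count);   // 1000

// Hypothetical wrapper: every access to the inner set happens under one lock.
public class LockedHashSet<T>
{
    private readonly object _gate = new object();
    private readonly HashSet<T> _items = new HashSet<T>();

    public bool Add(T item)
    {
        lock (_gate) { return _items.Add(item); }
    }

    public bool Contains(T item)
    {
        lock (_gate) { return _items.Contains(item); }
    }

    public int Count
    {
        get { lock (_gate) { return _items.Count; } }
    }
}
```

A single lock serializes readers as well as writers. ReaderWriterLockSlim allows concurrent readers, which can help when reads greatly outnumber writes.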

HashSet vs List

The main difference between HashSet and List is that a List can contain duplicate elements and keeps them in insertion order, while a HashSet stores only unique values in no particular order.

HashSet is much faster than List for lookups because it finds an element by its hash code instead of scanning the whole collection. HashSet also initializes its underlying data structures lazily, so an empty set allocates nothing until the first element is added.

Hash tables often contain only a few elements, so the initial capacity is set small: 3 elements.

When an element is added to a HashSet, the underlying code uses its hash code to place it in a bucket. When you later look up an item, the underlying code checks only the corresponding bucket.
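
The difference in how the two collections treat duplicates can be sketched like this, assuming top-level statements:

```csharp
using System;
using System.Collections.Generic;

var list = new List<int> { 3, 1, 3, 2, 1 };   // a List keeps duplicates and order
var unique = new HashSet<int>(list);          // a HashSet keeps one copy of each value

Console.WriteLine(list.Count);     // 5
Console.WriteLine(unique.Count);   // 3
Console.WriteLine(list[0]);        // 3: a List supports access by index, a HashSet does not
```

Constructing a HashSet from an existing collection is a common way to remove duplicates.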

HashSet vs Hashtable

The main difference between HashSet and Hashtable is that a HashSet stores individual values, while a Hashtable stores key-value pairs.

The underlying implementation is similar for both because it relies on hash codes. For example, when you add an element to the Hashtable, the implementation adds it into a bucket based on the hash code of the key. If you want to find that element again, the Hashtable will use the hash code of the key to look in only one particular bucket, which makes it easier to find.
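
A minimal side-by-side sketch, assuming top-level statements. Hashtable is the non-generic class from System.Collections, so its keys and values are typed as object:

```csharp
using System;
using System.Collections;
using System.Collections.Generic;

var capitals = new Hashtable();      // key-value pairs, looked up by key
capitals["France"] = "Paris";
capitals["Japan"] = "Tokyo";
Console.WriteLine(capitals["Japan"]);                // Tokyo
Console.WriteLine(capitals.ContainsKey("France"));   // True

var countries = new HashSet<string> { "France", "Japan" };   // values only
Console.WriteLine(countries.Contains("Japan"));      // True
```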

HashSet and object equality

For primitive data types, the Equals method compares values, so it returns true when two values are the same. For example, consider these two int variables:

int num1 = 100;

int num2 = 120;

It's easy to tell that 100 is not equal to 120.

But for complex objects that we might have in a collection, the Object.Equals method can return false even when two objects are equal in their fields.

For example, consider these two Car objects:

Car car1 = new Car("Honda");

Car car2 = new Car("Honda");

By default, Object.Equals returns false here because car1 and car2 refer to two different objects.

To make a HashSet treat these objects as equal, we implement the IEquatable<Car> interface and override GetHashCode. The default comparer of a HashSet then calls our Equals implementation instead of comparing references.

For example:

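using System;

public class Car : IEquatable<Car>
{
    public string Name { get; set; }

    public Car(string name)
    {
        Name = name;
    }

    // Cars with equal names must produce equal hash codes.
    public override int GetHashCode()
    {
        return Name == null ? 0 : Name.GetHashCode();
    }

    // Strongly typed comparison used by HashSet<Car>.
    public bool Equals(Car other)
    {
        return other != null && Name == other.Name;
    }

    // Keep Object.Equals consistent with the typed version.
    public override bool Equals(object obj)
    {
        return Equals(obj as Car);
    }
}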

The code above implements IEquatable<Car>.Equals so that two Car objects are equal when their names are equal, and overrides GetHashCode to return the hash code of the name, so equal cars also produce equal hash codes.

Now that Car implements IEquatable<Car> and overrides GetHashCode, a HashSet<Car> treats two cars with the same name as the same element.
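
The effect can be checked with a short program. This sketch repeats a compact, read-only variant of the Car class so that it is self-contained, and assumes top-level statements with C# 9 or later:

```csharp
#nullable enable
using System;
using System.Collections.Generic;

var cars = new HashSet<Car>();
Console.WriteLine(cars.Add(new Car("Honda")));   // True
Console.WriteLine(cars.Add(new Car("Honda")));   // False: equal by name, so treated as a duplicate
Console.WriteLine(cars.Count);                   // 1

// Compact variant of Car, compared by name.
public class Car : IEquatable<Car>
{
    public string Name { get; }

    public Car(string name) { Name = name; }

    public bool Equals(Car? other) => other is not null && Name == other.Name;

    public override bool Equals(object? obj) => Equals(obj as Car);

    public override int GetHashCode() => Name.GetHashCode();
}
```

Without the Equals and GetHashCode implementations, both Add calls would return True and the set would contain two elements.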

What is a Hash?

In C#, a hash code is an integer value computed from an object's contents.

Hash codes give a quick first check of whether two objects can be equal:

Two equal objects return hash codes that are equal.
Unequal objects can have identical hash codes.

That's why a HashSet uses hash codes to organize its elements. Comparing integer values is much faster than comparing two objects using the Equals method, so the set calls Equals only for elements whose hash codes match.

To get the hash code integer value, we can call the Object.GetHashCode method. Every object has this method.
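
The first rule can be observed directly. The sketch below assumes top-level statements; it does not print the hash code itself, because the value for a string can differ between program runs:

```csharp
using System;

string a = "hash";
string b = new string(new[] { 'h', 'a', 's', 'h' });   // built separately, same content

Console.WriteLine(a.Equals(b));                          // True
Console.WriteLine(a.GetHashCode() == b.GetHashCode());   // True: equal objects have equal hash codes

// The reverse does not hold: different values may share a hash code (a collision),
// so a matching hash code still has to be confirmed with Equals.
```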

Conclusion

HashSet is a .NET collection that stores unique elements. Unlike a List, a HashSet does not store duplicate elements.

When you add an element to a HashSet, it uses the element's hash code to place it in a bucket, which saves time when looking for an element later.

HashSet is faster than List for lookups because it uses hashing for storage.

HashSet is a superb choice for storing unique elements because it can quickly locate an element by its hash code.

If you are looking for a more efficient way to store and access data in C#, then HashSet should be at the top of your list.

Be careful with the performance of your applications. Always measure and test to ensure that you keep improving it.

Published on: Feb 6, 2022