C#: How to get unique values in a List?
To get the unique values in a List:
- Import the System.Linq namespace.
- Use the Distinct method on your list.
- (Optional) Get a list using the ToList method.
Full code example:
List<int> numbers = new() { 1, 2, 2, 2, 5, 6};
List<int> unique = numbers.Distinct().ToList();
Console.WriteLine(string.Join(",", unique)); // 1,2,5,6
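Distinct is an extension method from the System.Linq namespace that returns the unique values from an IEnumerable<T> collection. It compares values using the default equality comparer, which relies on GetHashCode and Equals.
Distinct does not iterate over the collection when you call it. Instead, it defers execution (an iterator, classically written with the yield keyword), so the source is only enumerated when you iterate over the result, for example with ToList or foreach.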
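To see what deferred execution means in practice, here is a minimal sketch (the variable names are illustrative). The query is built first and the list is changed afterwards. The change still shows up in the result because nothing is enumerated until ToList runs.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

List<int> numbers = new() { 1, 2, 2 };

// Building the query does not enumerate the list yet.
IEnumerable<int> query = numbers.Distinct();

// This change happens before the query is enumerated...
numbers.Add(7);

// ...so the result includes it.
List<int> unique = query.ToList();
Console.WriteLine(string.Join(",", unique)); // 1,2,7
```

If you need a snapshot of the unique values at a specific moment, call ToList right away, as in the full example above.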
What about objects?
By default, Distinct does not remove duplicate class instances. Unless a class overrides Equals and GetHashCode, objects are compared by reference, so two objects that contain the same values still count as different.
Here's an example:
List<User> userList = new() { new User(1, "Joe"), new User(2, "Matt"), new User(1, "Joe") };
userList.Distinct().ToList().ForEach(Console.WriteLine);
// Prints:
// 1: Joe
// 2: Matt
// 1: Joe
public class User
{
    public int Id { get; init; }
    public string Name { get; init; }

    public User(int id, string name)
    {
        Id = id;
        Name = name;
    }

    public override string ToString() => $"{Id}: {Name}";
}
We can see that Distinct didn't filter out the second instance of the User(1, "Joe") object.
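If you want Distinct itself to treat two users with the same values as equal, the type needs value-based equality. Here is a minimal sketch, assuming you are free to change the type. Declaring it as a record (a hypothetical UserRecord, not part of the original example) makes the compiler generate Equals and GetHashCode from its properties.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

List<UserRecord> users = new()
{
    new UserRecord(1, "Joe"),
    new UserRecord(2, "Matt"),
    new UserRecord(1, "Joe"),
};

// Records compare by value, so the second Joe is treated as a duplicate.
List<UserRecord> unique = users.Distinct().ToList();
Console.WriteLine(unique.Count); // 2

// Hypothetical record type used only for this sketch.
public record UserRecord(int Id, string Name);
```

When a record is not an option, implementing IEquatable<T> on the class or passing an IEqualityComparer<T> to Distinct achieves the same effect.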
The easiest way to filter out duplicates when you have a unique key is to use DistinctBy (available since .NET 6):
userList.DistinctBy(user => user.Id).ToList().ForEach(Console.WriteLine);
// Prints:
// 1: Joe
// 2: Matt
The code above bases the distinction on the Id property, which gives us the expected result.
Besides Distinct, there are 3 major alternatives to get unique values from a List:
1. Use a HashSet
A HashSet is a data structure that contains only unique elements.
We can use the HashSet(IEnumerable<T> collection) constructor to get unique values from our list because List<T> implements the IEnumerable<T> interface. We can then turn the newly created HashSet into a List to get all the unique values.
For example:
List<int> numbers = new() { 1, 2, 2, 2, 5, 6};
List<int> unique = new HashSet<int>(numbers).ToList();
Console.WriteLine(string.Join(",", unique)); // 1,2,5,6
This works because the HashSet constructor takes an IEnumerable<T> collection and, optionally, an IEqualityComparer<T> (the default comparer is used when none is passed). If the collection passed in is already a HashSet of the same type and the equality comparers are equal, it copies the elements from that HashSet. If not, it uses the collection's count (when the collection exposes one) to set the initial capacity and then adds the elements one by one, skipping any value that is already in the set.
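The comparer parameter mentioned above is also useful on its own. As a small sketch (the names list is made up for illustration), passing StringComparer.OrdinalIgnoreCase to the constructor makes the set treat strings that differ only in casing as duplicates:

```csharp
using System;
using System.Collections.Generic;

List<string> names = new() { "joe", "Joe", "matt" };

// The second constructor argument decides what counts as a duplicate.
var unique = new HashSet<string>(names, StringComparer.OrdinalIgnoreCase);

Console.WriteLine(unique.Count);            // 2
Console.WriteLine(unique.Contains("MATT")); // True
```

Distinct has an overload that accepts the same kind of comparer, so this choice does not force you to use a HashSet.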
2. Use GroupBy
You can also use GroupBy to get unique values from a collection:
List<int> numbers = new() { 1, 2, 2, 2, 5, 6};
List<int> unique = numbers.GroupBy(x => x).Select(x => x.Key).ToList();
Console.WriteLine(string.Join(",", unique)); // 1,2,5,6
This approach is not very efficient when it comes to primitives, but it was a very popular way of getting unique objects by key before the introduction of DistinctBy. For example:
List<User> userList = new() { new User(1, "Joe"), new User(2, "Matt"), new User(1, "Joe") };
userList.GroupBy(u => u.Id).Select(g => g.First()).ToList().ForEach(Console.WriteLine);
// Prints:
// 1: Joe
// 2: Matt
3. Use a for loop
Using a for loop is my least favorite way of getting unique values from a list because it misses many of the performance optimizations built into the .NET methods and types, such as Distinct and HashSet.
However, it can be useful if you were going to iterate over your list anyway for something else.
List<int> numbers = new() { 1, 2, 2, 2, 5, 6 };
var result = new List<int>();
foreach (var value in numbers)
{
    // do something else with my `value`
    // also create a unique list
    if (!result.Contains(value))
        result.Add(value);
}
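Part of the reason the loop above is slow is that List<T>.Contains scans the whole result list on every iteration. A possible variation, sketched here with a hypothetical seen set, keeps the single pass but tracks already-seen values in a HashSet. Its Add method returns false when the value is already present.

```csharp
using System;
using System.Collections.Generic;

List<int> numbers = new() { 1, 2, 2, 2, 5, 6 };

var seen = new HashSet<int>();   // fast membership checks
var result = new List<int>();    // keeps first-occurrence order

foreach (int value in numbers)
{
    // do something else with `value` here

    // Add returns true only the first time a value is seen.
    if (seen.Add(value))
        result.Add(value);
}

Console.WriteLine(string.Join(",", result)); // 1,2,5,6
```

This variation is not part of the benchmark below, so its timing relative to the other methods is not measured here.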
What's the fastest way to get unique values in a List?
I've used BenchmarkDotNet to test out the performance of all these methods:
Method | Mean | Error | StdDev |
---|---|---|---|
HashSet | 16.24 us | 0.597 us | 1.645 us |
Distinct | 17.90 us | 0.357 us | 0.978 us |
LinqGroupBy | 66.41 us | 1.321 us | 3.768 us |
ForLoop | 179.73 us | 3.557 us | 7.883 us |
Distinct and HashSet are the fastest ways to get unique values, and they perform almost identically. The for loop is about 10 times slower than either, so I'd definitely avoid that one. Keep in mind that the benchmark input is a list of 1,000 values with no duplicates, so results may differ for other data.
Conclusion: Even though HashSet had slightly better performance than Distinct, Distinct is my number one choice for getting unique values from a list. That's because it clearly communicates the intent.
Full test code:
using System.Collections.Generic;
using System.Linq;
using BenchmarkDotNet.Attributes;

public class UniqueValuesBenchmark
{
    // 1,000 integers, all of them distinct
    private readonly List<int> _data;

    public UniqueValuesBenchmark()
    {
        _data = Enumerable.Range(0, 1000).ToList();
    }

    [Benchmark]
    public List<int> HashSet() => new HashSet<int>(_data).ToList();

    [Benchmark]
    public List<int> Distinct() => _data.Distinct().ToList();

    [Benchmark]
    public List<int> LinqGroupBy() => _data.GroupBy(x => x).Select(x => x.Key).ToList();

    [Benchmark]
    public List<int> ForLoop()
    {
        var result = new List<int>();
        foreach (var value in _data)
        {
            // List<T>.Contains is a linear scan of the results so far
            if (!result.Contains(value))
                result.Add(value);
        }
        return result;
    }
}
Josip Miskovic is a software developer at Americaneagle.com. Josip has 10+ years of experience in developing web applications, mobile apps, and games.