Friday, January 25, 2008 2:53 AM
r2 musings - rants, raves, and research (mostly on .NET topics) from rik robinson
The New Generic Kid on the Block - HashSet<T>
In mathematics, a Set is typically thought of as a collection of distinct objects
that is usually defined by some rule that determines whether they are a member of
that particular Set. For example, a Set could be defined to contain "all the
odd numbers under 100" or "every number divisible by 2" or whatever. The main
points here being that the objects in a Set are distinct (i.e. NO duplicate objects
are allowed) and the objects are not ordered or sorted in any way. (If you really
feel the need to geek out on Set Theory, be my guest).
According to the MSDN
Documentation for HashSet, "a set is a collection that contains no duplicate elements,
and whose elements are in no particular order." Sound familiar? In the
.NET Base Class Library, there has never been a true Set class until now with the
release of .NET 3.5 and the brand-spanking-new System.Collections.Generic.HashSet<T>
class.
In the past, when we needed to implement a Set in .NET, we sort of bastardized the
List<T> or other unsuspecting Collection classes into something like this:
int[] intArray = new int[]{ 1, 2, 5, 7, 4, 1, 7, 9, 8};
List<int> intList = new List<int>();
foreach(int theInt in intArray)
{
// checking to make sure that the object is not in our List before adding
// so we are calling Contains() and then Add() on every time through
the loop
if (!intList.Contains(theInt))
intList.Add(theInt);
}
What we needed was a Set that would let us just keep adding objects to it and free
us from having to worry about it being there first. A recent example of when
I really needed a true Set class was while working on the Weather
Widget. The Weather Channel provides
some 40+ images for use in their SDK, yet in the Weather Widget I never need more
than maximum of 11 (and usually less as there is usually duplication). So, I
had the idea to dynamically zip the images that I actually needed for a given Zip
Code and use that file for my image assets in Silverlight via CreateXamlFromDownloader.
Obviously, I don't want these images duplicated in the zip file. So, I came
up with this:
HashSet<string> assets = new HashSet<string>();
foreach (ForecastDay day in forecast.Days)
{
// no Exception here if duplicate...it just doesn't add it.
assets.Add(Server.MapPath(String.Format("~/images/{0}", day.IconDay)));
}
Utility.WriteZipFile(assets.ToArray<string>())
This just scratches the surface of what HashSet<T> is capable of. There
are member methods available for most Set operations, such as IntersectWith(), UnionWith(), IsSubsetOf(), IsSupersetOf(),
RemoveWhere(), etc. Here's a link to all of the HashSet<T>
Members.
Recall that being a member of a set depends on some rule that determines
this membership. With the HashSet<T>, you can define your own rule of
what it means to be in a Set. For a good example of defining your own EqualityComparer,
see Introducing
HashSet<T> from Kim Hamilton on the BCL
Team Blog.
So, welcome the New Kid on the Block to the Generic Collections.
One last random link I had on the subject for those interested in LINQ:
Good
side-by-side comparison of the HashSet and LINQ Set Operations