Friday, August 27, 2004 3:53 PM
pgolde
Set operations
There's a design issue on OrderedBag and OrderedSet that I've been wrestling with in my mind for a while, and I'm not entirely sure what the right approach is. Perhaps by writing it out and getting feedback the best course of action will become a bit more clear.
The OrderedSet and OrderedBag collections provide a collection of methods that I collective call the “set operations” -- methods that correspond to the basic operations of classic set theory: subset, superset, union, intersection, and difference.
To take one of the simplest examples, there is a method on OrderedSet:
bool SupersetOf(OrderedSet<ItemType> otherSet)
that determines if “this” is a superset of otherSet.
So far, so good, this seems very straightforward. But there is a subtlety: even with the same ItemType, different sets can use different comparisons. For example, an OrderedSet<string> can be instantiated by passing StringComparer.CurrentCultureIgnoreCase, which causes that set to consider two strings equal even if they have different cases (I'll call this a case-insensitive set).
What happens when you apply a set operation to two sets, one of which is case-sensitive, and the other of which is case-insensitive? Suddenly, the apparent clear-cut semantics of the set operations no longer have an obvious answer.
Suppose set1 is a case-sensitive set with elements {”foo”, “bar”} and set2 is a case-insensitive set with element {”FOO”}. Is set1 a superset of set2? From set1's case-sensitive perspective, the answer is no. From set2's case-insensitive perspective, the answer is yes. What should we do?
There seems to be a few options:
1) Before performing any set operation on two sets, check to make sure that they use the same comparison operation, and throw an exception if not. This is what is currently implemented in the code.
This is the safest and simplest solution, but I fear that it is too restrictive. The problem lies in whether the code can accurately determine if the two sets are in fact using the comparison operation. Currently, I've coded this to test the two IComparer<ItemType> instances using object.Equals.
Unfortunately, there are common cases where two comparers are the same, but don't compare equal. For example, suppose you create two sets like this:
OrderedSet<string> set1 = new OrderedSet<string>(StringComparer.CurrentCultureIgnoreCase);
OrderedSet<string> set2 = new OrderedSet<string>(StringComparer.CurrentCultureIgnoreCase);
Unfortunately, there isn't any way to detect that these two collections are using the same comparer: every call to StringComparer.CurrentCultureIgnoreCase creates a new comparer object, and they don't compare equal [see MSDN suggestion FDBK14108.]
2) Another option is to always use the comparison method of the first set in the operation (”this”), and for the code not to make any assumptions about whether the comparison mechanisms are the same between the two sets. The problem is that this can lead to significantly less efficient algorithms than are possible if the two sets have the comparison method. This seems rather unfortunate.
3) A third option is for the code to assume that both sets use the same comparison mechanism, but to not attempt to check for it and enforce it. This neatly sidesteps the problem when the same comparison is actually being used, but we can't determine that from the code. However, it can also lead to nasty, confusing bugs if two different comparisons are in fact being used.
Interestingly, this appears to be the solution that STL uses.
What do you think?