Wednesday, July 07, 2004 3:43 PM
pgolde
Tree-based collections
So what collections should we be putting into Power Collections?
Looking at the feedback on Brad's original blog post, and by looking at other collections libraries in other languages, there are some obvious holes in System.Collections.Generics that should be filled.
Most glaring, I think, is the lack of any tree-based collections. Tree based collections differ from hash-based collections in the way that they compare items. Hash-based items use a "hash code" to efficiently compare items -- each item is converted into a integer which is then used to index items in the collections. If the hash codes are well distributed, then items can be quickly found in the collection by using the hash code.
Tree-based collections eschew hash codes, and instead use a comparison or ordering among items, so that any two items can be compared to see which is greater (or if they are equal). Given this comparison, a tree of elements in the collect is formed, which enables items to be found with a small number of comparisons (about the logarithm on N, where N is the number of items in the collection). The items in a tree-based collection can be efficiently traversed in a sorted order.
Special algorithms known as balanced binary trees ensure that trees are always created efficiently.
Typical tree-based collections would be:
a tree-based dictionary, that maps keys to values (tentative name: OrderedDictionary
a tree-based set that doesn't allow duplicates, that stored elements in a sorted order (tentative name: OrderedSet)
a tree-based set which allows duplicate elements (tentative name: OrderedBag)
These will be the first classes that I will be working on, and I hope to have some skeleton versions posted soon so that people can comment on and suggest improvements to the design. As I encounter particular design or implementation issues, I'll note them here so that people can comment.