4 stars based on
An important special kind of binary tree is the binary search tree BST. In a BST, each node stores some information including a unique key value and perhaps some associated data.
A binary tree is a BST iff, for every node nin the tree:. In these notes, we will assume that duplicates are not allowed.
Note that more than one BST can be used to store the same set of key values. For example, both of the following are BSTs that store the same set of integer keys:. The reason binary-search trees are important is that the following operations can be implemented efficiently using a BST:.
Which of the following binary trees are BSTs? If a tree is not a BST, say why. Using which kind of traversal pre-order, post-order, in-order, or level-order visits the nodes of a BST in sorted order?
To implement a binary search tree, we will use two classes: The following class definitions assume that the BST will store only key values, no associated data.
The type parameter K is the type of the key. To implement a BST that stores some data with each key, we would use the following class definitions changes are in red:.
From now on, we will assume that BSTs only store key values, not associated data. We will also assume that null is not a valid key value i. In general, to determine whether a given value is in the BST, we will start at the root of the tree and determine whether the value we are looking for:. If neither base case holds, a recursive lookup is done on the appropriate subtree. Since all values less than the root's value are in the left subtree and all values greater than the root's value are in the right subtree, there is no point in looking in both subtrees: The code for the lookup method uses an auxiliary, recursive method with the same name i.
How much time does it take to search for a value in a BST? Note that lookup always follows a path from the root down towards a leaf. In the worst case, it goes all the way to a leaf. Therefore, the worst-case time is proportional to the length of the longest path from the root to a leaf the height of the tree. In general, we'd like to know how much time is required for lookup as a function of the number of values stored in the tree.
In other words, what is the relationship between the number of nodes in a BST and the height of the tree? This depends on the "shape" of the tree. In the worst case, all nodes have just one child, and the tree is essentially a linked list. Searching for values in the range and will require following the path from the root down to the leaf the node containing the value 20i.
In the best case, all nodes have 2 children and all leaves are at the same depth, for example:. In general, a tree like this a full tree will have height approximately log 2 Nwhere N is the number of nodes in the tree. The value log 2 N is roughly the number of times you can divide N by two before you get to zero. The reason we use log 2.
However, when we use big-O notation, we just say that the height of a full tree with N nodes is O log N -- we drop the "2" subscript, because log 2 N is proportional to log k N for any constant k, i. In the worst case a "linear" tree this is O Nwhere N is the number of nodes in the tree. In the best case a "full" tree this is O log N.
Where should a new item go in a BST? The answer is easy: If you don't put it there then you won't find it later. Here are pictures illustrating what happens when we insert the value 15 into the example tree used above. It is easy to see that the complexity for insert is the same as for lookup: As mentioned above, the order in which values are inserted determines what BST is built inserting the same values in different orders can result in different final BSTs.
Draw the BST that results from inserting the values 1 to 7 in each of the following orders reading from left to right:. As you would expect, deleting an item involves a search to locate the node that contains the value to be deleted.
Here is an outline of the code for the delete method. If the search for the node containing the value to be deleted succeeds, there are three cases to deal with:. When the node to delete is a leaf, we want to remove it from the BST by setting the appropriate child pointer of its parent to null or by setting root to null if the node to be deleted is the root and it has no children.
Note that the call to delete was one of the following:. So in all three cases, the right thing happens if the delete method just returns null. When the node to delete has one child, we can simply replace that node with its child by returning a pointer to that child. As an example, let's delete 16 from the BST just formed:.
Here's the code for deletehandling the two cases we've discussed so far the new code is shown in red:. The hard case is when the node to delete has two children. We'll call the node to delete n. We can't replace node n with one of its children, because what would we do with the other child? Instead, we will replace the key in node n with the key value v from another node lower down in the tree, then recursively delete value v.
The question is what value can we use to replace n 's key? We have to choose that value so that the tree is still a BST, i. There are two possibilities that work: We'll arbitrarily decide to use the smallest value in the right subtree. To find that value, we just follow a path in the right subtree, always going to the left child, since smaller values are in left subtrees.
Once the value is found, we copy it into node nthen we recursively delete that value from n 's right subtree. Here's the final version of the delete method:. Below is a slightly different example BST; let's see what happens when we delete 13 from that tree. Write the auxiliary method smallest used by the delete method given above. The header for smallest is:. If the node to be deleted has zero or one child, then the delete method will "follow a path" from the root to that node.
So the worst-case time is proportional to the height of the tree just like for lookup and insert. So in the worst case, a path from the root to a leaf is followed twice. Since we don't care about constant factors, the time is still proportional to the height of the tree. The Java standard library has built into it an industrial-strength version of binary search trees, so if you are programming in Java and need this functionality, you would be better off using the library version rather than writing your own.
The class that most closely matches the outline above, in which the nodes contain only keys and no other data, is called TreeSet. Class TreeSet is an implementation of the Set interface. There is another implementation, called HashSetthat we will study later in this course. Here's an example of how you might use a Set to implement a simple spell-checker. This example used a set of String. You could also have a set of Integera set of Floator a set of any other type of object, so long as the type implements Comparable.
If you want to associate data with each key, use interface Map and the corresponding class TreeMap. For example, if you want to quickly look up an Employee given his employee number, you should use a Map rather than a Set to keep track of employees.
As another example, here is a complete program that counts the number of occurrences of each word in a document.
Without it, the program would look for "words" separated by spaces, considering "details" and "details. See the documentation for Scanner and Pattern for more details. The value type V can be any class or interface. The key type K can be any class or interface that implements Comparablefor example. The method put key, value returns the value previously associated with key if any or null if key is a new key. The method get key returns the value associated with key or null if there is no such value.
Both keys and values can be null. If your program stores null values in the map, you should use containsKey key to check whether a particular key is present. Map has many other useful methods. Of particular note are sizeremove keyclearand keySet.
The keySet method returns a Set containing all the keys currently in the map. The CountWords example uses it to list the words in the document. A binary search tree can be used to store any objects that implement the Comparable interface i. A BST can also be used to store Comparable objects plus some associated data. The advantage of using a binary search tree instead of, say, a linked list is that, if the tree is reasonably balanced shaped more like a "full" tree than like a "linear" treethe insertlookupand delete operations can all be implemented to run in O log N time, where N is the number of stored items.
For a linked list, although insert can be implemented to run in O 1 time, lookup and delete take O N time in the worst case. Logarithmic time is generally much faster than linear time. Of course, it is important to remember that for a "linear" tree one in which every node has one childthe worst-case times for insertlookupand delete will be O N.