SKEDSOFT

Design Analysis Of Algorithm

Figure 7.4: A recursion tree for QUICKSORT in which PARTITION always produces a 9-to-1 split, yielding a running time of O(n lg n). Nodes show subproblem sizes, with per-level costs on the right. The per-level costs include the constant c implicit in the Θ(n) term.

Intuition for the average case

To develop a clear notion of the average case for quicksort, we must make an assumption about how frequently we expect to encounter the various inputs. The behavior of quicksort is determined by the relative ordering of the values in the array elements given as the input, and not by the particular values in the array.

When we run quicksort on a random input array, it is unlikely that the partitioning always happens in the same way at every level, as our informal analysis has assumed. We expect that some of the splits will be reasonably well balanced and that some will be fairly unbalanced.

In the average case, PARTITION produces a mix of "good" and "bad" splits. In a recursion tree for an average-case execution of PARTITION, the good and bad splits are distributed randomly throughout the tree. Suppose for the sake of intuition, however, that the good and bad splits alternate levels in the tree, and that the good splits are best-case splits and the bad splits are worst-case splits. Figure 7.5(a) shows the splits at two consecutive levels in the recursion tree. At the root of the tree, the cost is n for partitioning, and the subarrays produced have sizes n - 1 and 0: the worst case. At the next level, the subarray of size n - 1 is best-case partitioned into subarrays of size (n - 1)/2 - 1 and (n - 1)/2. Let's assume that the boundary-condition cost is 1 for the subarray of size 0.

Figure 7.5: (a) Two levels of a recursion tree for quicksort. The partitioning at the root costs n and produces a "bad" split: two subarrays of sizes 0 and n - 1. The partitioning of the subarray of size n - 1 costs n - 1 and produces a "good" split: subarrays of size (n - 1)/2 - 1 and (n - 1)/2. (b) A single level of a recursion tree that is very well balanced. In both parts, the partitioning cost for the subproblems shown with elliptical shading is Θ(n). Yet the subproblems remaining to be solved in (a), shown with square shading, are no larger than the corresponding subproblems remaining to be solved in (b).

The combination of the bad split followed by the good split produces three subarrays of sizes 0, (n - 1)/2 - 1, and (n - 1)/2 at a combined partitioning cost of Θ(n) Θ(n - 1) = Θ(n). Certainly, this situation is no worse than that in Figure 7.5(b), namely a single level of partitioning that produces two subarrays of size (n - 1)/2, at a cost of Θ(n). Yet this latter situation is balanced! Intuitively, the Θ(n - 1) cost of the bad split can be absorbed into the Θ(n) cost of the good split, and the resulting split is good. Thus, the running time of quicksort, when levels alternate between good and bad splits, is like the running time for good splits alone: still O(n lg n), but with a slightly larger constant hidden by the O-notation.