SKEDSOFT

Design Analysis Of Algorithm

Bucket sort

Bucket sort runs in linear time when the input is drawn from a uniform distribution. Like counting sort, bucket sort is fast because it assumes something about the input. Whereas counting sort assumes that the input consists of integers in a small range, bucket sort assumes that the input is generated by a random process that distributes elements uniformly over the interval [0, 1).

The idea of bucket sort is to divide the interval [0, 1) into n equal-sized subintervals, or buckets, and then distribute the n input numbers into the buckets. Since the inputs are uniformly distributed over [0, 1), we don't expect many numbers to fall into each bucket. To produce the output, we simply sort the numbers in each bucket and then go through the buckets in order, listing the elements in each.

Our code for bucket sort assumes that the input is an n-element array A and that each element A[i] in the array satisfies 0 A[i] < 1. The code requires an auxiliary array B[0 n - 1] of linked lists (buckets) and assumes that there is a mechanism for maintaining such lists. (Section 10.2 describes how to implement basic operations on linked lists.)

	BUCKET-SORT(A)
1  n  length[A]
2  for i  1 to n
3     do insert A[i] into list B[n A[i]]
4  for i  0 to n - 1
5     do sort list B[i] with insertion sort
6  concatenate the lists B[0], B[1], . . ., B[n - 1] together in order

Figure 8.4 shows the operation of bucket sort on an input array of 10 numbers.

Figure 8.4: The operation of BUCKET-SORT. (a) The input array A[1 10]. (b) The array B[0 9] of sorted lists (buckets) after line 5 of the algorithm. Bucket i holds values in the half-open interval [i/10, (i 1)/10). The sorted output consists of a concatenation in order of the lists B[0], B[1], . . ., B[9].

To see that this algorithm works, consider two elements A[i] and A[j]. Assume without loss of generality that A[i] A[j]. Since n A[i] n A[j], element A[i] is placed either into the same bucket as A[j] or into a bucket with a lower index. If A[i] and A[j] are placed into the same bucket, then the for loop of lines 4-5 puts them into the proper order. If A[i] and A[j] are placed into different buckets, then line 6 puts them into the proper order. Therefore, bucket sort works correctly.

To analyze the running time, observe that all lines except line 5 take O(n) time in the worst case. It remains to balance the total time taken by the n calls to insertion sort in line 5.

To analyze the cost of the calls to insertion sort, let ni be the random variable denoting the number of elements placed in bucket B[i]. Since insertion sort runs in quadratic time, the running time of bucket sort is

Taking expectations of both sides and using linearity of expectation, we have

We claim that