Bucket-Sort and Radix-Sort
Definition
- Let be S be a sequence of n (key, element) items with keys in the range [0, N − 1]
- Bucket-sort uses the keys as indices into an auxiliary array B of sequences (buckets)
- Phase 1: Empty sequence S by moving each item (k, o) into its bucket B[k]
- Phase 2: For i = 0, …, N − 1, move the items of bucket B[i] to the end of sequence S
- Analysis:
- Phase 1 takes O(n) time
- Phase 2 takes O(n + N) time
- Bucket-sort takes O(n + N) time
Algorithm bucketSort(S, N)
Input sequence S of (key, element) items with keys in the range [0, N − 1]
Output sequence S sorted by increasing keys
B ← array of N empty sequences
while ¬S.isEmpty()
f ← S.first()
(k, o) ← S.remove(f)
B[k].insertLast((k, o))
for i ← 0 to N − 1
while ¬B[i].isEmpty()
f ← B[i].first()
(k, o) ← B[i].remove(f)
S.insertLast((k, o))
Example with Key range [0, 9]
Properties and Extensions
- Key-type Property
- The keys are used as indices into an array and cannot be arbitrary objects
- No external comparator
- Stable Sort Property
- The relative order of any two items with the same key is preserved after the execution of the algorithm
- Extensions
- Integer keys in the range [a, b]
- Put item (k, o) into bucket B[k − a]
- String keys from a set D of possible strings, where D has constant size (e.g., names of the 50 U.S. states)
- Sort D and compute the rank r(k) of each string k of D in the sorted sequence
- Put item (k, o) into bucket B[r(k)]
- Integer keys in the range [a, b]
Lexicographic Order
- A d-tuple is a sequence of d keys (k1, k2, …, kd), where key ki is said to be the i-th dimension of the tuple
- Example:
- The Cartesian coordinates of a point in space are a 3-tuple
- The lexicographic order of two d-tuples is recursively defined as follows
- (x1, x2, …, xd) < (y1, y2, …, yd) ⇔ x1 < y1 ∨ x1 = y1 ∧ (x2, …, xd) < (y2, …, yd)
I.e., the tuples are compared by the first dimension, then by the second dimension, etc.
- (x1, x2, …, xd) < (y1, y2, …, yd) ⇔ x1 < y1 ∨ x1 = y1 ∧ (x2, …, xd) < (y2, …, yd)
Lexicographic-Sort
- Let Ci be the comparator that compares two tuples by their i-th dimension
- Let stableSort(S, C) be a stable sorting algorithm that uses comparator C
- Lexicographic-sort sorts a sequence of d-tuples in lexicographic order by executing d times algorithm stableSort, one per dimension
- Lexicographic-sort runs in O(dT(n)) time, where T(n) is the running time of stableSort
Algorithm lexicographicSort(S)
Input sequence S of d-tuples
Output sequence S sorted in lexicographic order
for i ← d downto 1
stableSort(S, Ci)
Example:
(7,4,6) (5,1,5) (2,4,6) (2, 1, 4) (3, 2, 4)
(2, 1, 4) (3, 2, 4) (5,1,5) (7,4,6) (2,4,6)
(2, 1, 4) (5,1,5) (3, 2, 4) (7,4,6) (2,4,6)
(2, 1, 4) (2,4,6) (3, 2, 4) (5,1,5) (7,4,6)
Radix-Sort
- Radix-sort is a specialization of lexicographic-sort that uses bucket-sort as the stable sorting algorithm in each dimension
- Radix-sort is applicable to tuples where the keys in each dimension i are integers in the range [0, N − 1]
- Radix-sort runs in time O(d( n + N))
Algorithm radixSort(S, N)
Input sequence S of d-tuples such that (0, …, 0) ≤ (x1, …, xd) and (x1, …, xd) ≤ (N − 1, …, N − 1) for each tuple (x1, …, xd) in S
Output sequence S sorted in lexicographic order
for i ← d downto 1
bucketSort(S, N)
Radix-Sort for Binary Numbers
- Consider a sequence of n b-bit integers x = xb − 1 …x1x0
- We represent each element as a b-tuple of integers in the range [0, 1] and apply radix-sort with N = 2
- This application of the radix-sort algorithm runs in O(bn) time
- For example, we can sort a sequence of 32-bit integers in linear time
Algorithm binaryRadixSort(S)
Input sequence S of b-bit integers
Output sequence S sorted replace each element x of S with the item (0, x)
for i ← 0 to b − 1
replace the key k of
each item (k, x) of S
with bit xi of x
Example
Code
package examples;
import java.util.*;
public class IntRadixSorter {
// bucket array
// Lenghth of one portion in bits:
int len = 11;
// number of portions:
int anz = 32 % len == 0 ? 32/len : (32/len)+1;
// number of buckets:
int s = (int)Math.pow(2,len);
int [][] buckets = new int[s][];
int [] cnts = new int[s];
public void bsort(int [] a){
System.out.println("sweeps:"+anz+", buckets: "+s+", KeyLength: "+len);
// initialize the buckets
for (int i=0; i<s;i++) {
buckets[i] = new int[1000];
cnts[i] = 0;
}
sort(a);
}
private void expand(int i){
buckets[i] = Arrays.copyOf(buckets[i],buckets[i].length*2);
}
private void sort(int [] a){
for (int d=0;d<anz;d++){ // loop over all portions of the int-number
// fill the buckets
for (int i=0;i<a.length;i++){
int k=a[i];
k = (k >>>len*d) % s; // k is the portion nr. d of a[i]
// put a[i] in the corresponding buckets[k]
if (cnts[k]>=buckets[k].length) expand(k);
buckets[k][cnts[k]++]=a[i];
}
int k=0;
// now empty all buckets (in the right order!)
for (int i=0;i<s;i++){
for (int j=0;j<cnts[i];j++){
a[k++] = buckets[i][j];
}
cnts[i]=0;
}
}
}
public boolean check(int [] a){
// returns true if a is sorted
for (int i=0;i<a.length-1;i++){
if (a[i]>a[i+1]) return false;
}
return true;
}
/**
* @param args
*/
public static void main(String[] args) {
// System.out.println(257 & 255);
Random rand = new Random();
int n = 50000000;
int [] a = new int[n];
for (int i=0;i<n;i++) a[i]=rand.nextInt(n);
IntRadixSorter ibs = new IntRadixSorter();
System.out.println(n+" elements");
long before = System.nanoTime();
ibs.bsort(a);
long after = System.nanoTime();
System.out.println("sorted? "+ibs.check(a)+", time: "+(after-before)/1e6+" msec");
}
}