Bucket-Sort and Radix-Sort

Definition

  • Let be S be a sequence of n (key, element) items with keys in the range [0, N − 1]
  • Bucket-sort uses the keys as indices into an auxiliary array B of sequences (buckets)
    • Phase 1: Empty sequence S by moving each item (k, o) into its bucket B[k]
    • Phase 2: For i = 0, …, N − 1, move the items of bucket B[i] to the end of sequence S
  • Analysis:
    • Phase 1 takes O(n) time
    • Phase 2 takes O(n + N) time
    • Bucket-sort takes O(n + N) time
Algorithm bucketSort(S, N)
  Input sequence S of (key, element) items with keys in the range [0, N − 1]
  Output sequence S sorted by increasing keys
  B ← array of N empty sequences
  while ¬S.isEmpty()
    f ← S.first()
    (k, o) ← S.remove(f)
    B[k].insertLast((k, o))
  for i ← 0 to N − 1
    while ¬B[i].isEmpty()
      f ← B[i].first()
      (k, o) ← B[i].remove(f)
      S.insertLast((k, o))

Example with Key range [0, 9]

Bucket_Sort_Example

Properties and Extensions

  • Key-type Property
    • The keys are used as indices into an array and cannot be arbitrary objects
    • No external comparator
  • Stable Sort Property
    • The relative order of any two items with the same key is preserved after the execution of the algorithm
  • Extensions
    • Integer keys in the range [a, b]
      • Put item (k, o) into bucket B[k − a]
    • String keys from a set D of possible strings, where D has constant size (e.g., names of the 50 U.S. states)
      • Sort D and compute the rank r(k) of each string k of D in the sorted sequence
      • Put item (k, o) into bucket B[r(k)]

Lexicographic Order

  • A d-tuple is a sequence of d keys (k1, k2, …, kd), where key ki is said to be the i-th dimension of the tuple
  • Example:
    • The Cartesian coordinates of a point in space are a 3-tuple
  • The lexicographic order of two d-tuples is recursively defined as follows
    • (x1, x2, …, xd) < (y1, y2, …, yd) ⇔ x1 < y1 ∨ x1 = y1 ∧ (x2, …, xd) < (y2, …, yd)
      I.e., the tuples are compared by the first dimension, then by the second dimension, etc.

Lexicographic-Sort

  • Let Ci be the comparator that compares two tuples by their i-th dimension
  • Let stableSort(S, C) be a stable sorting algorithm that uses comparator C
  • Lexicographic-sort sorts a sequence of d-tuples in lexicographic order by executing d times algorithm stableSort, one per dimension
  • Lexicographic-sort runs in O(dT(n)) time, where T(n) is the running time of stableSort
Algorithm lexicographicSort(S)
  Input sequence S of d-tuples
  Output sequence S sorted in lexicographic order
  for i ← d downto 1
    stableSort(S, Ci)

Example:

(7,4,6) (5,1,5) (2,4,6) (2, 1, 4) (3, 2, 4)
(2, 1, 4) (3, 2, 4) (5,1,5) (7,4,6) (2,4,6)
(2, 1, 4) (5,1,5) (3, 2, 4) (7,4,6) (2,4,6)
(2, 1, 4) (2,4,6) (3, 2, 4) (5,1,5) (7,4,6)

Radix-Sort

  • Radix-sort is a specialization of lexicographic-sort that uses bucket-sort as the stable sorting algorithm in each dimension
  • Radix-sort is applicable to tuples where the keys in each dimension i are integers in the range [0, N − 1]
  • Radix-sort runs in time O(d( n + N))
Algorithm radixSort(S, N)
  Input sequence S of d-tuples such that (0, …, 0)(x1, …, xd) and (x1, …, xd)(N − 1, …, N − 1) for each tuple (x1, …, xd) in S
  Output sequence S sorted in lexicographic order
  for i ← d downto 1
    bucketSort(S, N)

Radix-Sort for Binary Numbers

  • Consider a sequence of n b-bit integers x = xb − 1 …x1x0
  • We represent each element as a b-tuple of integers in the range [0, 1] and apply radix-sort with N = 2
  • This application of the radix-sort algorithm runs in O(bn) time
  • For example, we can sort a sequence of 32-bit integers in linear time
Algorithm binaryRadixSort(S)
  Input sequence S of b-bit integers
  Output sequence S sorted replace each element x of S with the item (0, x)
  for i ← 0 to b − 1
    replace the key k of
      each item (k, x) of S
      with bit xi of x

Example

Radix_Sort_Example

Code

package examples;


import java.util.*;

public class IntRadixSorter {
    // bucket array

    // Lenghth of one portion in bits:
    int len = 11;
    // number of portions:
    int anz = 32 % len == 0 ? 32/len : (32/len)+1;

    // number of buckets:
    int s = (int)Math.pow(2,len);


    int [][] buckets = new int[s][];
    int [] cnts = new int[s];



    public void bsort(int [] a){
        System.out.println("sweeps:"+anz+", buckets: "+s+", KeyLength: "+len);
        // initialize the buckets
        for (int i=0; i<s;i++) {
            buckets[i] = new int[1000];
            cnts[i] = 0;
        }
        sort(a);
    }

    private void expand(int i){
        buckets[i] = Arrays.copyOf(buckets[i],buckets[i].length*2);
    }

    private void sort(int [] a){
        for (int d=0;d<anz;d++){ // loop over all portions of the int-number
            // fill the buckets
            for (int i=0;i<a.length;i++){
                int k=a[i];
                k = (k >>>len*d) % s; // k is the portion nr. d of a[i]
                // put a[i] in the corresponding buckets[k]
                if (cnts[k]>=buckets[k].length) expand(k);
                buckets[k][cnts[k]++]=a[i];
            }
            int k=0;  

            // now empty all buckets (in the right order!)
            for (int i=0;i<s;i++){
                for (int j=0;j<cnts[i];j++){
                    a[k++] = buckets[i][j];
                }
                cnts[i]=0;
            }
        }
    }

    public boolean check(int [] a){
        // returns true if a is sorted
        for (int i=0;i<a.length-1;i++){
            if (a[i]>a[i+1]) return false;
        }
        return true;
    }

    /**
     * @param args
     */
    public static void main(String[] args) {
        //        System.out.println(257 & 255);
        Random rand = new Random();
        int n = 50000000;
        int [] a = new int[n];
        for (int i=0;i<n;i++) a[i]=rand.nextInt(n);
        IntRadixSorter ibs = new IntRadixSorter();
        System.out.println(n+" elements");
        long before = System.nanoTime();
        ibs.bsort(a);
        long after = System.nanoTime();
        System.out.println("sorted? "+ibs.check(a)+", time: "+(after-before)/1e6+" msec");

    }
}