Selection

The Selection Problem

  • Given an integer k and n elements x1, x2, …, xn, taken from a total order, find the k-th smallest element in this set.
  • Of course, we can sort the set in O(n log n) time and then index the k-th element. Selection_Problem
  • Can we solve the selection problem faster?

Quick-Select

  • Quick-select is a randomized selection algorithm based on the prune-and-search paradigm:
    • Prune: pick a random element x (called pivot) and partition S into
      • L elements less than x
      • E elements equal x
      • G elements greater than x
    • Search: depending on k, either answer is in E, or we need to recurse in either L or G

Quick Select

Partition

  • We partition an input sequence as in the quick-sort algorithm:
    • We remove, in turn, each element y from S and
    • We insert y into L, E or G, depending on the result of the comparison with the pivot x
  • Each insertion and removal is at the beginning or at the end of a sequence, and hence takes O(1) time
  • Thus, the partition step of quick-select takes O(n) time
Algorithm partition(S, p)
  Input sequence S, position p of pivot
  Output subsequences L, E, G of the elements of S less than, equal to, or greater than the pivot, resp.
  L, E, G ← empty sequences
  x ← S.remove(p)
  while ¬S.isEmpty()
    y ← S.remove(S.first())
    if y < x
      L.insertLast(y)
    else if y = x
      E.insertLast(y)
    else { y > x }
      G.insertLast(y)
  return L, E, G

Quick-Select Visualization

  • An execution of quick-select can be visualized by a recursion path

    • Each node represents a recursive call of quick-select, and stores k and the remaining sequence

    Quick_Select_Visualization

Expected Running Time (Same as Quick-Sort)

  • Consider a recursive call of quick-sort on a sequence of size s
    • Good call: the sizes of L and G are each less than 3s/4
    • Bad call: one of L and G has size greater than 3s/4

Quick_Sort_Wort_Good_Bad_Case

  • A call is good with probability 1/2
    • 1/2 of the possible pivots cause good calls:

Quick_Sort_Wort_Good_Bad_Cases

  • Probabilistic Fact #1: The expected number of coin tosses required in order to get one head is two
  • Probabilistic Fact #2: Expectation is a linear function:
    • E(X + Y ) = E(X ) + E(Y )
    • E(cX ) = cE(X )
  • Let T(n) denote the expected running time of quick-select.
  • By Fact #2,
    • T(n) < T(3n/4) + bn*(expected # of calls before a good call)
  • By Fact #1,
    • T(n) < T(3n/4) + 2bn
  • That is, T(n) is a geometric series:
    • T(n) < 2bn + 2b(3/4)n + 2b(3/4)2n + 2b(3/4)3n + …
  • So T(n) is O(n).
  • We can solve the selection problem in O(n) expected time.

Deterministic Selection

  • We can do selection in O(n) worst-case time.
  • Main idea: recursively use the selection algorithm itself to find a good pivot for quick-select:
    • Divide S into n/5 sets of 5 each
    • Find a median in each set
    • Recursively find the median of the “baby” medians

Selection_Deterministic

  • See Exercise C-4.24 for details of analysis.

Code

package examples;


import java.lang.management.ManagementFactory;
import java.lang.management.ThreadMXBean;
import java.util.ArrayList;
import java.util.Random;


/**
 * @author ps
 * Various sort programs for int arrays
 */
public class SortTest {


    public static long cnt;
    static Random rand = new Random();
    static int [] b;

    /**
     * @param a int aray
     * @return 'true' if 'a' is sorted
     */
    public static boolean sortCheck(int[] a) {
        for (int i=0;i<a.length-1;i++){
            if (a[i]>a[i+1]) return false;
        }
        return true;
    }

    /**
     * On return the array 'a' will be chanched such that
     * the  condition helds:
     * a[0..rank-1] <= a[rank] <= a[rank+1..a.length-1]
     * @param a
     * @param rank
     */
    public static void quickSelect(int [] a , int rank){
        // on return the following condition helds:
        // a[0..rank-1] <= a[rank] <= a[rank+1..a.length-1]
        qSelect(a,0, a.length-1,rank);
    }


    private static void qSelect(int[] a, int from, int to, int rank) {
        // on return the following condition helds:
        // a[from..rank-1] <=  a[rank] <= a[rank+1..to]
        int piv = partition(a,from, to);
        if (piv==rank) return;
        if (rank > piv) qSelect(a,piv+1,to,rank);
        else qSelect(a,from,piv-1,rank);
    }

    /**
     * retuns piv and partitions the
     * range a[from..to] such that all of the elements
     * in the range a[from..piv-1] are <= a[piv] and
     * all elements in the range a[piv+1..to] are  >= a[piv]
     * @param a
     * @param from
     * @param to
     * @return the position 'piv' of the pivot
     */
    private static int partition(int []a, int from, int to){
        if (from==to) return from;
        swap(a,rand.nextInt(to-from)+from,to); // chooses a random pivot
        int pivot = a[to];
        int left = from-1;
        int right = to;
        while (true){
            while(a[++left]< pivot); // a[left] is now a candidate to swap
            while(a[--right] > pivot && right>from);
            if (left >= right) break;
            swap(a,left,right);
        }
        swap(a,left,to);
        return left;      // return the final position of the pivot (to be changed!)
    }

    /**
     * swap the array elements a[i] and a[k]
     * @param a int array
     * @param i position in the array 'a'
     * @param k position in the array 'a'
     */
    static void swap(int [] a, int i, int k){
        int tmp=a[i];
        a[i]=a[k];
        a[k]=tmp;
        cnt++;
    }

    public static void main(String[] args) {
        long t1=0,t2=0,te1=0,te2=0,eTime=0,time=0;
        int n = 50000000;
        // we need a random generator
        Random rand=new Random(Integer.MAX_VALUE);
        //rand.setSeed(54326346); // initialize always in the same state
        ThreadMXBean threadBean = ManagementFactory.getThreadMXBean();
        // new array
        int [] a = new int[n];
        // fill it randomly
        for (int i=0;i<a.length ;i++) {
            a[i]=rand.nextInt(n);
        }
        for (int i=0;i<a.length ;i++) {
            swap(a,i,rand.nextInt(n-1));
        }

        cnt=0;  // for statistcs reasons
        // get Time
        te1=System.nanoTime();
        t1 = threadBean.getCurrentThreadCpuTime();
        quickSelect(a);

        // System.out.println("heap? "+heapCheck(a));
        te2 = System.nanoTime();
        t2 = threadBean.getCurrentThreadCpuTime();
        time=t2-t1;
        eTime=te2-te1;
        System.out.println("# elements: "+n);
        System.out.println("CPU-Time usage: "+time/1000000.0+" ms");
        System.out.println("elapsed time: "+eTime/1e6+" ms");
        System.out.println("sorted? "+sortCheck(a));
        System.out.println("swap operation needed: "+cnt);
    }


}