Magic of Quickselect/Selection algorithm

Recently I came across an interesting interview question. To find the kth largest or smallest element in the given array. Please remember that array is unsorted. The problem becomes cakewalk if array is sorted.

In its simplest form, if an array a of size n is already sorted, you can get kth largest or smallest element with a[n - k] or a[k - 1] respectively.

To begin with our original problem with an unsorted array, we can also sort the array and then simply fetch the element from sorted array in constant time. However, sorting for an array of size n is going to take n*log(n) in the best case, where n is the size of an input array.

Can we do better? Certainly, we can. Magic here is to use Quickselect or selection algorithm. This algorithm is similar to QuickSort, but rather than taking middle element as pivot, we make element from one end as pivot and arrange other array elements around it.

In its base case, we make 0th element as a pivot and then start from two indices. One index i starts from index 1 and other index j from n - 1. Elements beginning from 1 should be less than or equal to and elements from other end should be greater than pivot. If this condition is not met, we simply swap the elements just like regular QuickSort algorithm.

At the end we swap the position of initial pivot and put it in the middle of two groups. Once pivot is placed, all the elements on its left are less than or equal to and elements on its right are greater than it.

If the index of pivot elements is same as the desired index we are looking for, we've got our kth largest or smallest element. For example, say you want to find kth largest element and pivot sits 4 positions away from an end, that means pivot index is expected one and you will return an element at that index. Same logic can be applied for kth smallest element. In that case, if pivot sits k positions away from beginning, we've got our kth smallest element.

Thing to remember is that when you find pivot, even though elements on left or right might be less than or greater than pivot they may not be in order. Quickselect is an intermediate stage of QuickSort algorithm. This can also give you k largest or smallest elements.

Let's look at the source code,

class KthMaxMinFinder {
    func findMinMax(a: inout [Int], k: Int, low: Int, high: Int, findingMax: Bool) -> Int {
        // Check for valid input values.
        if a.count < 1 || k < 1 || k > a.count || low > high {
            return -1
        }

        // Size of an input array
        let size = a.count

        // final desired index we're looking for.
        let finalIndex = findingMax = = true ? size - k : k - 1

        // Alternatively, you can modify the above line as follows to get kth min element instead
        // let finalIndex = k - 1

        // Make the partition and find the index of result pivot element.
        let part = partition(a: &a, low: low, high: high)

        // If partition index is same as our expected final index, we've found kth max/min element.
        if part = = finalIndex {
            return a[part]
        }

        // If parition falls on the left side of desired index, we will start looking from successive partition index to the end
        if part < finalIndex {
            return findMinMax(a: &a, k: k, low: part + 1, high: high, findingMax: findingMax)
        } else {
            // If parition falls on the right side of desired index, we will start looking from beginning to the previous partition index
            return findMinMax(a: &a, k: k, low: low, high: part - 1, findingMax: findingMax)
        }
    }

    // Perform the partition for given array. Move elements such that for given pivot all elements on its left are less than or equal to and elements on right are greater than it.
    fileprivate func partition(a: inout [Int], low: Int, high: Int) -> Int {
        // Return the low index if high and low indices zero in on the same position. This is to avoid potential infinite loop.
        if high = = low {
            return low
        }

        // Start with one index next to pivot. Remember, a[low] is the pivot element.
        var i = low + 1

        // Start from the other end and move its way to the left
        var j = high

        while true {
            // Increment left index until encountered elements are less than or equal to pivot.
            while a[i] < a[low] && i < high {
                i = i + 1
            }

            // Increment right index until encountered elements are greater than the pivot.
            while a[j] > a[low] && j > low {
                j = j - 1
            }

            // If indices overlap or cross, break the loop.
            if i >= j {
                break
            }

            // Swap the elements so as to make sure all the elements on left are less than or equal to and elements on right are greater than selected pivot.
            let temp1 = a[i]
            a[i] = a[j]
            a[j] = temp1

            // Increment left index and decrement right index
            i = i + 1
            j = j - 1
        }

        // Put the pivot element at the beginning at jth position. This is the final pivot index. This element is j + 1 th smallest and n - j th largest element
        let temp = a[low]
        a[low] = a[j]
        a[j] = temp
        
        // Return the index of pivot element.
        return j
        
    }
}

The code has been sprinkled with comments for easier understanding. Let's look at the example with step by step going through the execution.


//Input
var arr = [100, 2, 1000, 3, 4, 5, 6, 10, 11]
// Say k is 4 - i.e. we want to find second largest element in an array. We will print the value returned by partition method and contents of array at each pass

Expected index = size - k 
Expected index = 9 - 4 = 5
desiredIndex = 5

We will stop processing when partition method returns 5 as a pivot index value. 

// First pass
pivotIndex = 7
array = [10, 2, 11, 3, 4, 5, 6, 100, 1000]

Since pivotIndex is greater than desiredIndex, we will begin search from position beginning to pivotIndex - 1, i.e. from 0 to 6

// Second pass
pivotIndex = 5
array = [5, 2, 6, 3, 4, 10, 11, 100, 1000]

Since pivotIndex is same as desiredIndex, we will stop here. This means pivotIndex has the desired element which is 4th largest element in the given array.

If you look at the array, there are 3 (k - 1) elements on the right side and 5 (n - k) on left side of pivot element.

If the problem is to return k largest elements in the given array, you may simply return a sub-array from position k to high, i.e. elements of an array from index 5 to 8 which constitute k largest elements in the given array. 

Time complexity


Quickselect algorithm is one of the best approaches in finding kth largest/smallest element in the given array. The reason being, based on the position of pivot element, it continuously divides the array in two partitions.

As you can see in the code example above on lines from 28to 33 we divide array into roughly two equal parts based on where the pivot index falls with respect to expected index. That means on an average for each pass we keep on reducing the size of an input array. This gives us a good idea on how to calculate time complexity of Quickselect algorithm for an array of size n.

T(n) = T(n) + T(n/2) + T(n/4) + ... + 1

This leads to the computation of sum for an infinite series.

Which is equivalent to

T(n) = n + n/2 + n/4 + ... + 1
T(n) = n * (1 / (1- (1/2)))
T(n) = O(2n) = O(kn) = O(n)

Thus it is proved that selection algorithm runs in the linear time for any arbitrary input of size n

Time to write some tests. Let's write some unit tests to verify the behavior of an algorithm.

Tests are going to be simple. We will feed an algorithm a diverse set of input arrays and compare the actual result against expected one.

import XCTest
@testable import KthMinMaxModule

class KthMinMaxModuleTests: XCTestCase {

    var kthValueFinder: KthMaxMinFinder!
    let originalArray: [Int] = [100, 2, 1000, 3, 4, 5, 6, 10, 11]

    override func setUp() {
        super.setUp()
        self.kthValueFinder = KthMaxMinFinder()
    }

    func testFindKthMax() {
        var inputArray = originalArray
        XCTAssertEqual(kthValueFinder.findMinMax(a: &inputArray, k: 4, low: 0, high: 8, findingMax: true), 10)

        inputArray = originalArray
        XCTAssertEqual(kthValueFinder.findMinMax(a: &inputArray, k: 6, low: 0, high: 8, findingMax: true), 5)

        inputArray = originalArray
        XCTAssertEqual(kthValueFinder.findMinMax(a: &inputArray, k: 1, low: 0, high: 8, findingMax: true), 1000)
    }

    func testFindKthMin() {
        var inputArray = originalArray
        XCTAssertEqual(kthValueFinder.findMinMax(a: &inputArray, k: 4, low: 0, high: 8, findingMax: false), 5)

        inputArray = originalArray
        XCTAssertEqual(kthValueFinder.findMinMax(a: &inputArray, k: 6, low: 0, high: 8, findingMax: false), 10)

        inputArray = originalArray
        XCTAssertEqual(kthValueFinder.findMinMax(a: &inputArray, k: 1, low: 0, high: 8, findingMax: false), 2)
    }

    func testInvalidValues() {

        // Empty Array as an input.
        var emptyArray: [Int] = []
        XCTAssertEqual(kthValueFinder.findMinMax(a: &emptyArray, k: 4, low: 0, high: 8, findingMax: false), -1)

        // Invalid K value - Below bound
        var inputArray = originalArray
        XCTAssertEqual(kthValueFinder.findMinMax(a: &inputArray, k: 0, low: 0, high: 8, findingMax: false), -1)

        // Invalid K value - Above bound
        inputArray = originalArray
        XCTAssertEqual(kthValueFinder.findMinMax(a: &inputArray, k: 10, low: 0, high: 8, findingMax: false), -1)
    }
}

And that should be it. Selection algorithm is very powerful algorithm and can be used for variety of purpose. Following are the use cases where it could be used to find desired output:

  • Find Kth maximum element in the array
  • Find Kth minimum element in the array
  • Find K maximum elements in the array
  • Find K minimum elements in the array

I hope this will be useful when you are preparing for competitive exams or general algorithm learning. Feel free to ping me on Twitter at @jayeshkawli if you have further questions about it. I am open to other algorithms related discussions as well. This field always excites me!

References:

Leetcode find kth largest element in an array