Quicksort (Python)

From LiteratePrograms

Jump to: navigation, search
Other implementations: AWK | C | C++ | Eiffel | Erlang | Forth | Haskell | Java | JavaScript | Mathematica | Mercury | Oz | Python | Python, arrays | Scala | Sed | Standard ML | Visual Basic .NET | XProc

We describe several different Python implementations of the quicksort algorithm. All of these implementations operate on lists. For alternative implementations based on Python's array standard library module see Quicksort (Python, arrays).

Contents

Implementation

Using list comprehensions

The most straightforward way to implement quicksort in Python is to use list comprehensions. This approach generates two lists, one of elements greater than or equal to the "pivot" element (in this case the first element of the list), and one of elements less than the pivot. These two lists are then recursively sorted, before being concatenated around the pivot to form the final sorted list.

<<qsort1>>=
def qsort1(list):
    """
    Quicksort using list comprehensions
    >>> qsort1<<docstring test numeric input>>
    <<docstring test numeric output>>
    >>> qsort1<<docstring test string input>>
    <<docstring test string output>>
    """
    if list == []: 
        return []
    else:
        pivot = list[0]
        lesser = qsort1([x for x in list[1:] if x < pivot])
        greater = qsort1([x for x in list[1:] if x >= pivot])
        return lesser + [pivot] + greater

This implementation has the advantage of being clear and easy to understand, yet quite compact. As it turns out, it is also faster than other list-based quicksorts in Python.

In one line of code

The implementation above can be made even more compact by combining Python's predicate if (return x if y else z) with its sequence slicing capability (e.g. list[0], list[1:]) to produce a version of quicksort in just one-line:

<<qsortr>>=
def qsortr(list):
    return [] if list==[]  else qsortr([x for x in list[1:] if x < list[0]]) + [list[0]] + qsortr([x for x in list[1:] if x >= list[0]])

The initial element of the list, list[0] becomes the pivot variable around which the remainder of the list list[1:] is filtered. When the list selected reaches size 0, the value [] (empty list) is returned and the predicate else is not invoked.

While the "one line of code" version may minimize the text of the algorithm it isn't exactly efficient: the list is fully iterated and filtered twice. It would be nice if the iteration and filtering could be done just once. This can be achieved using a partitioning function.

Using a partitioning function

An alternative to using list comprehensions is to partition the list in a single pass using a partitioning function that accumulates list items that are equal to the pivot, as well as those that are lesser and greater than the pivot. This also eliminates the need to process more than once any items that are equal to the pivot. In theory, this should result in a faster sort than the list comprehension version. However in practice it appears that the list comprehension implementation is not only faster than a single-pass partitioning approach, but approaches the performance of an in-place sort (see Quicksort (Python, arrays)).

The new implementation looks essentially the same as the earlier list comprehension implementation, except that the lesser and greater lists (along with equal) are now generated by the partition function.

<<qsort2>>=
partition function
def qsort2(list):
    """
    Quicksort using a partitioning function
    >>> qsort2<<docstring test numeric input>>
    <<docstring test numeric output>>
    >>> qsort2<<docstring test string input>>
    <<docstring test string output>>
    """
    if list == []: 
        return []
    else:
        pivot = list[0]
        lesser, equal, greater = partition(list[1:], [], [pivot], [])
        return qsort2(lesser) + equal + qsort2(greater)

The partitioning function can be thought of as a straightforward recursion. The base case has an empty list in its first argument, which corresponds to the list to be partitioned. When the list being partitioned is non-empty, we compare the head of the list to the pivot, and add the head to the appropriate list (lesser, equal, or greater) based on the outcome of these comparisons.

<<partition function (recursive)>>= 
def partition(list, l, e, g):
    if list == []:
        return (l, e, g)
    else:
        head = list[0]
        if head < e[0]:
            return partition(list[1:], l + [head], e, g)
        elif head > e[0]:
            return partition(list[1:], l, e, g + [head])
        else:
            return partition(list[1:], l, e + [head], g)

Although this is an intuitively appealing way to perform list partitioning, Python's lack of support for recursion elimination causes the recursive implementation to suffer from linear stack usage, which is not acceptable for large lists. An iterative implementation (i.e. a manual tail call optimization) solves this problem:

<<partition function>>= 
def partition(list, l, e, g):
    while list != []:
        head = list.pop(0)
        if head < e[0]:
            l = [head] + l
        elif head > e[0]:
            g = [head] + g
        else:
            e = [head] + e
    return (l, e, g)

Adding a randomized pivot

One drawback of the implementations we've presented so far is that, while they perform well on random data, they require excessive time and spaces resources on data that is already sorted or nearly-sorted. One way to improve the robustness of the quicksort in the face of this kind of data is to randomly select a pivot, instead of just making the pivot the head of the list. Fortunately, Python supports relatively fast random access to individual list elements, which makes such a pivoting scheme feasible.

The qsort1a function is essentially the same as qsort1, except that it implements randomized pivots. We make use of the pop operation to remove our chosen pivot. This has the unfortunate side effect of mutating the original list that was passed to the sort function. To prevent this unwanted mutation, we wrap the actual sorting operations in an internally defined function, qsort, to which we pass a copy of the original list. Based on timing measurements, this layer of indirection does not appear to impose any significant performance penalty.

<<qsort1a>>=
from random import randrange       
def qsort1a(list):
    """
    Quicksort using list comprehensions and randomized pivot
    >>> qsort1a<<docstring test numeric input>>
    <<docstring test numeric output>>
    >>> qsort1a<<docstring test string input>>
    <<docstring test string output>>
    """
    def qsort(list):
        if list == []: 
            return []
        else:
            pivot = list.pop(randrange(len(list)))
            lesser = qsort([l for l in list if l < pivot])
            greater = qsort([l for l in list if l >= pivot])
            return lesser + [pivot] + greater
    return qsort(list[:])

Testing

We provide a simple self-test for the quicksort implementations using the doctest module from the python standard library. This module analyses docstrings from functions defined in the module and carries out tests specified therein. We supply the tests in the form of an example interactive session with anticipated output. Here we define two tests in terms of function arguments and expected outputs — the function names and interactive session prompts will be added when these chunks of code are combined with the docstrings in which their chunk references are embedded.

<<docstring test numeric input>>=
([3, 1, 4, 1, 5, 9, 2, 6, 5, 3, 5, 8, 9, 7, 9, 3])
<<docstring test numeric output>>=
[1, 1, 2, 3, 3, 3, 4, 5, 5, 5, 6, 7, 8, 9, 9, 9]
<<docstring test string input>>=
(["bob","alice","barry","zoe","charlotte","fred","marvin"])
<<docstring test string output>>=
['alice', 'barry', 'bob', 'charlotte', 'fred', 'marvin', 'zoe']

The complete qsort.py consists of the quicksort implementations and a call to doctest to test the module:

<<qsort.py>>=
qsort1
qsort1a
qsort2
if __name__ == "__main__":
    import doctest
    doctest.testmod()

Building and running

If you have installed Python, then the quicksort implementations can be tested at the command line by issuing the following command (the -v notifies doctest to output results even if all tests pass):

python qsort.py -v

Running the tests should produce the following output:

Trying: qsort1([3, 1, 4, 1, 5, 9, 2, 6, 5, 3, 5, 8, 9, 7, 9, 3])
Expecting: [1, 1, 2, 3, 3, 3, 4, 5, 5, 5, 6, 7, 8, 9, 9, 9]
ok
Trying: qsort1(["bob","alice","barry","zoe","charlotte","fred","marvin"])
Expecting: ['alice', 'barry', 'bob', 'charlotte', 'fred', 'marvin', 'zoe']
ok
0 of 2 examples failed in __main__.qsort1.__doc__
Running __main__.qsort1a.__doc__
Trying: qsort1a([3, 1, 4, 1, 5, 9, 2, 6, 5, 3, 5, 8, 9, 7, 9, 3])
Expecting: [1, 1, 2, 3, 3, 3, 4, 5, 5, 5, 6, 7, 8, 9, 9, 9]
ok
Trying: qsort1a(["bob","alice","barry","zoe","charlotte","fred","marvin"])
Expecting: ['alice', 'barry', 'bob', 'charlotte', 'fred', 'marvin', 'zoe']
ok
0 of 2 examples failed in __main__.qsort1a.__doc__
Running __main__.qsort2.__doc__
Trying: qsort2([3, 1, 4, 1, 5, 9, 2, 6, 5, 3, 5, 8, 9, 7, 9, 3])
Expecting: [1, 1, 2, 3, 3, 3, 4, 5, 5, 5, 6, 7, 8, 9, 9, 9]
ok
Trying: qsort2(["bob","alice","barry","zoe","charlotte","fred","marvin"])
Expecting: ['alice', 'barry', 'bob', 'charlotte', 'fred', 'marvin', 'zoe']
ok
0 of 2 examples failed in __main__.qsort2.__doc__
2 items had no tests:
    __main__
    __main__.partition
3 items passed all tests:
   2 tests in __main__.qsort1
   2 tests in __main__.qsort1a
   2 tests in __main__.qsort2
6 tests in 5 items.
6 passed and 0 failed.
Test passed.

Performance

Although all of the quicksort implementations apparently work correctly, the question of performance remains. To determine the performance of the quicksorts we use a simple test driver which generates randomly populated integer lists that have a length defined via command-line arguments. The quicksort implementations are then applied to the random lists, using the timing module to obtain execution-time information.

<<time-qsort.py>>=
import sys
import random
import timing
import qsort
def checkSorted(a):
    for i in xrange(1, len(a) - 1):
        if a[i] < a[i-1]:
            return False
    return True
numElements = int(sys.argv[1])    
testlist = random.sample(xrange(999999999), numElements)
print "# elements: %d"%numElements
for q in qsort.qsort1, qsort.qsort1a, qsort.qsort2:
    print "%s"%q.__name__,
    list = testlist[:]
    timing.start()
    result = q(list)
    timing.finish()
    print "- %.3f secs"%(float(timing.micro()) / 1000000),
    if checkSorted(result):
        print "- passed"
    else:
        print "- failed"

The table below compares the performance of the quicksort implementations, for various sizes of list. Times were taken on a 1.5 GhZ PowerPC G4machine under Mac OS X using the standard Python package; we measured both with and without the -O flag, but the difference was not appreciable.

Randomized lists

List size 100 500 1000 5000 10000 50000
qsort1 (s) 0.002 0.009 0.019 0.142 0.270 1.853
qsort1a (s) 0.003 0.015 0.031 0.189 0.380 2.188
qsort2 (s) 0.004 0.029 0.097 0.846 4.161 278.242

The qsort1 list comprehension implementation significantly outperforms the qsort2 single-pass partitioning implementation for all list sizes. This is probably the result of Python's lack of tail-call optimization, and possibly also a result of optimizations for list comprehensions within the Python runtime.

The qsort1a implementation performs roughly as well as qsort1, but not surprisingly pays a slight performance penalty for generating a random number during each pivot selection.

As a further point of comparison, we compare the behavior of qsort1 and qsort1a on sorted lists of sequential integers, which is a worst-case for a naive list algorithm.

Sorted lists

List size 100 500 1000 5000 10000 50000
qsort1 (s) 0.006 0.145
qsort1a (s) 0.002 0.014 0.027 0.167 0.333 1.825

The simple comprehension-based implementation stack overflowed at 1000 elements. However, by adding a randomized pivot to the comprehension-based implementation it is possible to avoid the stack overflow, and still achieve good performance.

Download code
Views