A tour of the Python standard library shows a number of modules that are popular and in every day use and a number of others which are either underused, underappreciated or works behind the scenes powering other modules.
In this article series, I will try and discuss some of the most interesting ones.
First up is the bisect module which, given its size - provides a lot of power to the programmer in working with sorted containers.
Bisect - Binary Search and Insertion
The core of this module are four functions - two of which provide binary search into an sorted list or container, and the other two using these functions to insert a new element into the list, while maintaining the sorted order.
The first two are,
- bisect_left(a, x, lo=0, hi=len(a), key=None) - Return the insertion point in the list a for the new item x so that the sorted order is maintained. The insertion point will be to the left of any existing entries.
- bisect_right(a, x, lo=0, hi=len(a), key=None) - Return the insertion point in the list a for the new item x to the right of any existing entries so that the sorted order is maintained.
The next two are,
- insort_left(a, x, lo=0, hi=len(a), key=None) - This function uses
bisect_leftto find the insertion point for x into a and inserts it into that position, while keeping the sorted order. - insort_right(a, x, lo=0, hi=len(a), key=None) - This function uses
bisect_rightto find the insertion point for x into a and inserts it into that position, while keeping the sorted order.
NOTE: By default, the entire list is used. The params lo and hi can be used to indicate a subset of the list as lower and upper indices respectively.
The simplest example is a binary search for an existing item in a sorted list.
import bisect
def binary_search(a, x, lo=0, hi=None):
""" Search for item x in a using bisect binary search """
if hi is None:
hi = len(a)
pos = bisect.bisect_left(a, x, lo, hi)
if pos != hi and a[pos] == x:
return pos
# You may raise an exception instead
return -1
Let us try this on a sorted list.
>>> l = [10, 15, 18, 20, 21, 21, 21, 24, 24]
>>> binary_search(l, 18)
2
>>> binary_search(l, 21)
4
>>> binary_search(l, 24)
7
>>> binary_search(l, 100)
-1
NOTE: The bisect module assumes that the input list is already sorted, but doesn't raise any exception if it isn't. Hence using the bisect module's methods on unsorted list will return meaningless and incorrect results.
Let us now try the insort functions to try and insert some data at the right place.
>>> bisect.insort_left(l, 12)
>>> l
[10, 12, 15, 18, 20, 21, 21, 21, 24, 24]
>>> bisect.insort_right(l, 20)
>>> l
[10, 12, 15, 18, 20, 20, 21, 21, 21, 24, 24]
>>> bisect.insort_left(l, 17)
>>> l
[10, 12, 15, 17, 18, 20, 20, 21, 21, 21, 24, 24]
List Search Functions
We can now write some generic list search functions to return the index of an element in a sorted list using the bisect module. These functions assume the entire list is used for the search.
def search(a, x):
""" Locate the index of the leftmost value exactly equal to x """
i = bisect.bisect_left(a, x)
if i != len(a) and a[i] == x:
return i
return -1
def search_right(a, x):
""" Locate the index of the right most value exactly equal to x """
i = bisect.bisect_right(a, x)
if i and a[i-1] == x:
return i
return -1
Here are examples of these functions in action.
>>> l=[10, 12, 15, 17, 18, 20, 20, 21, 21, 21, 24, 24]
>>> search(l, 18)
4
>>> search(l, 20)
5
>>> search(l, 21)
7
>>> search_right(l, 21)
10
>>> search_right(l, 24)
12
You can find more examples of a similar nature in the Python module documentation for bisect.
It is possible to write more interesting functions which are built upon these generic functions. Here is an example of a function that returns all occurrences of an element in a list as a range - and examples of its usage.
def occurs(a, x):
""" Return all occurrences of x in a """
start = search(a, x)
end = search_right(a, x)
if start != -1 and end != -1:
return range(start, end, 1)
>>> l=[10, 12, 15, 17, 18, 20, 20, 21, 21, 21, 24, 24]
>>> occurs(l, 18)
range(4, 5)
>>> occurs(l, 20)
range(5, 7)
>>> list(occurs(l, 21))
[7, 8, 9]
Numerical Range/Table Look-ups
The bisect functions are very handy for implementing functions which need numerical range/table look-ups. For example, one can think of these as functions that create a statistical mapping from a range of numeric data to specific classes of outputs.
Common examples are class grades, ratings and/or reviews.
Here is an example of an IQ (Intelligence Quotient) mapping function written from first principles using numeric ranges.
def IQ(score):
""" IQ score -> rating """
if score < 70:
return "intellectually disabled"
elif score in range(70, 85):
return "below average"
elif score in range(85, 115):
return "average"
elif score in range(115, 130):
return "bright"
elif score in range(130, 145):
return "gifted"
elif score in range(145, 160):
return "highly gifted"
elif score in range(160, 180):
return "exceptionally gifted"
elif score > 180:
return "profoundly gifted/genius"
>>> IQ(65)
'intellectually disabled'
>>> IQ(100)
'average'
>>> IQ(125)
'bright'
>>> IQ(200)
'profoundly gifted/genius'
The function can be written in a much nicer way using the bisect module - using the idea of finding the position for an element in an existing list - and then mapping its index to a specific class.
def IQ(score):
ranges = (70, 85, 115, 130, 145, 160, 180)
rating = ("intellectually disabled",
"below average",
"average",
"bright",
"gifted",
"highly gifted",
"exceptionally gifted",
"profoundly gifted/genius")
idx = bisect.bisect_right(ranges, score)
return rating[idx]
You can verify that this gives the same results as the original function.
NOTE: Apart from a cleaner approach, this is also a faster solution since the bisect lookup works in O(log(N)) time complexity, whereas the cascading if..else logic of the original function is much slower.
A Custom Range Look-up Dictionary
One can extend the idea of the range look-up function developed above to write a custom dictionary suited for these kind of problems. The idea naturally evolves from the observation that we are doing a mapping from a set of numbers to a set of classes (strings) in the modified function above.
The code of the dictionary itself is so simple that it seems almost too good to be true.
class RangeDict(dict):
""" A dictionary supporting values close to the key """
def __getitem__(self, key):
sorted_keys = sorted(self.keys())
idx = bisect.bisect_right(sorted_keys, key)
# Mapped key is one down from index but we don't
# want to cycle using negative indices
if idx > 0: idx -= 1
mapped_key = sorted_keys[idx]
return super().__getitem__(mapped_key)
Let us see how to use this dictionary for the IQ example.
def IQ(score):
""" IQ function using range dictionary """
scores_class_dict={0: 'intellectually disabled',
70: 'below average',
85: 'average',
115: 'bright',
130: 'gifted',
145: 'highly gifted',
160: 'exceptionally gifted',
180: 'profoundly gifted/genius'
}
d = RangeDict(scores_class_dict)
return d[score]
You may observe how we had to introduce an extra class - with the key 0 - in the new scores_class_dict which we are using to initialize the range dictionary instance with. This is required because we are now converting a range of values to exact key look-ups with one-down logic, so every class kind of shifts down.
You may verify that this produces the same output as the earlier two functions.
Word Search
Since bisect functions work with sorted lists - they can work with list of words as well in a very natural way. As words are sorted in alphabetical order - searching for words in a sorted list of words by a prefix sub-string is a direct application of the binary search provided by these functions.
Here is a fully developed example of this, where we process a wordlist file, containing one word per line and then search for all words matching a certain prefix.
def search_word(filename, prefix):
""" Search a file containing a wordlist and print matching words by prefix """
wordlist = sorted(filter(None, [item.lower().strip() for item in open(filename)]))
words = []
idx = bisect.bisect_left(wordlist, prefix)
for idx in range(idx, len(wordlist)):
word = wordlist[idx]
if word.startswith(prefix):
words.append(word)
return words
On Linux systems, there is a standard word list file which can be used to test this code. Here are examples of this function in action.
>>> filename='/usr/share/dict/words'
>>> bt.search_word(filename, 'calculate')
['calculate', 'calculated', 'calculates']
>>> bt.search_word(filename, 'code')
['code', "code's", 'coded', 'codeine', "codeine's", 'codependency', "codependency's", 'codependent',
"codependent's", 'codependents', 'codes', 'codex', "codex's"]
>>> bt.search_word(filename, 'search')
['search', "search's", 'searched', 'searcher', "searcher's", 'searchers', 'searches', 'searching',
'searchingly', 'searchlight', "searchlight's", 'searchlights']
I hope these examples helped to convey that even being a tiny module, the bisect module punches above it's own weight and provides a powerful set of functions to add to any Python programmer's toolkit.
NOTE: A version of this story is available in medium