

Python Programming
The mastery of algorithms and data structures is a fundamental skill of the software engineer. While we tend to use the built-in implementations in many languages, there are often times when specific solutions are not available in standard libraries. Even when they are, understanding how a classic algorithm works is critical to debugging behaviour or choosing the best approach.
This paper explores a group of classic algorithms and data structures and provides working implementations that can be used as patterns for more powerful or complex versions.
"Two pointers" is a technique for solving problems involving arrays or lists by using two pointers to traverse the data structure. It aims to achieve linear-time, constant-space solutions for problems that might otherwise seem quadratic. A typical form moves left and right pointers towards each other over a sorted array, processing or comparing elements from both ends.
It is commonly used for problems such as pair sums, palindrome checks, in-place reversal and partitioning, deduplication, and merging sorted sequences. Key markers to look for when recognising the technique in code:
| Marker | Description | Example in Action |
|---|---|---|
| left, right pointers | Variables initialized at opposite ends of an array | Reversing, partitioning |
| Movement toward each other | Pointers meet in the middle based on conditions | Palindrome checks, target sum |
| One slow, one fast pointer | Useful for detection (e.g. cycle in linked list) | Floyd’s tortoise and hare |
| Two iterators over sorted arrays | Merging, difference comparisons | Merge sorted arrays, union/intersection |
| read / write pointer pair | One scans data, the other rebuilds result | Remove duplicates, string compression |
Mental cues that suggest “Two-Pointers” might work
You can think of the two pointers algorithm as a coordinated dance: one pointer may scan while the other may write, or both may move towards a central rendezvous, or move at different speeds to detect patterns.
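The "one slow, one fast" variant from the table deserves a quick sketch. Below is a minimal, hypothetical linked-list example (the Node class is ours, defined just for illustration) showing Floyd's tortoise and hare detecting a cycle:

```python
# Floyd's tortoise and hare: slow advances one step, fast advances two.
# If the list contains a cycle the two pointers must eventually meet.
class Node:
    def __init__(self, value):
        self.value = value
        self.next = None

def has_cycle(head):
    slow = fast = head
    while fast and fast.next:
        slow = slow.next          # one step
        fast = fast.next.next     # two steps
        if slow is fast:          # they can only meet inside a cycle
            return True
    return False

# Build 1 -> 2 -> 3 -> 4, then point 4 back at 2 to create a cycle
a, b, c, d = Node(1), Node(2), Node(3), Node(4)
a.next, b.next, c.next = b, c, d
print(has_cycle(a))  # False (no cycle yet)
d.next = b
print(has_cycle(a))  # True
```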
Two Pointer: Efficient O(n) search after sorting
Given an array of numbers and a target, we can find the first pair that matches the target as follows:
```python
def find_pair_with_sum(nums, target):
    nums.sort()  # Ensure the list is sorted
    left, right = 0, len(nums) - 1
    while left < right:
        s = nums[left] + nums[right]
        if s == target:
            return (nums[left], nums[right])
        elif s < target:
            left += 1
        else:
            right -= 1
    return None

print(find_pair_with_sum([1, 2, 2, 3, 4, 4, 5], 6))
#output: (1, 5)
```
If we want to find all matching pairs we can rewrite this algorithm as a generator:
```python
def generate_pairs_with_sum(nums, target):
    nums.sort()
    left, right = 0, len(nums) - 1
    while left < right:
        s = nums[left] + nums[right]
        if s == target:
            yield (nums[left], nums[right])
            # Skip duplicates
            left_val, right_val = nums[left], nums[right]
            while left < right and nums[left] == left_val:
                left += 1
            while left < right and nums[right] == right_val:
                right -= 1
        elif s < target:
            left += 1
        else:
            right -= 1

for pair in generate_pairs_with_sum([1, 2, 2, 3, 4, 4, 5], 6):
    print(pair)
#output:
#(1, 5)
#(2, 4)
```
If we want to reverse an array or list "in place" (meaning: without duplicating the list) in O(n) time and O(1) extra space, we would use a two pointer solution:
```python
def reverse_in_place(arr):
    left, right = 0, len(arr) - 1
    while left < right:
        arr[left], arr[right] = arr[right], arr[left]
        left += 1
        right -= 1

arr1 = [1, 2, 2, 3, 4, 4, 5]
reverse_in_place(arr1)
print(arr1)
#output: [5, 4, 4, 3, 2, 2, 1]
```
As a contrast we note the non-in-place array slicing alternative:
```python
#Array slicing - not in place, not two pointer
arr1 = [1, 2, 2, 3, 4, 4, 5]
print(arr1[::-1])
#output: [5, 4, 4, 3, 2, 2, 1]
```
If we want to remove duplicates from a sorted array or list "in place" (meaning: without duplicating the list) in O(n) time and O(1) extra space, we would use a two pointer solution where one pointer reads and the other writes:
```python
def remove_duplicates(nums):
    if not nums:
        return 0
    write = 1
    for read in range(1, len(nums)):
        if nums[read] != nums[read - 1]:
            nums[write] = nums[read]
            write += 1
    return write  # new length, or just return nums[:write] for the array slice

arr1 = [1, 2, 2, 3, 4, 4, 5]
print(arr1[:remove_duplicates(arr1)])
#output: [1, 2, 3, 4, 5]
```
While it is tempting to try this as a list comprehension, that won't work as an "in place" solution because a comprehension always builds a new list rather than mutating the source. A solution with copying is doable, but it is neither an "in place" nor a "two pointer" deduplication:
```python
#List Comprehension: NOT in place, NOT two pointer
def remove_duplicates_new(nums):
    return [nums[i] for i in range(len(nums)) if i == 0 or nums[i] != nums[i - 1]]

arr1 = [1, 2, 2, 3, 4, 4, 5]
print(remove_duplicates_new(arr1))
#output: [1, 2, 3, 4, 5]
```
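For completeness, since the list is sorted, the standard library offers an equally compact non-in-place alternative via itertools.groupby, which clusters runs of equal adjacent values and lets us keep one key per run:

```python
from itertools import groupby

def remove_duplicates_groupby(nums):
    # groupby yields (key, group) pairs for each run of equal consecutive values
    return [key for key, _ in groupby(nums)]

arr1 = [1, 2, 2, 3, 4, 4, 5]
print(remove_duplicates_groupby(arr1))
#output: [1, 2, 3, 4, 5]
```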
If we want to merge two sorted arrays or lists, a two pointer solution works nicely. This solution could be adapted to work "in place" (meaning: without duplicating the list) in O(n + m) time and O(1) extra space if one of the buffers has spare capacity. Here we will copy the merged result into a new array:
```python
def merge_sorted_arrays(A, B):
    i, j = 0, 0
    merged = []
    while i < len(A) and j < len(B):
        if A[i] < B[j]:
            merged.append(A[i])
            i += 1
        else:
            merged.append(B[j])
            j += 1
    merged.extend(A[i:])
    merged.extend(B[j:])
    return merged

arr1 = [1, 4, 4, 5, 8]
arr2 = [2, 3, 4, 4, 5, 6, 7]
print(merge_sorted_arrays(arr1, arr2))
#output: [1, 2, 3, 4, 4, 4, 4, 5, 5, 6, 7, 8]
```
As you can see, the above solution retains duplicates, which may often be the behaviour you require, but now let's assume we want to merge two sorted arrays and remove duplicates:
```python
def merge_sorted_unique(A, B):
    i, j = 0, 0
    merged = []
    prev = None  # Tracks the last value appended
    while i < len(A) and j < len(B):
        if A[i] < B[j]:
            val = A[i]
            i += 1
        elif A[i] > B[j]:
            val = B[j]
            j += 1
        else:
            val = A[i]  # Equal case - only add once
            i += 1
            j += 1
        if val != prev:
            merged.append(val)
            prev = val
    # Handle remaining elements in A
    while i < len(A):
        if A[i] != prev:
            merged.append(A[i])
            prev = A[i]
        i += 1
    # Handle remaining elements in B
    while j < len(B):
        if B[j] != prev:
            merged.append(B[j])
            prev = B[j]
        j += 1
    return merged

arr1 = [1, 4, 4, 5, 8]
arr2 = [2, 3, 4, 4, 5, 6, 7]
print(merge_sorted_unique(arr1, arr2))
#output: [1, 2, 3, 4, 5, 6, 7, 8]
```
Note that in the duplicate-free version we can't simply extend the result with the remaining elements as we did in the first solution, because each remaining element must still be checked against the last appended value to eliminate duplicates.
On a last note here, as we are not working "in place", we could rewrite the last solution as a list comprehension. Unfortunately, while the two pointer solution was O(n + m), the comprehension solution is O((n + m) log(n + m)) because of the sort at the start, and there is no way that I can think of to write a comprehension that traverses the two arrays simultaneously and eliminates the pre-sort:
```python
#Merge two sorted arrays as a comprehension:
def merge_sorted_unique_comprehension(A, B):
    merged = sorted(A + B)
    return [
        merged[i]
        for i in range(len(merged))
        if i == 0 or merged[i] != merged[i - 1]
    ]

arr1 = [1, 4, 4, 5, 8]
arr2 = [2, 3, 4, 4, 5, 6, 7]
print(merge_sorted_unique_comprehension(arr1, arr2))
#output: [1, 2, 3, 4, 5, 6, 7, 8]
```
It would be possible to turn the two pointer solution into a generator and build the list by iterating it, but the gain here would be lazy evaluation and memory saving rather than reduced code complexity. That is pointless if, as in this example, you are just materialising a list, but where the items are consumed one at a time in merged order the generator solution has merit:
```python
#Two pointer merge of two sorted arrays as a generator:
def merge_unique_twopointer_gen(A, B):
    i, j = 0, 0
    prev = None
    while i < len(A) and j < len(B):
        if A[i] < B[j]:
            val = A[i]
            i += 1
        elif A[i] > B[j]:
            val = B[j]
            j += 1
        else:
            val = A[i]
            i += 1
            j += 1
        if val != prev:
            yield val
            prev = val
    while i < len(A):
        if A[i] != prev:
            yield A[i]
            prev = A[i]
        i += 1
    while j < len(B):
        if B[j] != prev:
            yield B[j]
            prev = B[j]
        j += 1

arr1 = [1, 4, 4, 5, 8]
arr2 = [2, 3, 4, 4, 5, 6, 7]
print(list(merge_unique_twopointer_gen(arr1, arr2)))
#output: [1, 2, 3, 4, 5, 6, 7, 8]
```
As a very last comment on this problem, it is worth noting that Python offers other ways of approaching it which are also not two pointer solutions but are very code-minimising. One such is combining heapq.merge() with itertools.groupby, which delivers a single-line list comprehension solution and thus also a generator solution (see my articles on comprehensions or generators to understand why).
```python
from heapq import merge
from itertools import groupby

def merge_sorted_unique_heapq_itertool(A, B):
    return [key for key, _ in groupby(merge(A, B))]

arr1 = [1, 4, 4, 5, 8]
arr2 = [2, 3, 4, 4, 5, 6, 7]
print(merge_sorted_unique_heapq_itertool(arr1, arr2))
#output: [1, 2, 3, 4, 5, 6, 7, 8]
```
In this example we are going to partition an array "in place", putting the even numbers on the left and the odd numbers on the right. Obviously the predicate (rule) for the partition could be anything that distinguishes the members. We have just chosen evens and odds for convenience.
```python
#Partition array into evens and odds
def partition_array(nums):
    left, right = 0, len(nums) - 1
    while left < right:
        if nums[left] % 2 == 0:
            left += 1
        elif nums[right] % 2 == 1:
            right -= 1
        else:
            nums[left], nums[right] = nums[right], nums[left]
    return nums

arr1 = [1, 2, 3, 4, 5, 6, 7, 8]
print(partition_array(arr1))
#output: [8, 2, 6, 4, 5, 3, 7, 1]
```
Now if we wanted to generalise this solution to accept a predicate passed in it would look like this:
```python
#Partition array based on a generalised predicate
def partition_array(nums, predicate):
    left, right = 0, len(nums) - 1
    while left < right:
        if predicate(nums[left]):
            left += 1
        elif not predicate(nums[right]):
            right -= 1
        else:
            nums[left], nums[right] = nums[right], nums[left]
    return nums

arr1 = [1, 2, 3, 4, 5, 6, 7, 8]
print(partition_array(arr1, lambda x: x % 2 == 0))
# Output (order not guaranteed): [8, 2, 6, 4, 5, 3, 7, 1]
```
By way of example we could use other predicates:
```python
lambda x: x < 0               #negatives first
lambda x: isinstance(x, str)  #strings before everything else
lambda x: x in primes         #primes first
```
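To make that concrete, here is the generalised partition run with the negatives-first predicate (the input values are just sample data; the routine is repeated from the text so the block is self-contained):

```python
#Partition array based on a generalised predicate (repeated from the text)
def partition_array(nums, predicate):
    left, right = 0, len(nums) - 1
    while left < right:
        if predicate(nums[left]):
            left += 1
        elif not predicate(nums[right]):
            right -= 1
        else:
            nums[left], nums[right] = nums[right], nums[left]
    return nums

arr = [3, -1, 2, -7, 5, -4]
result = partition_array(arr, lambda x: x < 0)
print(result)
#output: [-4, -1, -7, 2, 5, 3] (grouping guaranteed, order within groups is not)
```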
The two pointer solution to the partitioning problem is "not stable" in that it does not preserve the original order, but it does work "in place" in O(n) time. If we are willing to give up "in place" and accept an extra O(n) of memory plus a second pass, we can achieve a stable result without a two pointer solution, using a list comprehension that returns a new array:
```python
#List comprehension solution
def stable_partition(nums, predicate):
    return [x for x in nums if predicate(x)] + [x for x in nums if not predicate(x)]

arr1 = [1, 2, 3, 4, 5, 6, 7, 8]
print(stable_partition(arr1, lambda x: x % 2 == 0))
# Output (order guaranteed): [2, 4, 6, 8, 1, 3, 5, 7]
```
In this specific case (an odds/evens partition) it would also be possible to allocate a new buffer equal in size to the input and build the partition in O(n) time while preserving order, but that approach is not generalisable as it relies on knowing the size of the left partition ahead of time, which only works for the specific example input and predicate we are using here, so we won't illustrate it now.
Binary search is a simple algorithm for searching a sorted array or list. It runs in O(log n) time, successively halving the search space to find a target or optimum value. It is much faster than a brute-force sequential search, which runs in O(n) time.
The classic use cases are problems that require locating a value, a boundary, or an optimum within sorted or otherwise monotonically ordered data.
This is the classic iterative binary search. It returns the index of the target, or -1 if not found. The algorithm halves the search space each iteration: determine whether the target lies above or below the midpoint, discard the other half, and continue until the target is located or proven to be missing.
```python
#Iterative binary search
def binary_search_iter(nums, target):
    left, right = 0, len(nums) - 1
    while left <= right:
        mid = (left + right) // 2  # In languages with fixed-width ints use left + (right - left) // 2 to avoid overflow
        if nums[mid] == target:
            return mid
        elif nums[mid] < target:
            left = mid + 1
        else:
            right = mid - 1
    return -1  # Not found

arr1 = [1, 2, 3, 4, 5, 6, 8, 9, 10]
print(binary_search_iter(arr1, 6))
print(binary_search_iter(arr1, 7))
# Output: 5 -1
```
The recursive version is essentially the same algorithm as the iterative one except that it recurses, effectively building the search tree on the stack. While recursion is elegant, note that Python has a recursion limit and no tail-call optimisation, so the iterative form is safer for very large arrays.
```python
#Recursive binary search
def binary_search_recursive(nums, target, left=0, right=None):
    if right is None:
        right = len(nums) - 1
    if left > right:
        return -1
    mid = (left + right) // 2
    if nums[mid] == target:
        return mid
    elif nums[mid] < target:
        return binary_search_recursive(nums, target, mid + 1, right)
    else:
        return binary_search_recursive(nums, target, left, mid - 1)

arr1 = [1, 2, 3, 4, 5, 6, 8, 9, 10]
print(binary_search_recursive(arr1, 6))
print(binary_search_recursive(arr1, 7))
# Output: 5 -1
```
This version of the iterative binary search is designed to determine the first index where nums[i] >= target, and is classically relevant to determining the correct insertion point for a value where order must be preserved. Again we return the index into the array:
```python
#Iterative binary lower bound search
def lower_bound(nums, target):
    left, right = 0, len(nums)
    while left < right:
        mid = (left + right) // 2
        if nums[mid] < target:
            left = mid + 1
        else:
            right = mid
    return left

arr1 = [1, 2, 3, 4, 5, 6, 8, 9, 10]
print(lower_bound(arr1, 6))
print(lower_bound(arr1, 7))
# Output: 5 6
```
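It is worth knowing that the standard library already ships this exact lower-bound behaviour as bisect.bisect_left, so in production code you would normally reach for it rather than hand-rolling the loop:

```python
from bisect import bisect_left

arr1 = [1, 2, 3, 4, 5, 6, 8, 9, 10]
print(bisect_left(arr1, 6))  # 5 - first index where arr1[i] >= 6
print(bisect_left(arr1, 7))  # 6 - insertion point that keeps the list sorted
```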
This version of the iterative binary search is designed to find an index where nums[i] == target, but it works on a rotated array. A rotated array is a sorted array that has been “rotated” around a pivot point, so part of it has been shifted to the other end while still preserving the relative order of the elements. Rotated arrays appear in circular (real-time) buffers and shifted datasets (like a "year rollover").
```python
#Iterative binary rotated array search
def is_rotated(arr):
    return any(arr[i] > arr[i+1] for i in range(len(arr)-1))

def search_rotated(nums, target):
    left, right = 0, len(nums) - 1
    while left <= right:
        mid = (left + right) // 2
        if nums[mid] == target:
            return mid
        if nums[left] <= nums[mid]:
            if nums[left] <= target < nums[mid]:
                right = mid - 1
            else:
                left = mid + 1
        else:
            if nums[mid] < target <= nums[right]:
                left = mid + 1
            else:
                right = mid - 1
    return -1

arr1 = [6, 8, 9, 10, 1, 2, 3, 4, 5]
if is_rotated(arr1):
    print(search_rotated(arr1, 3))
    print(search_rotated(arr1, 7))
# Output: 6 -1
```
In this scenario the pivot point is unknown at the start, so each iteration must first work out which half of the array is the normally sorted one. We take the midpoint between the current left and right edges and check whether our target is there. If not, we test whether the leftmost value is less than or equal to the middle value. If yes (call this scenario 1), the left half is normally sorted, so we test whether the target lies between the left and the middle: if it does, the middle becomes the new right edge; if not, the left moves past the middle. If the answer to the scenario 1 test was no (scenario 2), the right half is the sorted one, so we check whether the target lies between the middle and the right edge: if it does, the left moves past the middle; if not, the middle becomes the new right edge. We repeat this process until the middle equals the target, or the left and right pointers cross each other.
The is_rotated(arr1) function is not actually required. It is just there to show a scenario where you want to test whether the array is rotated before using the rotated binary search routine.
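A closely related variant, sketched below, uses the same halving idea to find the rotation pivot itself, i.e. the index of the minimum element (the function name is ours, not a standard one):

```python
def find_pivot(nums):
    # Binary search for the index of the smallest element in a rotated sorted array
    left, right = 0, len(nums) - 1
    while left < right:
        mid = (left + right) // 2
        if nums[mid] > nums[right]:
            left = mid + 1   # minimum lies strictly to the right of mid
        else:
            right = mid      # minimum is at mid or to its left
    return left

arr1 = [6, 8, 9, 10, 1, 2, 3, 4, 5]
print(find_pivot(arr1))  # 4 - the index of the minimum value, 1
```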
Binary search doesn’t have to stay discrete, nor actually work on an array! In this example we use a binary search to find the square root of a number. To work, binary search just needs the values being searched to be in an actual or implied ascending (or descending) order.
```python
#Iterative binary square root non-array search
def sqrt_binary(n, eps=1e-6):
    left, right = 0, n
    while right - left > eps:
        mid = (left + right) / 2
        if mid * mid < n:
            left = mid
        else:
            right = mid
    return round((left + right) / 2, 6)

print(sqrt_binary(25))
# Output: 5.0
```
The routine returns the square root correct to six decimal places (or to whatever precision you set eps). It runs until the difference between the left and right bounds is smaller than eps. With each iteration it squares the midpoint of the current range and narrows the range based on whether that square is less than n: if it is, left becomes the midpoint and right stays put; otherwise right becomes the midpoint. The range shrinks with each iteration until the difference between left and right is immeasurably small, where "immeasurable" means smaller than eps.
Here we have another binary search that doesn't work on an array! In this example we use a binary search to find the optimal placement of something given a set of conditions that must be satisfied. For example, suppose we have a function can_I_do_this() that returns True or False depending on whether a set of conditions is satisfied, and we want to find the smallest value of x between a lo and a hi value that satisfies those conditions. We can use a binary search to find this x. Here is the general pattern:
```python
#Iterative binary minimize-threshold search - non array
#PATTERN - See below for applications
def can_I_do_this(testme):
    return True  # Placeholder predicate: replace with the real feasibility test

def minimize_threshold(can_do, lo, hi):
    while lo < hi:
        mid = (lo + hi) // 2
        if can_do(mid):
            hi = mid      # Feasible: try a smaller value
        else:
            lo = mid + 1  # Infeasible: must go higher
    return lo

print(minimize_threshold(can_I_do_this, 0, 100))  # 0 with the always-True placeholder
```
So let's look at some examples of this pattern in use.
A classic problem is known as "aggressive cows": assume a row of stalls, some occupied, some not. We are given k aggressive cows that don't like each other plus a list of available stall positions, and we have to place the cows in the available stalls so as to maximise the minimum distance between them. The problem requires that we report that maximised minimum distance, so the question is: "What's the largest minimum distance I can guarantee between cows if I place them greedily?" For fun we will also return the stall positions as a second solution.
While this problem is presented in a humorous context, it models an issue that comes up in real world problems such as antenna placement, server allocation or warehouse distribution, etc.
```python
#Aggressive cows - two solutions: min-max distance alone, and distance plus placement
#Test the criteria - check we can place k cows in the stalls separated by at least min_dist
#While we can place the cows, the calling routine will try a larger distance and test again
def can_place_cows(stalls, k, min_dist):
    count = 1
    last_pos = stalls[0]
    for pos in stalls[1:]:
        if pos - last_pos >= min_dist:
            count += 1
            last_pos = pos
            if count == k:
                return True
    return False

#Solution 1. Calculate the optimum distance
#Commencing with half the full range, we test the criteria; if successful
#we try a larger separation by taking the halfway point of the upper range,
#else we take the halfway point of the lower range, until low and high collide.
def aggressive_cows(stalls, k):
    stalls.sort()
    low = 0
    high = stalls[-1] - stalls[0]
    result = 0
    while low <= high:
        mid = (low + high) // 2
        if can_place_cows(stalls, k, mid):
            result = mid  # Try a larger minimum distance
            low = mid + 1
        else:
            high = mid - 1
    return result

# Load and test the cow placement in the stalls
def get_cow_placements(stalls, k, min_dist):
    placements = [stalls[0]]
    last_pos = stalls[0]
    for pos in stalls[1:]:
        if pos - last_pos >= min_dist:
            placements.append(pos)
            last_pos = pos
            if len(placements) == k:
                break
    return placements if len(placements) == k else []

#Or Solution 2. Do it all in one step:
def aggressive_cows_with_placement(stalls, k):
    stalls.sort()
    low, high = 0, stalls[-1] - stalls[0]
    best_dist = 0
    while low <= high:
        mid = (low + high) // 2
        if get_cow_placements(stalls, k, mid):
            best_dist = mid
            low = mid + 1
        else:
            high = mid - 1
    return (best_dist, get_cow_placements(stalls, k, best_dist))

stalls = [1, 2, 4, 8, 9, 11, 14, 15]
k = 5
min_dist = 0
print(f"Sltn 1. Min distance between cows: {(min_dist := aggressive_cows(stalls, k))}")  # Output: 3
print(f"Sltn 1. Stalls: {get_cow_placements(stalls, k, min_dist)}")  # Output: [1, 4, 8, 11, 14]
print(f"Sltn 2. Separation & Stalls: {aggressive_cows_with_placement(stalls, k)}")  # Output: (3, [1, 4, 8, 11, 14])
```
The solution commences by taking half the full range of stall positions. Using this as the separation we test the criteria by attempting to place k cows into the stalls at least that far apart, starting with the first available stall. If successful we try a larger separation by taking the halfway point of the upper range; otherwise we take the halfway point of the lower range, and test again. We repeat this cycle until the low and high of the range collide.
This solution demonstrates "greedy validation" (place each cow in the earliest valid stall, then check that this holds for all k cows), binary searching on the answer (searching for a feasible value rather than an item in a list) and, lastly, optimisation under constraints.
As a last comment, I will note why the cow placement for-loop (which generates a list) cannot be turned into a comprehension. Comprehensions aren't designed to carry evolving state across iterations: here each placement depends on the last placed position, last_pos is updated conditionally, and we need to short-circuit the whole thing once k cows are placed. So, sorry, no list comprehension is possible here as far as I can see; comprehensions are not designed for bounded, state-dependent sequences with breaking conditions.
In the minimum maximum pages problem we have k students and an array where each element represents the number of pages in a book. The goal is to assign contiguous books to each student such that the maximum number of pages assigned to any student is minimised. As a constraint a book cannot be split, so the smallest feasible maximum is the size of the largest book.
In this problem we are not searching through the array; we are searching for the smallest value of the maximum pages such that the allocation is possible and as balanced as can be achieved. This is not your standard binary search on data but a binary search on the value space, demonstrating both binary search and greedy validation.
In the real world we would use this algorithm for allocating workloads across multiple processors or teams, so again while presented as a hypothetical scenario here the algorithm has real world application.
As before we will deliver this solution in two parts, one that calculates the workload, and the other that both calculates the workload and distributes the pages. Obviously in a work-balancing scenario this latter version is more useful.
```python
#Test whether the page allocation is feasible by verifying that with the current max pages allowed
#we do not require more students than are available
def is_feasible(pages, k, max_allowed):
    count = 1  # students
    current_sum = 0
    #Step through all the books, allocating them to students
    for page in pages:
        if page > max_allowed:  #Test that no book is larger than our current max pages
            return False  #one book too big
        #Test that the current student's page count stays within the max allowed
        if current_sum + page > max_allowed:
            count += 1  #Move to the next student
            current_sum = page
            if count > k:  # Check we haven't run out of students
                return False
        else:
            current_sum += page  # Add another book to the current student
    return True

#Use a binary search to find the minimum maximum pages that can be allocated without breaching
#the constraints that each book is indivisible and all books are allocated. The starting range
#runs from the largest single book (the lowest possible maximum per student) up to the sum of
#all books (one student does everything).
#As always we find the mid-point between the two extremes and test it, deciding whether to
#raise or lower the bound based on success or failure each time, moving the low and high
#closer together with each iteration until they collide.
#Remember we are trying to minimise the individual workload, so we are searching for the
#minimum allocation for each student.
def allocate_books(pages, k):
    if len(pages) < k:
        return -1  # Not enough books
    low = max(pages)   # At least one book per student
    high = sum(pages)  # One student does all
    result = high
    while low <= high:
        mid = (low + high) // 2
        if is_feasible(pages, k, mid):
            result = mid    # Remember we are trying to MINIMISE the workload
            high = mid - 1  # Try better (smaller max)
        else:
            low = mid + 1   # Too small, need more allowance
    return result

#This is the same as the previous solution only now we record the actual allocation of
#books per student. Note the internal modified version of the feasibility test
#which remembers the allocation.
def allocate_books_with_assignment(pages, k):
    if len(pages) < k:
        return -1, []  # Not enough books
    def is_feasible(pages, k, max_pages):
        alloc = []
        current = []
        total = 0
        for page in pages:
            if page > max_pages:
                return False, []
            if total + page > max_pages:
                alloc.append(current)
                current = [page]
                total = page
                k -= 1
                if k == 0:
                    return False, []
            else:
                current.append(page)
                total += page
        alloc.append(current)
        return True, alloc
    low, high = max(pages), sum(pages)
    best_allocation = []
    result = high
    while low <= high:
        mid = (low + high) // 2
        feasible, allocation = is_feasible(pages, k, mid)
        if feasible:
            result = mid
            best_allocation = allocation
            high = mid - 1
        else:
            low = mid + 1
    return result, best_allocation

pages = [10, 20, 30, 40]
k = 2
print(allocate_books(pages, k))  #Output: 60

pages = [10, 20, 30, 40, 50, 60]
k = 3
max_pages, allocation = allocate_books_with_assignment(pages, k)
print(f"Minimum Max Pages: {max_pages}")
print("Book assignments per student:")
for i, group in enumerate(allocation, 1):
    print(f"Student {i}: {group}")
#Output:
#Minimum Max Pages: 90
#Book assignments per student:
#Student 1: [10, 20, 30]
#Student 2: [40, 50]
#Student 3: [60]
```
We are trying to minimise the individual workload for each student so we are searching for the minimum page allocation for each student. If we find a feasible solution we would aim to find a better solution with a smaller maximum number of pages per student. We use a binary search to find this minimum number of pages that can be allocated while not breaching the constraint of each book being indivisible and all books allocated across as many students as possible.
The range we are working with at the start is from one book per student (which therefore means the largest book is the lowest maximum pages per student) to a maximum of all books to one student. This is our starting search range. As always we begin by finding the mid-point between these two extremes and testing that, deciding whether to increase or decrease the number by 50% based on success or failure of the mid point each time, and moving the low and high closer together with each iteration until they collide. If an allocation is feasible, we try again by decreasing the number of pages allocated, until we need more students than we have available in which case that max pages will fail the feasibility test.
Our last binary search variant is the "split array largest sum" problem. This may be the most interesting of the variants as it has very wide potential use. We will look at those real-world uses in a moment, but first let's describe the problem.
We are given an array of non-negative numbers we will call "nums" and the number of subarrays we want to create, which we will call "k". Our challenge is to split nums into k contiguous subarrays such that the largest subarray sum is minimised.
We therefore need to return the minimum possible value of the largest sum across all possible valid splits.
This pattern applies in the real world when slicing ordered, weighted tasks into parts where we care about load fairness or completion time:
```python
#Binary search - split array largest sum
def can_split(nums, k, max_allowed_sum):
    count = 1
    current_sum = 0
    for num in nums:
        if current_sum + num > max_allowed_sum:
            count += 1
            current_sum = num
            if count > k:
                return False
        else:
            current_sum += num
    return True

def split_array_largest_sum(nums, k):
    low = max(nums)   # minimum possible value for the largest sum
    high = sum(nums)  # maximum possible value (1 partition)
    while low < high:
        mid = (low + high) // 2
        if can_split(nums, k, mid):
            high = mid
        else:
            low = mid + 1
    return low

nums = [7, 2, 5, 10, 8]
k = 2
print(split_array_largest_sum(nums, k))  # Output: 18
```
The above first version just calculates the minimum largest sum. In the next version we both calculate the sum and allocate the subarrays, which is a touch more useful:
```python
#Binary search - split array largest sum with subarrays allocated
def split_array_with_subarrays(nums, k):
    def is_valid(threshold):
        current_sum = 0
        count = 1
        for num in nums:
            if current_sum + num > threshold:
                current_sum = num
                count += 1
                if count > k:
                    return False
            else:
                current_sum += num
        return True

    low, high = max(nums), sum(nums)
    best_sum = high
    while low <= high:
        mid = (low + high) // 2
        if is_valid(mid):
            best_sum = mid
            high = mid - 1
        else:
            low = mid + 1

    # Now rebuild the subarrays using the best_sum
    result = []
    current = []
    current_sum = 0
    remaining_parts = k
    for i, num in enumerate(nums):
        if current_sum + num > best_sum or len(nums) - i < remaining_parts:
            result.append(current)
            current = [num]
            current_sum = num
            remaining_parts -= 1
        else:
            current.append(num)
            current_sum += num
    result.append(current)  # last group
    return best_sum, result

nums = [7, 2, 5, 10, 8]
k = 2
max_sum, splits = split_array_with_subarrays(nums, k)
print(f"Minimum Max Subarray Sum: {max_sum}")
for i, part in enumerate(splits, 1):
    print(f"Subarray {i}: {part}")
#Output:
#Minimum Max Subarray Sum: 18
#Subarray 1: [7, 2, 5]
#Subarray 2: [10, 8]
```