

Python Programming
The mastery of algorithms and data structures is a fundamental skill of the software engineer. While we tend to use the built-in implementations in many languages, there are often times when specific solutions are not available in standard libraries. Even when they are, understanding how a classic algorithm works is critical to debugging behaviour or choosing the best approach.
This paper explores a group of classic algorithms and data structures and provides working implementations that can be used as patterns for more powerful or complex versions.
"Two pointers" is a technique for solving problems involving arrays or lists by using two pointers to traverse the data structure. It aims to achieve linear-time, constant-space solutions for problems that might otherwise seem quadratic. A typical form moves left and right pointers towards each other over a sorted array, processing or comparing elements from both ends.
It is commonly used for problems such as pair sums, palindrome checks, in-place reversal and partitioning, deduplication, and merging sorted sequences. Key markers to look for when recognising the technique in code:
| Marker | Description | Example in Action |
|---|---|---|
| left, right pointers | Variables initialized at opposite ends of an array | Reversing, partitioning |
| Movement toward each other | Pointers meet in the middle based on conditions | Palindrome checks, target sum |
| One slow, one fast pointer | Useful for detection (e.g. cycle in linked list) | Floyd’s tortoise and hare |
| Two iterators over sorted arrays | Merging, difference comparisons | Merge sorted arrays, union/intersection |
| read / write pointer pair | One scans data, the other rebuilds result | Remove duplicates, string compression |
Mental cues that suggest “Two-Pointers” might work
You can think of the two pointers algorithm as a coordinated dance: one pointer may scan while the other may write, or both may move towards a central rendezvous, or move at different speeds to detect patterns.
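The "one slow, one fast" variant from the table deserves a quick sketch. Below is a minimal, hypothetical linked-list example (the Node class is ours, defined just for illustration) showing Floyd's tortoise and hare detecting a cycle:

```python
# Floyd's tortoise and hare: slow advances one step, fast advances two.
# If the list contains a cycle the two pointers must eventually meet.
class Node:
    def __init__(self, value):
        self.value = value
        self.next = None

def has_cycle(head):
    slow = fast = head
    while fast and fast.next:
        slow = slow.next          # one step
        fast = fast.next.next     # two steps
        if slow is fast:          # they can only meet inside a cycle
            return True
    return False

# Build 1 -> 2 -> 3 -> 4, then point 4 back at 2 to create a cycle
a, b, c, d = Node(1), Node(2), Node(3), Node(4)
a.next, b.next, c.next = b, c, d
print(has_cycle(a))  # False (no cycle yet)
d.next = b
print(has_cycle(a))  # True
```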
Two Pointer: Efficient O(n) search after sorting
Given an array of numbers and a target, we can find the first pair that matches the target as follows:
```python
def find_pair_with_sum(nums, target):
    nums.sort()  # Ensure the list is sorted
    left, right = 0, len(nums) - 1
    while left < right:
        s = nums[left] + nums[right]
        if s == target:
            return (nums[left], nums[right])
        elif s < target:
            left += 1
        else:
            right -= 1
    return None

print(find_pair_with_sum([1, 2, 2, 3, 4, 4, 5], 6))
#output: (1, 5)
```
If we want to find all matching pairs we can rewrite this algorithm as a generator:
```python
def generate_pairs_with_sum(nums, target):
    nums.sort()
    left, right = 0, len(nums) - 1
    while left < right:
        s = nums[left] + nums[right]
        if s == target:
            yield (nums[left], nums[right])
            # Skip duplicates
            left_val, right_val = nums[left], nums[right]
            while left < right and nums[left] == left_val:
                left += 1
            while left < right and nums[right] == right_val:
                right -= 1
        elif s < target:
            left += 1
        else:
            right -= 1

for pair in generate_pairs_with_sum([1, 2, 2, 3, 4, 4, 5], 6):
    print(pair)
#output:
#(1, 5)
#(2, 4)
```
If we want to reverse an array or list "in place" (meaning: without duplicating the list) in O(n) time and O(1) extra space, we would use a two pointer solution:
```python
def reverse_in_place(arr):
    left, right = 0, len(arr) - 1
    while left < right:
        arr[left], arr[right] = arr[right], arr[left]
        left += 1
        right -= 1

arr1 = [1, 2, 2, 3, 4, 4, 5]
reverse_in_place(arr1)
print(arr1)
#output: [5, 4, 4, 3, 2, 2, 1]
```
As a contrast we note the non-in-place array slicing alternative:
```python
#Array slicing - not in place, not two pointer
arr1 = [1, 2, 2, 3, 4, 4, 5]
print(arr1[::-1])
#output: [5, 4, 4, 3, 2, 2, 1]
```
If we want to remove duplicates from a sorted array or list "in place" (meaning: without duplicating the list) in O(n) time and O(1) extra space, we would use a two pointer solution where one pointer reads and the other writes:
```python
def remove_duplicates(nums):
    if not nums:
        return 0
    write = 1
    for read in range(1, len(nums)):
        if nums[read] != nums[read - 1]:
            nums[write] = nums[read]
            write += 1
    return write  # new length, or just return nums[:write] for the array slice

arr1 = [1, 2, 2, 3, 4, 4, 5]
print(arr1[:remove_duplicates(arr1)])
#output: [1, 2, 3, 4, 5]
```
While it is tempting to try this as a list comprehension, that won't work as an "in place" solution because a comprehension always builds a new list rather than mutating the source. A solution with copying is doable, but it is neither an "in place" nor a "two pointer" deduplication:
```python
#List Comprehension: NOT in place, NOT two pointer
def remove_duplicates_new(nums):
    return [nums[i] for i in range(len(nums)) if i == 0 or nums[i] != nums[i - 1]]

arr1 = [1, 2, 2, 3, 4, 4, 5]
print(remove_duplicates_new(arr1))
#output: [1, 2, 3, 4, 5]
```
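For completeness, since the list is sorted, the standard library offers an equally compact non-in-place alternative via itertools.groupby, which clusters runs of equal adjacent values and lets us keep one key per run:

```python
from itertools import groupby

def remove_duplicates_groupby(nums):
    # groupby yields (key, group) pairs for each run of equal consecutive values
    return [key for key, _ in groupby(nums)]

arr1 = [1, 2, 2, 3, 4, 4, 5]
print(remove_duplicates_groupby(arr1))
#output: [1, 2, 3, 4, 5]
```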
If we want to merge two sorted arrays or lists, a two pointer solution works nicely. This solution could be adapted to work "in place" (meaning: without duplicating the list) in O(n + m) time and O(1) extra space if one of the buffers has spare capacity. Here we will copy the merged result into a new array:
```python
def merge_sorted_arrays(A, B):
    i, j = 0, 0
    merged = []
    while i < len(A) and j < len(B):
        if A[i] < B[j]:
            merged.append(A[i])
            i += 1
        else:
            merged.append(B[j])
            j += 1
    merged.extend(A[i:])
    merged.extend(B[j:])
    return merged

arr1 = [1, 4, 4, 5, 8]
arr2 = [2, 3, 4, 4, 5, 6, 7]
print(merge_sorted_arrays(arr1, arr2))
#output: [1, 2, 3, 4, 4, 4, 4, 5, 5, 6, 7, 8]
```
As you can see, the above solution retains duplicates, which may often be the behaviour you require, but now let's assume we want to merge two sorted arrays and remove duplicates:
```python
def merge_sorted_unique(A, B):
    i, j = 0, 0
    merged = []
    prev = None  # Tracks the last value appended
    while i < len(A) and j < len(B):
        if A[i] < B[j]:
            val = A[i]
            i += 1
        elif A[i] > B[j]:
            val = B[j]
            j += 1
        else:
            val = A[i]  # Equal case - only add once
            i += 1
            j += 1
        if val != prev:
            merged.append(val)
            prev = val
    # Handle remaining elements in A
    while i < len(A):
        if A[i] != prev:
            merged.append(A[i])
            prev = A[i]
        i += 1
    # Handle remaining elements in B
    while j < len(B):
        if B[j] != prev:
            merged.append(B[j])
            prev = B[j]
        j += 1
    return merged

arr1 = [1, 4, 4, 5, 8]
arr2 = [2, 3, 4, 4, 5, 6, 7]
print(merge_sorted_unique(arr1, arr2))
#output: [1, 2, 3, 4, 5, 6, 7, 8]
```
Note that in the duplicate-free version we can't simply extend the result with the remaining elements as we did in the first solution, because each remaining element must still be checked against the last appended value to eliminate duplicates.
On a last note here, as we are not working "in place", we could rewrite the last solution as a list comprehension. Unfortunately, while the two pointer solution was O(n + m), the comprehension solution is O((n + m) log(n + m)) because of the sort at the start, and there is no way that I can think of to write a comprehension that traverses the two arrays simultaneously and eliminates the pre-sort:
```python
#Merge two sorted arrays as a comprehension:
def merge_sorted_unique_comprehension(A, B):
    merged = sorted(A + B)
    return [
        merged[i]
        for i in range(len(merged))
        if i == 0 or merged[i] != merged[i - 1]
    ]

arr1 = [1, 4, 4, 5, 8]
arr2 = [2, 3, 4, 4, 5, 6, 7]
print(merge_sorted_unique_comprehension(arr1, arr2))
#output: [1, 2, 3, 4, 5, 6, 7, 8]
```
It would be possible to turn the two pointer solution into a generator and build the list by iterating it, but the gain here would be lazy evaluation and memory saving rather than reduced code complexity. That is pointless if, as in this example, you are just materialising a list, but where the items are consumed one at a time in merged order the generator solution has merit:
```python
#Two pointer merge of two sorted arrays as a generator:
def merge_unique_twopointer_gen(A, B):
    i, j = 0, 0
    prev = None
    while i < len(A) and j < len(B):
        if A[i] < B[j]:
            val = A[i]
            i += 1
        elif A[i] > B[j]:
            val = B[j]
            j += 1
        else:
            val = A[i]
            i += 1
            j += 1
        if val != prev:
            yield val
            prev = val
    while i < len(A):
        if A[i] != prev:
            yield A[i]
            prev = A[i]
        i += 1
    while j < len(B):
        if B[j] != prev:
            yield B[j]
            prev = B[j]
        j += 1

arr1 = [1, 4, 4, 5, 8]
arr2 = [2, 3, 4, 4, 5, 6, 7]
print(list(merge_unique_twopointer_gen(arr1, arr2)))
#output: [1, 2, 3, 4, 5, 6, 7, 8]
```
As a very last comment on this problem, it is worth noting that Python offers other ways of approaching it which are also not two pointer solutions but are very code-minimising. One such is combining heapq.merge() with itertools.groupby, which delivers a single-line list comprehension solution and thus also a generator solution (see my articles on comprehensions or generators to understand why).
```python
from heapq import merge
from itertools import groupby

def merge_sorted_unique_heapq_itertool(A, B):
    return [key for key, _ in groupby(merge(A, B))]

arr1 = [1, 4, 4, 5, 8]
arr2 = [2, 3, 4, 4, 5, 6, 7]
print(merge_sorted_unique_heapq_itertool(arr1, arr2))
#output: [1, 2, 3, 4, 5, 6, 7, 8]
```
In this example we are going to partition an array "in place", putting the even numbers on the left and the odd numbers on the right. Obviously the predicate (rule) for the partition could be anything that distinguishes the members. We have just chosen evens and odds for convenience.
```python
#Partition array into evens and odds
def partition_array(nums):
    left, right = 0, len(nums) - 1
    while left < right:
        if nums[left] % 2 == 0:
            left += 1
        elif nums[right] % 2 == 1:
            right -= 1
        else:
            nums[left], nums[right] = nums[right], nums[left]
    return nums

arr1 = [1, 2, 3, 4, 5, 6, 7, 8]
print(partition_array(arr1))
#output: [8, 2, 6, 4, 5, 3, 7, 1]
```
Now if we wanted to generalise this solution to accept a predicate passed in it would look like this:
```python
#Partition array based on a generalised predicate
def partition_array(nums, predicate):
    left, right = 0, len(nums) - 1
    while left < right:
        if predicate(nums[left]):
            left += 1
        elif not predicate(nums[right]):
            right -= 1
        else:
            nums[left], nums[right] = nums[right], nums[left]
    return nums

arr1 = [1, 2, 3, 4, 5, 6, 7, 8]
print(partition_array(arr1, lambda x: x % 2 == 0))
# Output (order not guaranteed): [8, 2, 6, 4, 5, 3, 7, 1]
```
By way of example we could use other predicates:
```python
lambda x: x < 0               #negatives first
lambda x: isinstance(x, str)  #strings before everything else
lambda x: x in primes         #primes first
```
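To make that concrete, here is the generalised partition run with the negatives-first predicate (the input values are just sample data; the routine is repeated from the text so the block is self-contained):

```python
#Partition array based on a generalised predicate (repeated from the text)
def partition_array(nums, predicate):
    left, right = 0, len(nums) - 1
    while left < right:
        if predicate(nums[left]):
            left += 1
        elif not predicate(nums[right]):
            right -= 1
        else:
            nums[left], nums[right] = nums[right], nums[left]
    return nums

arr = [3, -1, 2, -7, 5, -4]
result = partition_array(arr, lambda x: x < 0)
print(result)
#output: [-4, -1, -7, 2, 5, 3] (grouping guaranteed, order within groups is not)
```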
The two pointer solution to the partitioning problem is "not stable" in that it does not preserve the original order, but it does work "in place" in O(n) time. If we are willing to give up "in place" and accept an extra O(n) of memory plus a second pass, we can achieve a stable result without a two pointer solution, using a list comprehension that returns a new array:
```python
#List comprehension solution
def stable_partition(nums, predicate):
    return [x for x in nums if predicate(x)] + [x for x in nums if not predicate(x)]

arr1 = [1, 2, 3, 4, 5, 6, 7, 8]
print(stable_partition(arr1, lambda x: x % 2 == 0))
# Output (order guaranteed): [2, 4, 6, 8, 1, 3, 5, 7]
```
In this specific case (an odds/evens partition) it would also be possible to allocate a new buffer equal in size to the input and build the partition in O(n) time while preserving order, but that approach is not generalisable as it relies on knowing the size of the left partition ahead of time, which only works for the specific example input and predicate we are using here, so we won't illustrate it now.
Binary search is a simple algorithm for searching a sorted array or list. It runs in O(log n) time, successively halving the search space to find a target or optimum value. It is much faster than a brute-force sequential search, which runs in O(n) time.
The classic use cases are problems that require locating a value, a boundary, or an optimum within sorted or otherwise monotonically ordered data.
This is the classic iterative binary search. It returns the index of the target, or -1 if not found. The algorithm halves the search space each iteration: determine whether the target lies above or below the midpoint, discard the other half, and continue until the target is located or proven to be missing.
```python
#Iterative binary search
def binary_search_iter(nums, target):
    left, right = 0, len(nums) - 1
    while left <= right:
        mid = (left + right) // 2  # In languages with fixed-width ints use left + (right - left) // 2 to avoid overflow
        if nums[mid] == target:
            return mid
        elif nums[mid] < target:
            left = mid + 1
        else:
            right = mid - 1
    return -1  # Not found

arr1 = [1, 2, 3, 4, 5, 6, 8, 9, 10]
print(binary_search_iter(arr1, 6))
print(binary_search_iter(arr1, 7))
# Output: 5 -1
```
The recursive version is essentially the same algorithm as the iterative one except that it recurses, effectively building the search tree on the stack. While recursion is elegant, note that Python has a recursion limit and no tail-call optimisation, so the iterative form is safer for very large arrays.
```python
#Recursive binary search
def binary_search_recursive(nums, target, left=0, right=None):
    if right is None:
        right = len(nums) - 1
    if left > right:
        return -1
    mid = (left + right) // 2
    if nums[mid] == target:
        return mid
    elif nums[mid] < target:
        return binary_search_recursive(nums, target, mid + 1, right)
    else:
        return binary_search_recursive(nums, target, left, mid - 1)

arr1 = [1, 2, 3, 4, 5, 6, 8, 9, 10]
print(binary_search_recursive(arr1, 6))
print(binary_search_recursive(arr1, 7))
# Output: 5 -1
```
This version of the iterative binary search is designed to determine the first index where nums[i] >= target, and is classically relevant to determining the correct insertion point for a value where order must be preserved. Again we return the index into the array:
```python
#Iterative binary lower bound search
def lower_bound(nums, target):
    left, right = 0, len(nums)
    while left < right:
        mid = (left + right) // 2
        if nums[mid] < target:
            left = mid + 1
        else:
            right = mid
    return left

arr1 = [1, 2, 3, 4, 5, 6, 8, 9, 10]
print(lower_bound(arr1, 6))
print(lower_bound(arr1, 7))
# Output: 5 6
```
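It is worth knowing that the standard library already ships this exact lower-bound behaviour as bisect.bisect_left, so in production code you would normally reach for it rather than hand-rolling the loop:

```python
from bisect import bisect_left

arr1 = [1, 2, 3, 4, 5, 6, 8, 9, 10]
print(bisect_left(arr1, 6))  # 5 - first index where arr1[i] >= 6
print(bisect_left(arr1, 7))  # 6 - insertion point that keeps the list sorted
```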
This version of the iterative binary search is designed to find an index where nums[i] == target, but it works on a rotated array. A rotated array is a sorted array that has been “rotated” around a pivot point, so part of it has been shifted to the other end while still preserving the relative order of the elements. Rotated arrays appear in circular (real-time) buffers and shifted datasets (like a "year rollover").
```python
#Iterative binary rotated array search
def is_rotated(arr):
    return any(arr[i] > arr[i+1] for i in range(len(arr)-1))

def search_rotated(nums, target):
    left, right = 0, len(nums) - 1
    while left <= right:
        mid = (left + right) // 2
        if nums[mid] == target:
            return mid
        if nums[left] <= nums[mid]:
            if nums[left] <= target < nums[mid]:
                right = mid - 1
            else:
                left = mid + 1
        else:
            if nums[mid] < target <= nums[right]:
                left = mid + 1
            else:
                right = mid - 1
    return -1

arr1 = [6, 8, 9, 10, 1, 2, 3, 4, 5]
if is_rotated(arr1):
    print(search_rotated(arr1, 3))
    print(search_rotated(arr1, 7))
# Output: 6 -1
```
In this scenario the pivot point is unknown at the start, so each iteration must first work out which half of the array is the normally sorted one. We take the midpoint between the current left and right edges and check whether our target is there. If not, we test whether the leftmost value is less than or equal to the middle value. If yes (call this scenario 1), the left half is normally sorted, so we test whether the target lies between the left and the middle: if it does, the middle becomes the new right edge; if not, the left moves past the middle. If the answer to the scenario 1 test was no (scenario 2), the right half is the sorted one, so we check whether the target lies between the middle and the right edge: if it does, the left moves past the middle; if not, the middle becomes the new right edge. We repeat this process until the middle equals the target, or the left and right pointers cross each other.
The is_rotated(arr1) function is not actually required. It is just there to show a scenario where you want to test whether the array is rotated before using the rotated binary search routine.
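A closely related variant, sketched below, uses the same halving idea to find the rotation pivot itself, i.e. the index of the minimum element (the function name is ours, not a standard one):

```python
def find_pivot(nums):
    # Binary search for the index of the smallest element in a rotated sorted array
    left, right = 0, len(nums) - 1
    while left < right:
        mid = (left + right) // 2
        if nums[mid] > nums[right]:
            left = mid + 1   # minimum lies strictly to the right of mid
        else:
            right = mid      # minimum is at mid or to its left
    return left

arr1 = [6, 8, 9, 10, 1, 2, 3, 4, 5]
print(find_pivot(arr1))  # 4 - the index of the minimum value, 1
```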
Binary search doesn’t have to stay discrete, nor actually work on an array! In this example we use a binary search to find the square root of a number. To work, binary search just needs the values being searched to be in an actual or implied ascending (or descending) order.
```python
#Iterative binary square root non-array search
def sqrt_binary(n, eps=1e-6):
    left, right = 0, n
    while right - left > eps:
        mid = (left + right) / 2
        if mid * mid < n:
            left = mid
        else:
            right = mid
    return round((left + right) / 2, 6)

print(sqrt_binary(25))
# Output: 5.0
```
The routine returns the square root correct to six decimal places (or to whatever precision you set eps). It runs until the difference between the left and right bounds is smaller than eps. With each iteration it squares the midpoint of the current range and narrows the range based on whether that square is less than n: if it is, left becomes the midpoint and right stays put; otherwise right becomes the midpoint. The range shrinks with each iteration until the difference between left and right is immeasurably small, where "immeasurable" means smaller than eps.
Here we have another binary search that doesn't work on an array! In this example we use a binary search to find the optimal placement of something given a set of conditions that must be satisfied. For example, suppose we have a function can_I_do_this() that returns True or False depending on whether a set of conditions is satisfied, and we want to find the smallest value of x between a lo and a hi value that satisfies those conditions. We can use a binary search to find this x. Here is the general pattern:
```python
#Iterative binary minimize-threshold search - non array
#PATTERN - See below for applications
def can_I_do_this(testme):
    return True  # Placeholder predicate: replace with the real feasibility test

def minimize_threshold(can_do, lo, hi):
    while lo < hi:
        mid = (lo + hi) // 2
        if can_do(mid):
            hi = mid      # Feasible: try a smaller value
        else:
            lo = mid + 1  # Infeasible: must go higher
    return lo

print(minimize_threshold(can_I_do_this, 0, 100))  # 0 with the always-True placeholder
```
So let's look at some examples of this pattern in use.
A classic problem is known as "aggressive cows": assume a row of stalls, some occupied, some not. We are given k aggressive cows that don't like each other plus a list of available stall positions, and we have to place the cows in the available stalls so as to maximise the minimum distance between them. The problem requires that we report that maximised minimum distance, so the question is: "What's the largest minimum distance I can guarantee between cows if I place them greedily?" For fun we will also return the stall positions as a second solution.
While this problem is presented in a humorous context, it models an issue that comes up in real world problems such as antenna placement, server allocation or warehouse distribution, etc.
```python
#Aggressive cows - two solutions: min-max distance alone, and distance plus placement
#Test the criteria - check we can place k cows in the stalls separated by at least min_dist
#While we can place the cows, the calling routine will try a larger distance and test again
def can_place_cows(stalls, k, min_dist):
    count = 1
    last_pos = stalls[0]
    for pos in stalls[1:]:
        if pos - last_pos >= min_dist:
            count += 1
            last_pos = pos
            if count == k:
                return True
    return False

#Solution 1. Calculate the optimum distance
#Commencing with half the full range, we test the criteria; if successful
#we try a larger separation by taking the halfway point of the upper range,
#else we take the halfway point of the lower range, until low and high collide.
def aggressive_cows(stalls, k):
    stalls.sort()
    low = 0
    high = stalls[-1] - stalls[0]
    result = 0
    while low <= high:
        mid = (low + high) // 2
        if can_place_cows(stalls, k, mid):
            result = mid  # Try a larger minimum distance
            low = mid + 1
        else:
            high = mid - 1
    return result

# Load and test the cow placement in the stalls
def get_cow_placements(stalls, k, min_dist):
    placements = [stalls[0]]
    last_pos = stalls[0]
    for pos in stalls[1:]:
        if pos - last_pos >= min_dist:
            placements.append(pos)
            last_pos = pos
            if len(placements) == k:
                break
    return placements if len(placements) == k else []

#Or Solution 2. Do it all in one step:
def aggressive_cows_with_placement(stalls, k):
    stalls.sort()
    low, high = 0, stalls[-1] - stalls[0]
    best_dist = 0
    while low <= high:
        mid = (low + high) // 2
        if get_cow_placements(stalls, k, mid):
            best_dist = mid
            low = mid + 1
        else:
            high = mid - 1
    return (best_dist, get_cow_placements(stalls, k, best_dist))

stalls = [1, 2, 4, 8, 9, 11, 14, 15]
k = 5
min_dist = 0
print(f"Sltn 1. Min distance between cows: {(min_dist := aggressive_cows(stalls, k))}")  # Output: 3
print(f"Sltn 1. Stalls: {get_cow_placements(stalls, k, min_dist)}")  # Output: [1, 4, 8, 11, 14]
print(f"Sltn 2. Separation & Stalls: {aggressive_cows_with_placement(stalls, k)}")  # Output: (3, [1, 4, 8, 11, 14])
```
The solution commences by taking half the full range of stall positions. Using this as the separation we test the criteria by attempting to place k cows into the stalls at least that far apart, starting with the first available stall. If successful we try a larger separation by taking the halfway point of the upper range; otherwise we take the halfway point of the lower range, and test again. We repeat this cycle until the low and high of the range collide.
This solution demonstrates "greedy validation" (place each cow in the earliest valid stall, then check that this holds for all k cows), binary searching on the answer (searching for a feasible value rather than an item in a list) and, lastly, optimisation under constraints.
As a last comment, I will note why the cow placement for-loop (which generates a list) cannot be turned into a comprehension. Comprehensions aren't designed to carry evolving state across iterations: here each placement depends on the last placed position, last_pos is updated conditionally, and we need to short-circuit the whole thing once k cows are placed. So, sorry, no list comprehension is possible here as far as I can see; comprehensions are not designed for bounded, state-dependent sequences with breaking conditions.
In the minimum maximum pages problem we have k students and an array where each element represents the number of pages in a book. The goal is to assign contiguous books to each student such that the maximum number of pages assigned to any student is minimised. As a constraint a book cannot be split, so the smallest feasible maximum is the size of the largest book.
In this problem we are not searching through the array; we are searching for the smallest value of the maximum pages such that the allocation is possible and as balanced as can be achieved. This is not your standard binary search on data but a binary search on the value space, demonstrating both binary search and greedy validation.
In the real world we would use this algorithm for allocating workloads across multiple processors or teams, so again while presented as a hypothetical scenario here the algorithm has real world application.
As before we will deliver this solution in two parts, one that calculates the workload, and the other that both calculates the workload and distributes the pages. Obviously in a work-balancing scenario this latter version is more useful.
```python
#Test whether the page allocation is feasible by verifying that with the current max pages allowed
#we do not require more students than are available
def is_feasible(pages, k, max_allowed):
    count = 1  # students
    current_sum = 0
    #Step through all the books, allocating them to students
    for page in pages:
        if page > max_allowed:  #Test that no book is larger than our current max pages
            return False  #one book too big
        #Test that the current student's page count stays within the max allowed
        if current_sum + page > max_allowed:
            count += 1  #Move to the next student
            current_sum = page
            if count > k:  # Check we haven't run out of students
                return False
        else:
            current_sum += page  # Add another book to the current student
    return True

#Use a binary search to find the minimum maximum pages that can be allocated without breaching
#the constraints that each book is indivisible and all books are allocated. The starting range
#runs from the largest single book (the lowest possible maximum per student) up to the sum of
#all books (one student does everything).
#As always we find the mid-point between the two extremes and test it, deciding whether to
#raise or lower the bound based on success or failure each time, moving the low and high
#closer together with each iteration until they collide.
#Remember we are trying to minimise the individual workload, so we are searching for the
#minimum allocation for each student.
def allocate_books(pages, k):
    if len(pages) < k:
        return -1  # Not enough books
    low = max(pages)   # At least one book per student
    high = sum(pages)  # One student does all
    result = high
    while low <= high:
        mid = (low + high) // 2
        if is_feasible(pages, k, mid):
            result = mid    # Remember we are trying to MINIMISE the workload
            high = mid - 1  # Try better (smaller max)
        else:
            low = mid + 1   # Too small, need more allowance
    return result

#This is the same as the previous solution only now we record the actual allocation of
#books per student. Note the internal modified version of the feasibility test
#which remembers the allocation.
def allocate_books_with_assignment(pages, k):
    if len(pages) < k:
        return -1, []  # Not enough books
    def is_feasible(pages, k, max_pages):
        alloc = []
        current = []
        total = 0
        for page in pages:
            if page > max_pages:
                return False, []
            if total + page > max_pages:
                alloc.append(current)
                current = [page]
                total = page
                k -= 1
                if k == 0:
                    return False, []
            else:
                current.append(page)
                total += page
        alloc.append(current)
        return True, alloc
    low, high = max(pages), sum(pages)
    best_allocation = []
    result = high
    while low <= high:
        mid = (low + high) // 2
        feasible, allocation = is_feasible(pages, k, mid)
        if feasible:
            result = mid
            best_allocation = allocation
            high = mid - 1
        else:
            low = mid + 1
    return result, best_allocation

pages = [10, 20, 30, 40]
k = 2
print(allocate_books(pages, k))  #Output: 60

pages = [10, 20, 30, 40, 50, 60]
k = 3
max_pages, allocation = allocate_books_with_assignment(pages, k)
print(f"Minimum Max Pages: {max_pages}")
print("Book assignments per student:")
for i, group in enumerate(allocation, 1):
    print(f"Student {i}: {group}")
#Output:
#Minimum Max Pages: 90
#Book assignments per student:
#Student 1: [10, 20, 30]
#Student 2: [40, 50]
#Student 3: [60]
```
We are trying to minimise the individual workload for each student so we are searching for the minimum page allocation for each student. If we find a feasible solution we would aim to find a better solution with a smaller maximum number of pages per student. We use a binary search to find this minimum number of pages that can be allocated while not breaching the constraint of each book being indivisible and all books allocated across as many students as possible.
The range we are working with at the start is from one book per student (which therefore means the largest book is the lowest maximum pages per student) to a maximum of all books to one student. This is our starting search range. As always we begin by finding the mid-point between these two extremes and testing that, deciding whether to increase or decrease the number by 50% based on success or failure of the mid point each time, and moving the low and high closer together with each iteration until they collide. If an allocation is feasible, we try again by decreasing the number of pages allocated, until we need more students than we have available in which case that max pages will fail the feasibility test.
Our last binary search variant is the "split array largest sum" problem. This may be the most interesting of the variants as it has very wide potential use. We will look at those real-world uses in a moment, but first let's describe the problem.
We are given an array of non-negative numbers we will call "nums" and the number of subarrays we want to create, which we will call "k". Our challenge is to split nums into k contiguous subarrays such that the largest subarray sum is minimised.
We therefore need to return the minimum possible value of the largest sum across all possible valid splits.
This pattern applies in the real world when slicing ordered, weighted tasks into parts where we care about load fairness or completion time:
```python
#Binary search - split array largest sum
def can_split(nums, k, max_allowed_sum):
    count = 1
    current_sum = 0
    for num in nums:
        if current_sum + num > max_allowed_sum:
            count += 1
            current_sum = num
            if count > k:
                return False
        else:
            current_sum += num
    return True

def split_array_largest_sum(nums, k):
    low = max(nums)   # minimum possible value for the largest sum
    high = sum(nums)  # maximum possible value (1 partition)
    while low < high:
        mid = (low + high) // 2
        if can_split(nums, k, mid):
            high = mid
        else:
            low = mid + 1
    return low

nums = [7, 2, 5, 10, 8]
k = 2
print(split_array_largest_sum(nums, k))  # Output: 18
```
The above first version just calculates the minimum largest sum. In the next version we both calculate the sum and allocate the subarrays, which is a touch more useful:
```python
#Binary search - split array largest sum with subarrays allocated
def split_array_with_subarrays(nums, k):
    def is_valid(threshold):
        current_sum = 0
        count = 1
        for num in nums:
            if current_sum + num > threshold:
                current_sum = num
                count += 1
                if count > k:
                    return False
            else:
                current_sum += num
        return True

    low, high = max(nums), sum(nums)
    best_sum = high
    while low <= high:
        mid = (low + high) // 2
        if is_valid(mid):
            best_sum = mid
            high = mid - 1
        else:
            low = mid + 1

    # Now rebuild the subarrays using the best_sum
    result = []
    current = []
    current_sum = 0
    remaining_parts = k
    for i, num in enumerate(nums):
        if current_sum + num > best_sum or len(nums) - i < remaining_parts:
            result.append(current)
            current = [num]
            current_sum = num
            remaining_parts -= 1
        else:
            current.append(num)
            current_sum += num
    result.append(current)  # last group
    return best_sum, result

nums = [7, 2, 5, 10, 8]
k = 2
max_sum, splits = split_array_with_subarrays(nums, k)
print(f"Minimum Max Subarray Sum: {max_sum}")
for i, part in enumerate(splits, 1):
    print(f"Subarray {i}: {part}")
#Output:
#Minimum Max Subarray Sum: 18
#Subarray 1: [7, 2, 5]
#Subarray 2: [10, 8]
```