BishopPhillips Consulting - Python Iterators and Itertools

Python Techniques - Iterators

Creating & Using Iterators.

Author: Jonathan Bishop
AI Revolution

Iterators are a foundational concept in Python that enable lazy evaluation (computing values only as needed) and are central to loops, generators, and many built-in constructs. Closely related to iterators are generators which we look at in its own page. In this chapter we explore iterators and the advanced set of iterator tools provided in itertools. We will begin exploring this dimension of python as usual, with some definitions. I trust you will find our comprehensive guide to using and forming them of use.

1. What Are Iterators?

An iterator is an object in Python that allows us to traverse (or iterate) over the elements of a collection (like lists, dictionaries, or strings) one at a time. Iterators implement two special methods:

__iter__(): Returns the iterator object itself. This allows the object to be iterated.
__next__(): Returns the next element in the sequence. If there are no more elements, it raises the StopIteration exception.

An object that provides the __iter__() and __next__() methods is called an iterator.

2. How Iterators Work

When you use a loop like for or manually call next(), the iterator retrieves items one at a time from the underlying collection.

Example: Creating and Using an Iterator

# A simple list
my_list = [1, 2, 3]

# Get an iterator from the list
iterator = iter(my_list)

# Use the iterator manually
print(next(iterator))  # Output: 1
print(next(iterator))  # Output: 2
print(next(iterator))  # Output: 3
# Calling next() again raises StopIteration

3. Iterables vs. Iterators

Iterable: Any object that can return an iterator (e.g., lists, tuples, dictionaries, strings). It implements the __iter__() method.
Iterator: An object that provides items one at a time through the __next__() method.

All iterators are iterables, but not all iterables are iterators.

Example: Distinguishing Iterables and Iterators

my_list = [1, 2, 3]  # Iterable
iterator = iter(my_list)  # Iterator

# `my_list` is iterable but not an iterator
print(hasattr(my_list, '__iter__'))  # Output: True
print(hasattr(my_list, '__next__'))  # Output: False

# `iterator` is both iterable and an iterator
print(hasattr(iterator, '__iter__'))  # Output: True
print(hasattr(iterator, '__next__'))  # Output: True

4. Types of Iterators

Built-in Iterators

Some Python built-in functions return iterators:

range(): Generates numbers lazily.

for num in range(5):
    print(num)  # Output: 0, 1, 2, 3, 4

enumerate(): Iterates over items with their index.

for index, value in enumerate(['a', 'b', 'c']):
    print(index, value)  # Output: 0 a, 1 b, 2 c

zip(): Combines iterables into tuples.

for pair in zip([1, 2], ['a', 'b']):
    print(pair)  # Output: (1, 'a'), (2, 'b')

Custom Iterators

You can create your own iterator by defining a class with __iter__() and __next__() methods.

Simple Counter

Example:

class Counter:
    def __init__(self, start, end):
        self.current = start
        self.end = end

    def __iter__(self):
        return self  # Iterator object

    def __next__(self):
        if self.current > self.end:
            raise StopIteration
        self.current += 1
        return self.current - 1

counter = Counter(1, 5)
for num in counter:
    print(num)  # Output: 1, 2, 3, 4, 5

Simple Data Iterator


class CustomIterator:
    def __init__(self, data):
        self.data = data
        self.index = 0  # Start at the first element

    def __iter__(self):
        return self  # An iterator must return itself

    def __next__(self):
        if self.index >= len(self.data):
            raise StopIteration  # Stop when out of elements
        value = self.data[self.index]
        self.index += 1
        return value

# Manually iterating with a while loop
iterator = CustomIterator(["Apple", "Banana", "Cherry"])

while True:
    try:
        item = next(iterator)  # Get the next item
        print(item)
    except StopIteration:
        break  # Stop the loop when exhausted

The work, as usual, happens in the "next" method which checks to see if we have run out of elements in the list, raising a "StopIteration" event if true, and if not returns the next data item in the list. It needs to keep track of where we are up to in the list so it needs to manage and retain an index. While this is essentially just a one dimensional data structure (like a list) one can imagine that the iteration could be over a somewhat more complex data structure alomst as easily.

If the data list was, say, a tree that we were trying to traverse, the method would need to keep track of the node we were up to so that the next "next" call could resume from that node, which might require the use of a stack as well. Note, that when we briefly look at the "generator" alternative approach below, the tracking of this more complex scenario might be more easily achieved as the yield statement can break out of a recursive call sequence, while the generator is essentially reproducing a while loop.

As an aside, in the Generator Iterators section below I have reproduced this same algorithm as a generator and the Doubly Linked List module found in the Python section of this site demonstrates a more complex form of iterator that is able to be switched to go forward and backward over the list during iteration, as well as a generator.

Binary Tree Iterator

So lets look at the example of a simple binary tree iterator. This iterator will traverse the tree "in-order" (left, root, right) or "post-order" (left, right, root) depending on whether you intialise the BinaryTree class with True (infix) or False (postfix). Now since we have to do this non recursively, we will need to implement a stack to keep track of where we are at in the binary tree at each point. When traversing a tree structure using while loops (ie. non recursively) you, kind of, have to use a stack regardless, so the requirement for the stack is not a function of the iterator per-se, but is a consequence of the fact that the iterator is a simple looping protocol and can't (as far as I know) be built from a recursive method since it relies on returning the result at each call to "__next()".

We will work with a tree that looks like this:

       10
      /  \
     5    15
    / \    /\
   2   7 14  20


class Node:
    def __init__(self, value):
        self.value = value
        self.left = None
        self.right = None

class BinaryTreeInOrderIterator:
    def __init__(self, root):
        self.stack = []  # Stack for in-order traversal
        self._populate_stack(root)

    def _populate_stack(self, node):
        while node:
            self.stack.append(node)
            node = node.left  # Start from leftmost node

    def __iter__(self):
        return self

    def __next__(self):
        if not self.stack:
            raise StopIteration  # End iteration

        node = self.stack.pop()
        value = node.value

        if node.right:
            self._populate_stack(node.right)  # Add right subtree

        return value

class BinaryTreePostOrderIterator:
    def __init__(self, root):
        self.stack = []
        self._populate_stack(root)

    def _populate_stack(self, node):
        if node:
            self.stack.append((node, False))  # False means we haven't processed it yet

    def __iter__(self):
        return self

    def __next__(self):
        while self.stack:
            node, visited = self.stack.pop()
            
            if visited:  # If already visited, return the value
                return node.value
            
            # Mark node as visited and push it back onto the stack
            self.stack.append((node, True))

            # Push children (Left then Right) to process later
            self._populate_stack(node.right)
            self._populate_stack(node.left)

        raise StopIteration  # End iteration
    
class BinaryTree:
    def __init__(self, root, in_or_postfix=True):
        self.root = root
        self.in_or_postfix = in_or_postfix #infix if true
 
    def __iter__(self):
        if self.in_or_postfix:
            return BinaryTreeInOrderIterator(self.root)
        else:
            return BinaryTreePostOrderIterator(self.root)
        

# Creating the tree
root = Node(10)
root.left = Node(5)
root.right = Node(15)
root.left.left = Node(2)
root.left.right = Node(7)
root.right.left = Node(14)
root.right.right = Node(20)

# Iterating over the tree in order (infix)
print("In  order:",end=" ")
for value in BinaryTree(root, True):
    print(value, end=" ")
print()

print("Pst order:",end=" ")
# Iterating over the tree in post order (postfix)
for value in BinaryTree(root, False):
    print(value, end=" ")
print()

This will output the following output:


In  order: 2 5 7 10 14 15 20 
Pst order: 2 7 5 14 20 15 10

A point of interest in this code example is that we are using object composition (rather than inheritance) to get the iterator behaviour. That means that the "BinaryTree" class does not have to create its own iterator or inherit from a class that does but rather returns an iterator from another class in its __iter__() method. So BinaryTree does not have to directly implement the "__next__()" method, but does have to implement an "__iter__()" method that returns an iteration object, so BinaryTree is iterable, but not an iterator, but looks like it is to the outside world.

Let's look at how the simple in-order iterator works.

1. Initialising the Iterator

The class uses a stack (self.stack) to simulate recursion and keep track of nodes as it processes them.


class BinaryTreeIterator:
    def __init__(self, root):
        self.stack = []  # Stack stores nodes for traversal
        self._populate_stack(root)  # Start from the leftmost node

The _populate_stack() function pushes all left nodes onto the stack.
This ensures the first item returned is the smallest (leftmost).

2. Populating the Stack (_populate_stack)

def _populate_stack(self, node):
    while node:
        self.stack.append(node)  # Push node onto the stack
        node = node.left  # Move to the left child

The populate_stack() method simulates a recursive call by pushing the entire left hand side of the tree onto the stack from each starting node, so that the top most node is the last node pushed onto the stack. In the in-order traversal we just need the node, but in pre and post order traversal we also need a flag pushed with the node as a "(node, boolean)" tuple to remember whether we have handled the node. In this simple in-order traversal:

The while loop moves down to the leftmost node, stacking nodes along the way.
If the tree starts at 10, it pushes 10 → 5 → 2.

3. Implementing __next__()

def __next__(self):
    if not self.stack:
        raise StopIteration  # No more nodes to traverse

    node = self.stack.pop()  # Retrieve the next node
    value = node.value

    if node.right:
        self._populate_stack(node.right)  # Process right subtree next

    return value

The "__next__()" method does the work of traversing the tree:

First it checks whether the stack is empty - in which case we can stop iterating
Next it pops the top most node off the stack and set the return value to that node's value
Lastly it looks to see if we have a node on the right of the current (popped) node and if so, it repopulates the stack with the left hand side tree starting from that node
Lastly it returns the value of the node it popped off the stack earlier.

File Iterators

Files in Python are iterators:

with open('example.txt', 'r') as file:
    for line in file:
        print(line.strip())

Generator Iterators

Generators are iterators that produce values lazily. They’re created using functions with the yield keyword or generator expressions to provide a value to the calling routine (rather than the "return" approach of the standard iterator). The "yield" statement "suspends" execution of the method or function at that line and passes the identified value back to the calling routine, ready to resume execution from the next line when the calling routine issues a __next__() call. I have written a dedicated page on generators that you can find here (python generators), but you should read through this section regardless as the topic of generators is covered here as "generator iterators" which is slightly different from the coverage of them in that page as pure generators. The key aspect here is that the generator is being handed to the calling routine as if it was a standard iterator via the __iter__() method call of the class. ALso the examples we use here are designed to highlight the contrast of standard iterators and generator iterators.

Example using yield:

def count_up_to(n):
    current = 1
    while current <= n:
        yield current
        current += 1

gen = count_up_to(3)
try:
    print(next(gen))  # Output: 1
    print(next(gen))  # Output: 2
    print(next(gen))  # Output: 3
except StopIteraction:
    break

Although it is not needed here, it is good practice to wrap the "next" calls in a try/except block to catch the StopIteration event that would be generated by the generator if it runs out of results to generate.

As a point of contrast, let's look at the "simple data iterator code" from above using a generator instead of an iterator. Both function essentially the same way from the calling code's perspective.


def custom_generator(data):
    for item in data:
        yield item  # Yield each item one by one

# Using the generator
gen = custom_generator(["Apple", "Banana", "Cherry"])

while True:
    try:
        print(next(gen))  # Get next item from the generator
    except StopIteration:
        break  # Exit when exhausted

Note that the generator uses a "yield" statement to return the value to the calling routine. It acts as a kind of "wait here until being called again" and passes back the value listed after it and replaces the need for an explicitly defined "next" method. Note also that we do not need to explicitly raise the StopIteration event as that is raised automatically by Python when the function finishes after the last yield statement has been executed (and the next "next") has been called. Obviously, we still need to handle the StopIteration exception in the calling code if we are not using a statement form that automatically handles it (like a "for" loop).

By way of a further contrast with iterators, let's look at the binary tree traversal problem using a generator iterator instead of a standard iterator. We will use the same tree as the binary tree algorithm above, and the same basic structure of node - binary tree with a selector during initialisation to determine the traversal order - in or post order:


class Node:
    def __init__(self, value):
        self.value = value
        self.left = None
        self.right = None

def inorder_traversal(node):
    """Generator for in-order traversal (Left, Root, Right)."""
    if node:
        yield from inorder_traversal(node.left)  # Visit left subtree
        yield node.value  # Yield current node value
        yield from inorder_traversal(node.right)  # Visit right subtree

def postorder_traversal(node):
    """Generator for post-order traversal (Left, Right, Root)."""
    if node:
        yield from postorder_traversal(node.left)  # Visit left subtree
        yield from postorder_traversal(node.right)  # Visit right subtree
        yield node.value  # Yield current node value (root)

class BinaryTree:
    def __init__(self, root, in_or_postfix=True):
        self.root = root
        self.in_or_postfix = in_or_postfix #infix if true

    def __iter__(self):
        if self.in_or_postfix:
            return inorder_traversal(self.root) # Use the generator
        else:
            return postorder_traversal(self.root) # Use the generator

# Creating the tree
root = Node(10)
root.left = Node(5)
root.right = Node(15)
root.left.left = Node(2)
root.left.right = Node(7)
root.right.left = Node(14)
root.right.right = Node(20)

# Iterating over the tree in order (infix)
print("In  order:",end=" ")
for value in BinaryTree(root, True):
    print(value, end=" ")
print()

print("Pst order:",end=" ")
# Iterating over the tree in post order (postfix)
for value in BinaryTree(root, False):
    print(value, end=" ")
print()

This will output the following output:


In  order: 2 5 7 10 14 15 20 
Pst order: 2 7 5 14 20 15 10

It is clear here that the generator approach is significantly simpler and more readable than the conventional iterator approach, yet both look like iterators to the calling python code. The use of the "yield" statement allows us to use a recursive method call to traverse the tree, thus using the python call stack as the stack needed and not requiring us to make and maintain our own stack.

Most of the code should be clear from the standard iterator example above, but there is perhaps one point of interest to explore here. The "yield from inorder_traversal(node.left)" statement. The "yield from" construct is a shorthand for delegating iteration to a subgenerator or another iterable. When a function uses "yield from iterable", it:

Iterates over iterable and yields each item one by one.
Automatically handles delegation, meaning the outer generator does not need to manually loop through the inner iterable.
Passes exceptions and return values from the subgenerator to the caller.

When the left subtree has elements, "yield from" recursively calls inorder_traversal(node.left), yielding all elements from it before yielding the current node. After finishing this it then yields the current node's value.

An Interesting Aside To Note About Tree Traversal

Before we leave this topic, take a close look at the standard iterator version of the post-order traversal versus the in-order traversal & contrast the former with the recursive generator version. Notice something odd? (It's alright I'll wait.)

Did you notice that the standard iterator version of the in-order traversal pushed the entire left-hand tree onto the stack each time, and the generator versions of both the in-order and post-order methods did essentially the same? BUT not the standard iterator version of the post-order traversal. It just pushes the first left and right children onto the stack (well right than left so the left is the first popped off the stack). This works because the post-order version keeps a flag with the node that determines whether the node has been processed and if not, it pushes that node's children onto the stack when it next looks at them.

Summary of Differences

Traversal Type	Stack Management	Processing Order	Key Mechanism
In-Order (Iterator)	Left nodes pushed first (`while node:`)	Left → Root → Right	Explicit stack (push left first)
Post-Order (Iterator)	Nodes tracked with `(node, False)` flags	Left → Right → Root	Stack with visited flags
Recursive (Both)	Function calls handle traversal	(Same order as respective traversal)	Call stack manages traversal

5. Use Cases of Iterators

Lazy Evaluation

Iterators compute values on demand, making them ideal for handling large datasets or infinite sequences.


import itertools
for num in itertools.count(10):  # Infinite sequence starting at 10
    if num > 15:
        break
    print(num)

Data Processing Pipelines

Iterators work well for building efficient pipelines for data transformation.


data = (x**2 for x in range(10) if x % 2 == 0)  # Generator expression
print(list(data))  # Output: [0, 4, 16, 36, 64]

Memory Efficiency

Iterators are lightweight because they don’t store all data at once. For example, reading large files line-by-line:


with open('large_file.txt', 'r') as file:
    for line in file:
        process(line)  # Process each line

6. Advanced Iterator Types from `itertools`

The itertools module provides tools for creating advanced iterators:

Infinite Iterators:
- count(), cycle(), repeat()
Combinatoric Iterators:
- permutations(), combinations(), product()
Terminating Iterators:
- chain(), compress(), dropwhile(), takewhile()

7. Common Pitfalls with Iterators

Single Pass: Iterators can only be traversed once.


data = iter([1, 2, 3])
print(list(data))  # Output: [1, 2, 3]
print(list(data))  # Output: [] (Already exhausted)

StopIteration: Calling next() beyond the last item raises StopIteration. Handle it carefully:
```
try:
    print(next(data))
except StopIteration:
    print("No more items.")
```

Summary

Type	Examples	Key Features
Built-in Iterators	`range()`, `enumerate()`, `zip()`	Convenient and memory-efficient.
Custom Iterators	Classes with `__iter__()` and `__next__()`	Define custom iteration logic.
File Iterators	Reading files line-by-line	Useful for large files.
Generators	Functions with `yield` or generator expressions	Lazy, memory-efficient, and easy to use.
Advanced Iterators	`itertools` (`count()`, `chain()`)	Combines power and flexibility for complex data processing.

Itertools

itertools is a Python module in the standard library that provides a collection of fast, memory-efficient tools for creating and working with iterators. These tools are inspired by constructs from functional programming languages like Haskell and SML. The module is particularly useful for handling large datasets, creating combinatorial constructs, and building efficient looping mechanisms.

Key Features of `itertools`

Infinite Iterators:

These iterators generate an infinite sequence of values, which can be useful for tasks like generating numbers or cycling through a sequence indefinitely.

Examples:


from itertools import count, cycle, repeat

# count(): Infinite sequence of numbers
for i in count(10, 2):  # Start at 10, step by 2
    print(i)  # Output: 10, 12, 14, ...

# cycle(): Repeats elements of an iterable indefinitely
for char in cycle("AB"):
    print(char)  # Output: A, B, A, B, ...

# repeat(): Repeats a single value
for val in repeat(5, 3):  # Repeat 5 three times
    print(val)  # Output: 5, 5, 5

Iterators Terminating on Input:

These iterators process finite sequences and terminate when the input is exhausted.

Examples:


from itertools import accumulate, chain, compress, dropwhile, takewhile

# accumulate(): Running totals
print(list(accumulate([1, 2, 3, 4])))  # Output: [1, 3, 6, 10]

# chain(): Combines multiple iterables
print(list(chain([1, 2], [3, 4])))  # Output: [1, 2, 3, 4]

# compress(): Filters elements based on a selector
print(list(compress('ABCDEF', [1, 0, 1, 0, 1, 1])))  # Output: ['A', 'C', 'E', 'F']

# dropwhile(): Drops elements while a condition is true
print(list(dropwhile(lambda x: x < 3, [1, 2, 3, 4])))  # Output: [3, 4]

# takewhile(): Takes elements while a condition is true
print(list(takewhile(lambda x: x < 3, [1, 2, 3, 4])))  # Output: [1, 2]

Combinatoric Iterators:

These iterators generate combinations, permutations, and Cartesian products.

Examples:


from itertools import product, permutations, combinations, combinations_with_replacement

# product(): Cartesian product
print(list(product('AB', repeat=2)))  # Output: [('A', 'A'), ('A', 'B'), ('B', 'A'), ('B', 'B')]

# permutations(): All possible orderings
print(list(permutations('ABC', 2)))  # Output: [('A', 'B'), ('A', 'C'), ('B', 'A'), ...]

# combinations(): Unique combinations
print(list(combinations('ABC', 2)))  # Output: [('A', 'B'), ('A', 'C'), ('B', 'C')]

# combinations_with_replacement(): Combinations with repetition
print(list(combinations_with_replacement('ABC', 2)))  # Output: [('A', 'A'), ('A', 'B'), ...]

Utility Functions:

These functions help manipulate iterators in various ways.

Examples:


from itertools import islice, groupby, starmap

# islice(): Slices an iterator
print(list(islice(range(10), 2, 8, 2)))  # Output: [2, 4, 6]

# groupby(): Groups elements based on a key function
data = [("A", 1), ("A", 2), ("B", 3)]
for key, group in groupby(data, lambda x: x[0]):
    print(key, list(group))  # Output: A [(A, 1), (A, 2)], B [(B, 3)]

# starmap(): Applies a function to unpacked arguments
from math import pow
print(list(starmap(pow, [(2, 3), (3, 2)])))  # Output: [8, 9]

Why Use `itertools`?

Efficiency: The tools are implemented in C, making them faster than equivalent Python code.
Memory Optimization: Iterators are lazy, meaning they generate values on demand without storing them in memory.
Composability: Functions can be combined to create complex workflows.

Practical Applications

Data Processing Pipelines:
- Combine chain, compress, and groupby to process large datasets efficiently.
Mathematical Computations:
- Use accumulate for cumulative sums or products.
Combinatorial Problems:
- Solve problems involving permutations, combinations, or Cartesian products.
Custom Iterators:
- Build custom iterators by chaining and filtering existing ones.

itertools is a treasure trove for Python developers working with iterators and data streams. It simplifies complex tasks, improves performance, and reduces memory usage.

Programming in Python

References

Programming in Python
Python Method & Function Overloading
Python Type Checking
Python List Comprehension
Python Dictionary Comprehension
Python Iterators
Python Generators
Python Closures
Python Maps and Filters
Python Refactoring With Generators
Python Decorators

Python Techniques - Iterators

Creating & Using Iterators.

1. What Are Iterators?

2. How Iterators Work

Example: Creating and Using an Iterator

3. Iterables vs. Iterators

Example: Distinguishing Iterables and Iterators

4. Types of Iterators

Built-in Iterators

Custom Iterators

Simple Counter

Simple Data Iterator

Binary Tree Iterator

File Iterators

Generator Iterators

Summary of Differences

5. Use Cases of Iterators

Lazy Evaluation

Data Processing Pipelines

Memory Efficiency

6. Advanced Iterator Types from itertools

7. Common Pitfalls with Iterators

Summary

Itertools

Key Features of itertools

Why Use itertools?

Practical Applications

Programming in Python

References

6. Advanced Iterator Types from `itertools`

Key Features of `itertools`

Why Use `itertools`?