BishopPhillips Consulting - Python Dictionary Comprehension and Generators

Python Techniques - Dictionary Comprehensions

Creating Dictionary Comprehensions.

Author: Jonathan Bishop
AI Revolution

Like List Comprehensions, Dictionary Comprehensions are a powerful, concise, and expressive way to create or transform dictionaries in Python. They allow you to write cleaner and more efficient code compared to traditional looping and dictionary manipulation methods. As noted in my article on list comprehensions, comprehension strategies can be applied to dictionaries and sets. This article is a follow-on from the list comprehension paper, and if you haven't read that one yet you would be well advised to do so now. This paper builds on the ideas covered in that paper, and the fundamental mechanisms for reading and interpreting a comprehension, as well as many of the subtler tricks covered there, will not necessarily be revisited here.

Dictionary comprehensions are so named because they comprehend or encapsulate the entire process of creating and transforming a dictionary in a single, concise expression. The term "comprehension" conveys the idea that the syntax is self-contained and provides a clear, high-level understanding of what the dictionary will look like after execution. The common element in the comprehension is a for-loop which provides the fundamental mechanism for iterating the construction of the finished dictionary.

Instead of explicitly writing out loops and conditionals in multiple steps (as you would in traditional looping), dictionary comprehensions allow you to describe the process of dictionary creation directly. This declarative style of programming emphasizes what you want (the resulting dictionary) rather than how to get it (the step-by-step looping process).

The same principle applies to other comprehensions (e.g., set comprehensions, list comprehensions, generator comprehensions), which all follow a similar syntax for creating those specific data structures.

In essence, the name reflects the clarity, readability, and compactness of this syntax. It allows you to express the intent behind the construction of a dictionary (or other structures) comprehensively in one place. Here's our comprehensive guide to using and forming them:

1. Basics of Dictionary Comprehensions

Dictionary comprehensions are essentially an efficient alternative to constructing loops. The caveat with them is that it is easy to write dictionary comprehensions that rapidly become unreadable, so it is a balancing act of "coolness" versus practicality. Dictionary comprehensions follow this general syntax:

{key_expression : value_expression for key , value in iterable if condition}

key_expression: The key to include in the resulting dictionary item.
value_expression: The value to include in the resulting dictionary item.
key , value: The variables bound to each element pair in the iterable. Note that the key and value could also be assembled from two chained or nested for loops.
iterable: The collection you're iterating over (e.g., a list, range, our DLList or any iterable).
condition (optional): Filters items; only include items that satisfy the condition.
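
To make the parts of the template concrete, here is a minimal sketch. The names stock, in_stock and table are invented for illustration and do not appear elsewhere in this article.

```python
# Hypothetical sample data: (name, quantity) pairs.
stock = [("apple", 12), ("pear", 0), ("fig", 7)]

# key_expression   -> name.upper()
# value_expression -> qty * 2
# key , value      -> name, qty (unpacked from each tuple)
# iterable         -> stock
# condition        -> qty > 0
in_stock = {name.upper(): qty * 2 for name, qty in stock if qty > 0}
print(in_stock)  # {'APPLE': 24, 'FIG': 14}

# The key and value can also be assembled from two chained for clauses.
table = {(r, c): r * c for r in range(1, 3) for c in range(1, 3)}
print(table)  # {(1, 1): 1, (1, 2): 2, (2, 1): 2, (2, 2): 4}
```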

Example:

As with the list comprehension article, we will begin with a simple comparison of a traditional for-loop structure and then map that to a dictionary comprehension. Both solutions build the same dictionary.

This is an example of a directory scanner that assembles a list of file names and their sizes in KB. It scans the directory, filters for files only and collects their sizes (bytes divided by 1024) rounded to 2 decimal places. It stores the file name and size in a dictionary using the filename as the key. (We then iterate through the dictionary and print it out, one item per line).


# Traditional loop
from pathlib import Path

print("Traditional:")
result = {}
print("Dictionary starts empty:",result, end="\n\n")

directory = Path('.')
for f in directory.iterdir():
    if f.is_file():
        result[f.name] = round(f.stat().st_size / 1024, 2)

for item, val in result.items():
    print(f"{item} : {val}")
print()

# Dictionary comprehension
print("Comprehension:")
result = {}
print("Dictionary starts empty:",result, end="\n\n")

directory = Path('.')
file_sizes = {f.name: round(f.stat().st_size / 1024, 2) 
                      for f in directory.iterdir() if f.is_file()}
                      
for item, val in file_sizes.items():
    print(f"{item} : {val}")

The example builds a dictionary by iterating over the path objects yielded by directory.iterdir() and filters those objects for files (as opposed to directories) by testing each item with f.is_file(). Those items that pass the test are then added to the dictionary by name (as the key) and with their size (as the value).

Let's look at another simple example:


{word: len(word) for word in ['refactor', 'elegant', 'python']}

This is essentially the same as a list comprehension in terms of expected behaviour, except that it builds a dictionary of words and word lengths.

In CPython, list comprehensions use PyList_New under the hood while dictionary comprehensions use PyDict_New, so there are some differences in semantics and memory behaviour. With this in mind, let's explore a few useful, real-world dictionary comprehension strategies.
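
One semantic difference worth seeing early is how duplicates are handled. The sketch below reuses the word list from the example above, with one extra word ('cleaner') added as an assumption so that two words share a length.

```python
words = ['refactor', 'elegant', 'python', 'cleaner']

# A list comprehension keeps one entry per input item.
pairs = [(len(w), w) for w in words]
print(pairs)
# [(8, 'refactor'), (7, 'elegant'), (6, 'python'), (7, 'cleaner')]

# A dictionary comprehension keeps one entry per distinct key.
# 'elegant' and 'cleaner' both have length 7, so the later value wins,
# while the key keeps the position of its first insertion.
by_length = {len(w): w for w in words}
print(by_length)  # {8: 'refactor', 7: 'cleaner', 6: 'python'}
```

If losing 'elegant' here would be a bug in your program, the key expression is not unique enough for a dictionary.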

2. Simple Use Cases

The simple use cases cover these actions but demonstrate different ways of setting up the dictionary comprehension:

Inverting a Dictionary (with unique values)
Using a Function on the Fly (e.g., Hashing, Rounding, Stripping)
Bucketizing Values (e.g., Grouping by Length, Category, etc.)
Counting with Conditions
Nested Comprehensions (Matrix Flattening or Transformation)
With enumerate, zip, or Complex Iterables
Mapping Column Names to Types Using Functions

Inverting a Dictionary (with unique values):

Useful when keys and values are guaranteed to be one-to-one:


original = {'a': 1, 'b': 2, 'c': 3}
inverted = {v: k for k, v in original.items()}
# Output: {1: 'a', 2: 'b', 3: 'c'}
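
If the values are not unique, the inversion silently drops entries. The following sketch uses a modified input (an assumption for illustration) to show the effect and one possible way to keep every key.

```python
original = {'a': 1, 'b': 2, 'c': 1}

# Plain inversion: 'a' and 'c' share the value 1, so 'c' overwrites 'a'.
inverted = {v: k for k, v in original.items()}
print(inverted)  # {1: 'c', 2: 'b'}

# Collecting every key per value avoids the silent loss.
grouped = {
    value: [k for k, v in original.items() if v == value]
    for value in set(original.values())
}
print(grouped[1])  # ['a', 'c']
print(grouped[2])  # ['b']
```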

Using a Function on the Fly (e.g., Hashing, Rounding, Stripping):

Create a quick mapping with transformations:

data = ['  Alpha ', 'Beta', ' Gamma']
clean_map = {item.strip().lower(): len(item.strip()) for item in data}
# Output: {'alpha': 5, 'beta': 4, 'gamma': 5}

Bucketizing Values (e.g., Grouping by Length, Category, etc.):

Group strings by their first letter:


names = ['Alice', 'Arnold', 'Bob', 'Brenda']
grouped = {char: [n for n in names if n.startswith(char)] for char in {name[0] for name in names}}
# Output (key order may vary): {'A': ['Alice', 'Arnold'], 'B': ['Bob', 'Brenda']}

This tiny block of code manages to demonstrate a set, list and dictionary comprehension at once. The set comprehension is used to ensure we have only one instance of each first letter from the names list as a basis for iterating over to populate the dictionary keys, while the list comprehension is used to populate each key's value. Because a set is unordered, the order of the keys in the result is not guaranteed.
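
The nested comprehension above rescans the names list once per first letter. For long lists, a single pass may be preferable; the sketch below is one alternative using dict.setdefault, offered as a design comparison and not as a replacement for the comprehension.

```python
names = ['Alice', 'Arnold', 'Bob', 'Brenda']

# One pass over names: create the bucket on first sight, then append.
grouped = {}
for name in names:
    grouped.setdefault(name[0], []).append(name)

print(grouped)  # {'A': ['Alice', 'Arnold'], 'B': ['Bob', 'Brenda']}
```

Here the key order follows the order in which first letters appear in names, so the output is deterministic.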

Counting with Conditions:

Build a frequency map with filtering baked in:


from collections import Counter

words = ['hello', 'world', 'hello', 'python']
freq = {word: count 
           for word, count in Counter(words).items() if count > 1}
# Output: {'hello': 2}

Nested Comprehensions (Matrix Flattening or Transformation):

Imagine you want a sparse matrix-like representation:


matrix = [[0, 2], [3, 0]]
sparse = {(i, j): val for i, row in enumerate(matrix) for j, val in enumerate(row) if val != 0}
# Output: {(0, 1): 2, (1, 0): 3}

enumerate() yields (index, item) pairs, so that the "for i, row in ..." gets an index in i and the matrix row in row. In a sparse matrix we store only the non-zero values, so this comprehension filters for non-zero values and builds a dictionary that uses the matrix addresses (row, col) as keys for the value held at each location. Dictionary keys must be hashable, so tuples are a good solution for the keys.
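
A sparse dictionary is only useful if the zeros can be recovered when reading. A minimal sketch of reading cells and rebuilding the dense form follows; the 2 x 2 shape is assumed to be known separately, since the dictionary itself does not record it.

```python
matrix = [[0, 2], [3, 0]]
sparse = {(i, j): val
          for i, row in enumerate(matrix)
          for j, val in enumerate(row) if val != 0}

# Missing keys stand for zero, so read cells with a default.
print(sparse.get((0, 1), 0))  # 2
print(sparse.get((0, 0), 0))  # 0

# Rebuild the dense form from the sparse dictionary.
rows, cols = 2, 2
dense = [[sparse.get((i, j), 0) for j in range(cols)] for i in range(rows)]
print(dense)  # [[0, 2], [3, 0]]
```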

With enumerate, zip, or Complex Iterables:

Tracking variable states with indexed references:


values = [10, 20, 30]
varnames = ['x', 'y', 'z']
mapping = {name: val for name, val in zip(varnames, values)}
# Output: {'x': 10, 'y': 20, 'z': 30}

zip() takes two or more iterables as inputs and combines them into an iterator of tuples, where each tuple contains the elements from each input at the same index position. So the first tuple created above would be: ('x', 10) and the second: ('y', 20), etc. These tuples can then be iterated over, unpacking each one in turn into (in this case) name, val.
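
One behaviour to be aware of is that zip() stops at the shortest input. The sketch below shortens the values list on purpose (an assumption for illustration) and shows itertools.zip_longest as one way to keep every name.

```python
from itertools import zip_longest

varnames = ['x', 'y', 'z']
values = [10, 20]  # one value short, on purpose

# zip() stops at the shortest input, so 'z' is silently dropped.
mapping = {name: val for name, val in zip(varnames, values)}
print(mapping)  # {'x': 10, 'y': 20}

# zip_longest() pads the shorter input with a fill value instead.
padded = {name: val
          for name, val in zip_longest(varnames, values, fillvalue=None)}
print(padded)  # {'x': 10, 'y': 20, 'z': None}
```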

Mapping Column Names to Types:

If you’re parsing CSV headers and want to guess data types based on sample values:


import ast

def safe_eval(val):
    try:
        return type(ast.literal_eval(val))
    except (ValueError, SyntaxError):
        return str  # fallback for unquoted strings

sample_row = {'age': '34', 'name': 'Alice', 'salary': '70000.50'}
types = {k: safe_eval(v) for k, v in sample_row.items()}  # safe_eval already returns a type
print(types)

# Output: {'age': <class 'int'>, 'name': <class 'str'>, 'salary': <class 'float'>}

safe_eval() is needed because ast.literal_eval() only accepts strings that are valid Python literals, but 'Alice' isn't wrapped in quotes within the string, so it can't be parsed as a Python string literal. Trying to evaluate the bare word Alice raises a ValueError, and syntactically malformed input raises a SyntaxError. While you could use eval() rather than ast.literal_eval() here, that is not a good idea for production code, as eval() will execute arbitrary Python code in the string, whereas ast.literal_eval() will only evaluate Python literals: strings, bytes, numbers, tuples, lists, dicts, sets, booleans and None, so it is much safer in this case.
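
To see which inputs ast.literal_eval() accepts and which it rejects, the following sketch maps a few sample strings to the outcome. The helper name describe and the sample strings are assumptions made for this illustration.

```python
import ast

def describe(text):
    """Return the type name literal_eval produces, or the error it raises."""
    try:
        return type(ast.literal_eval(text)).__name__
    except (ValueError, SyntaxError) as exc:
        return type(exc).__name__

samples = ["34", "70000.50", "'Alice'", "Alice", "[1, 2]", "__import__('os')"]
report = {text: describe(text) for text in samples}

for text, outcome in report.items():
    print(f"{text!r:22} -> {outcome}")
# The quoted 'Alice' evaluates to str, while the bare word Alice and the
# function call are both rejected with ValueError instead of being executed.
```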

3. Data Manipulation Use Cases

As a dictionary is an efficient indexing system, using hashing to give average-case O(1) lookups, it is common to use a dictionary in data manipulation activities as a preprocessing or analysis step.
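
As a small illustration of the indexing idea, the sketch below builds a lookup index once and then uses it in place of a repeated linear scan. The people records are hypothetical sample data.

```python
# Hypothetical records; in practice these might come from a file or query.
people = [
    {'id': 101, 'name': 'Ada'},
    {'id': 102, 'name': 'Grace'},
    {'id': 103, 'name': 'Linus'},
]

# Linear scan: walks the list on every lookup.
found = next(p for p in people if p['id'] == 103)

# Build the index once; each later lookup is a hash probe.
by_id = {p['id']: p for p in people}

print(found['name'])       # Linus
print(by_id[103]['name'])  # Linus
print(102 in by_id)        # True
```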

Aggregating Averages by Group:

Suppose you have data grouped by categories and want to compute means:


from statistics import mean

data = {
    'Engineering': [72_000, 85_500, 79_300],
    'Marketing': [60_000, 63_000],
    'Design': [68_000, 70_000, 65_000]
}
avg_salary = {dept: round(mean(salaries), 2) for dept, salaries in data.items()}

print(avg_salary)
# Output: {'Engineering': 78933.33, 'Marketing': 61500, 'Design': 67666.67}

This comprehension calculates the mean of the list of values held in the original dictionary. Note we extract both key and value pairs together by iterating over the dictionary.items().

Filtering Outliers or Nulls:

Filter out entries with missing or invalid values:


raw_data = {'a': 10, 'b': None, 'c': 30, 'd': None}
cleaned = {k: v for k, v in raw_data.items() if v is not None}

print(cleaned)
# Output: {'a': 10, 'c': 30}

A reasonably straightforward comprehension that filters out some value, in this case None, but the condition could also be a < or > test against a value.
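
A short sketch of the comparison-based variant follows. The readings data and the 0 to 100 valid range are assumptions chosen for illustration.

```python
readings = {'a': 10, 'b': 250, 'c': 30, 'd': -5}

# Keep only values inside an assumed valid range of 0 to 100.
in_range = {k: v for k, v in readings.items() if 0 <= v <= 100}
print(in_range)  # {'a': 10, 'c': 30}

# Combine the None check with a threshold. The None test must come first,
# because comparing None with a number raises TypeError.
raw_data = {'a': 10, 'b': None, 'c': 30, 'd': None}
large = {k: v for k, v in raw_data.items() if v is not None and v > 20}
print(large)  # {'c': 30}
```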

Renaming Columns Dynamically

Let’s say we are prepping messy data for Pandas and we want to standardize column names from a CSV:


cols = [' First Name ', 'AGE', 'emailAddress']
normalized = {col: col.strip().lower().replace(' ', '_') for col in cols}

print(normalized)
# Output: {' First Name ': 'first_name', 'AGE': 'age', 'emailAddress': 'emailaddress'}

Here we are creating a lookup table for column field replacement.
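
One way to apply such a lookup table without any library is to re-key each row through it. In this sketch the row dictionary is hypothetical sample data.

```python
cols = [' First Name ', 'AGE', 'emailAddress']
normalized = {col: col.strip().lower().replace(' ', '_') for col in cols}

# A hypothetical row keyed by the original, messy headers.
row = {' First Name ': 'Alice', 'AGE': 34, 'emailAddress': 'alice@example.com'}

# Re-key the row through the lookup table.
clean_row = {normalized[key]: value for key, value in row.items()}
print(clean_row)
# {'first_name': 'Alice', 'age': 34, 'emailaddress': 'alice@example.com'}
```

With pandas, the same mapping can be passed to DataFrame.rename(columns=normalized).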

Pivot-like Structures (Nested Dictionaries)

Transform flat records into a pivot-style dictionary keyed by (region, year) tuples:


records = [
    {'region': 'Asia', 'year': 2020, 'value': 120},
    {'region': 'Asia', 'year': 2021, 'value': 130},
    {'region': 'EU', 'year': 2020, 'value': 100}
]

pivot = {(r['region'], r['year']): r['value'] for r in records}

print(pivot)
# Output: {('Asia', 2020): 120, ('Asia', 2021): 130, ('EU', 2020): 100}

We are starting with a list of dictionaries, and stepping through the list to extract each dictionary, then reorganising the terms by looking up each of the values based on known keys to assemble a new dictionary indexed by region, year tuples.
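
If a genuinely nested structure (region, then year) is wanted instead of tuple keys, a comprehension inside a comprehension can build it. This is a sketch of one possible approach, using the same records as above.

```python
records = [
    {'region': 'Asia', 'year': 2020, 'value': 120},
    {'region': 'Asia', 'year': 2021, 'value': 130},
    {'region': 'EU', 'year': 2020, 'value': 100}
]

# Outer comprehension: one entry per distinct region.
# Inner comprehension: year -> value for the records of that region.
nested = {
    region: {r['year']: r['value'] for r in records if r['region'] == region}
    for region in {r['region'] for r in records}
}

print(nested['Asia'])      # {2020: 120, 2021: 130}
print(nested['EU'][2020])  # 100
```

The tuple-keyed form suits direct lookups of a known (region, year) pair, while the nested form makes it easy to pull out everything for one region.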

Dictionary Comprehension in a Generator Pipeline

We can fuse a generator source with a filtering comprehension to build a context-specific dictionary in memory—no intermediate list needed. For example, imagine you're streaming lines from a log file and only want to map error codes to descriptions if they start with "ERR":


def log_stream():
    yield from [
        "INFO User logged in",
        "ERR42 Disk full",
        "ERR17 Timeout occurred",
        "DEBUG Retrying connection"
    ]

error_lookup = {
    line.split()[0]: " ".join(line.split()[1:])
    for line in log_stream() if line.startswith("ERR")
}

print(error_lookup)
# Output: {'ERR42': 'Disk full', 'ERR17': 'Timeout occurred'}

The generator function log_stream() simulates a log file streamed from disk (or other source). The comprehension filters for lines starting with "ERR" and then assembles a lookup table capturing the error code and its explanation, by tokenising the line and then reassembling it with the error code token removed. In essence this is the same concept as any scenario where we are using a generator as the iterable source. It doesn't matter whether the generator is a stream of data from a file, a list or a dynamically generated number range:


import math

def numbers():
    for i in range(1, 6):
        yield i

factorials = {n: math.factorial(n) for n in numbers()}

print(factorials)
# Output: {1: 1, 2: 2, 3: 6, 4: 24, 5: 120}
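
One caveat when the iterable is a generator: a generator object can be consumed only once. The sketch below reuses numbers() from above; the variable names stream, first, second and third are illustrative.

```python
import math

def numbers():
    for i in range(1, 6):
        yield i

stream = numbers()  # a single generator object

first = {n: math.factorial(n) for n in stream}
second = {n: n * n for n in stream}  # stream is already exhausted

print(len(first))  # 5
print(second)      # {}

# Calling numbers() again creates a fresh generator.
third = {n: n * n for n in numbers()}
print(third)  # {1: 1, 2: 4, 3: 9, 4: 16, 5: 25}
```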

Dictionary as a Quick Lookup Table from Complex Data

If we want to construct a lookup table from heterogeneous data for use in reverse joins, filters or dictionary merging, we might use a dictionary comprehension to rapidly extract just the data in which we are interested. For example, if we have rows of parsed CSV-like tuples:


rows = [
    ('user1', 'admin', 'AU'),
    ('user2', 'editor', 'US'),
    ('user3', 'viewer', 'NZ'),
]
role_by_user = {user: role for user, role, _ in rows}

print(role_by_user)
# Output: {'user1': 'admin', 'user2': 'editor', 'user3': 'viewer'}

Here we are iterating over a list of tuples, but we are only interested in the "user" and "role" parts of the tuple (the first two elements in each tuple). We can extract the tuple members into variables with "_" as a placeholder for those parts in which we are not interested, and named variables for the others.
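
The "user, role, _" pattern requires every tuple to have exactly three elements. When the number of trailing fields varies, a starred placeholder can absorb whatever is left. The rows below are modified sample data, assumed for illustration.

```python
rows = [
    ('user1', 'admin', 'AU'),
    ('user2', 'editor', 'US', 'extra-field'),
    ('user3', 'viewer'),
]

# *_ collects zero or more remaining elements, so rows of
# different lengths unpack without raising ValueError.
role_by_user = {user: role for user, role, *_ in rows}

print(role_by_user)
# {'user1': 'admin', 'user2': 'editor', 'user3': 'viewer'}
```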

Transforming and Nesting (Key-Value Remapping)

We can use dictionary comprehension as data modeling glue—transforming raw inputs into analysis-ready maps. For example if we have received semi-structured JSON objects and want to reshape them for analysis we might create a dictionary of dictionaries indexed by the primary key:


incoming = [
    {'id': 'X1', 'value': 10, 'category': 'alpha'},
    {'id': 'X2', 'value': 20, 'category': 'beta'}
]

reshaped = {
    obj['id']: {'cat': obj['category'], 'val': obj['value']}
    for obj in incoming
}

print(reshaped)
# Output: {'X1': {'cat': 'alpha', 'val': 10}, 'X2': {'cat': 'beta', 'val': 20}}
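
Semi-structured input often has missing fields, and obj['category'] would raise KeyError for such an object. A hedged variant using dict.get with a default is sketched below; the 'unknown' default and the second object's missing field are assumptions for illustration.

```python
incoming = [
    {'id': 'X1', 'value': 10, 'category': 'alpha'},
    {'id': 'X2', 'value': 20},  # no 'category' field
]

# .get() supplies a default where the field is absent.
reshaped = {
    obj['id']: {'cat': obj.get('category', 'unknown'), 'val': obj['value']}
    for obj in incoming
}

print(reshaped['X1'])  # {'cat': 'alpha', 'val': 10}
print(reshaped['X2'])  # {'cat': 'unknown', 'val': 20}
```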

References

Programming in Python
Python Method & Function Overloading
Python Type Checking
Python List Comprehension
Python Dictionary Comprehension
Python Iterators
Python Generators
Python Closures
Python Maps and Filters
Python Refactoring With Generators
Python Decorators