Ojasa Mirai

Python

Learning Level

🎯 Why Data Structures Matter 📋 Lists: Ordered Collections 🔑 Dictionaries: Key-Value Pairs 🎪 Sets: Unique Values & Speed 📦 Tuples: Immutable Sequences ⚖️ Comparing All Data Structures 🔧 Mastering List Methods ⚡ Advanced Dictionary Patterns 🔄 Power of Set Operations 🏗️ Building Nested Structures ⚙️ Performance Tuning & Optimization

Python/Data Structures/Data Structure Performance

⚡ Advanced Performance Analysis — Profiling and Optimization

Master techniques to measure, analyze, and optimize data structure performance.

🎯 Profiling with timeit

The `timeit` module measures execution time reliably by running code multiple times.

import timeit

# Setup code
setup = """
lst = list(range(1000))
d = {i: i for i in range(1000)}
s = set(range(1000))
"""

# List search - O(n)
list_time = timeit.timeit('500 in lst', setup=setup, number=100000)
print(f"List search: {list_time:.4f}s")  # ~0.15s

# Dict search - O(1)
dict_time = timeit.timeit('500 in d', setup=setup, number=100000)
print(f"Dict search: {dict_time:.4f}s")  # ~0.005s

# Set search - O(1)
set_time = timeit.timeit('500 in s', setup=setup, number=100000)
print(f"Set search: {set_time:.4f}s")  # ~0.005s

print(f"Dict/Set are {list_time/dict_time:.0f}x faster than list!")

💡 Profiling with cProfile

`cProfile` shows which functions consume the most time.

import cProfile
import pstats
from io import StringIO

def process_data():
    """Simulate data processing"""
    # Find duplicates - inefficient
    data = list(range(1000))
    duplicates = []
    for x in data:  # O(n)
        for y in data:  # O(n) nested
            if x == y and x != y:  # Always false, but benchmark anyway
                duplicates.append(x)
    return duplicates

# Profile the function
pr = cProfile.Profile()
pr.enable()
process_data()
pr.disable()

# Print results
s = StringIO()
ps = pstats.Stats(pr, stream=s).sort_stats('cumulative')
ps.print_stats(10)
print(s.getvalue())

# Output shows:
# ncalls  tottime  percall  cumtime  percall filename:lineno(function)
#     1    0.150    0.150    0.150    0.150 test.py:2(process_data)

🎨 Real-World Optimization Case Study

import timeit
from typing import List, Set, Dict

# Problem: Find which users appear in two lists
users_a = list(range(10000))
users_b = list(range(5000, 15000))

# Approach 1: Nested loops O(n²) - SLOW
def find_common_slow(a, b):
    result = []
    for user in a:
        if user in b:  # List membership O(n)
            result.append(user)
    return result

# Approach 2: Convert to set O(n) - FAST
def find_common_fast(a, b):
    return list(set(a) & set(b))

# Approach 3: Set operation directly O(n)
def find_common_best(a, b):
    return set(a) & set(b)

# Benchmark
slow_time = timeit.timeit(
    'find_common_slow(users_a, users_b)',
    globals={'find_common_slow': find_common_slow, 'users_a': users_a, 'users_b': users_b},
    number=100
)

fast_time = timeit.timeit(
    'find_common_fast(users_a, users_b)',
    globals={'find_common_fast': find_common_fast, 'users_a': users_a, 'users_b': users_b},
    number=100
)

print(f"Slow (nested): {slow_time:.3f}s")  # ~5.2s
print(f"Fast (set): {fast_time:.3f}s")     # ~0.08s
print(f"Speedup: {slow_time/fast_time:.0f}x")  # ~65x faster!

📊 Decision Matrix for Data Structure Selection

Task	Structure	Complexity	Reasoning
Store ordered items	List	O(1) append, O(n) search	Sequential access
Lookup by key	Dict	O(1) average	Fast key access
Membership check (large)	Set	O(1) average	Hash-based lookup
Fixed immutable data	Tuple	O(1) access, hashable	Memory efficient
Multiple operations	Custom class	Varies	Domain-specific

🎯 Micro-optimization Techniques

# 1. List comprehension vs loop (faster creation)
# Slow
result = []
for i in range(10000):
    result.append(i * 2)

# Fast
result = [i * 2 for i in range(10000)]

# 2. Generator expression vs list comprehension (memory)
# Takes memory
squares = [x**2 for x in range(1000000)]

# Lazy evaluation - memory efficient
squares = (x**2 for x in range(1000000))

# 3. Local variable access (faster than global/attribute)
# Slow
total = 0
for i in range(1000000):
    total += some_module.CONSTANT

# Fast - cache in local variable
constant = some_module.CONSTANT
total = 0
for i in range(1000000):
    total += constant

# 4. Built-in functions (C implementation)
# Slow
total = 0
for x in data:
    total += x

# Fast
total = sum(data)

🔑 Key Takeaways

✅ Use `timeit` for reliable performance measurements

✅ Profile with `cProfile` to find bottlenecks

✅ Set operations are 50-100x faster than list for membership

✅ Big-O complexity matters more than micro-optimizations

✅ Cache variables, use comprehensions, prefer built-ins

Ready to practice? Challenges | Quiz

Resources

Python Docs

Ojasa Mirai

Master AI-powered development skills through structured learning, real projects, and verified credentials. Whether you're upskilling your team or launching your career, we deliver the skills companies actually need.

Learn Deep • Build Real • Verify Skills • Launch Forward

Courses

Python Fastapi ReactJS Cloud

Resources

Blog & Articles GitHub Projects Video Tutorials

Ecosystem

Ojasa Mirai Site My Growth Learning Portal Community Discord

Twitter GitHub LinkedIn