A pattern for strategy backtracking using Python generators - HedgeDoc
90 views
<center> # A pattern for strategy backtracking using Python generators <big> **How fixing a bug led us to a cool, clean, simple pattern for strategy backtracking using Python generators and generator expressions.** </big> *Written by [Juan Diego Caballero](https://github.com/jdcaballerov). Originally published 2022-10-25 on the [Monadical blog](https://monadical.com/blog.html).* </center> A while back, we were tasked with fixing a bug that was affecting an [ETL pipeline](https://en.wikipedia.org/wiki/Extract,_transform,_load). The pipeline ingested events and generated a name string for an asset during one of the steps. The name string would be generated using one of several strategies and in order of preference. We noticed that the catch-all final strategy was being selected more often than was expected, and several asset names were empty. Doing the bug fix, we found a very cool, clean, and simple pattern for strategy backtracking using Python Generators and generator expressions. Here’s how we did it! ## The original implementation Here’s our code as it appeared in a simplified version of the original implementation:: A function returned a candidate asset name string using different strategies as follows: python def get_candidate_asset_name(*args, **kwargs) -> str: if conditions_A(*args, **kwargs): return name_with_strategy_A(*args, **kwargs) if conditions_B(*args, **kwargs): return name_with_strategy_B(*args, **kwargs) if conditions_C(*args, **kwargs): return name_with_strategy_C(*args, **kwargs) return name_with_catch_all_strategy(*args, **kwargs)  And then, was used as follows: python asset_name = get_candidate_asset_name(*args, **kwargs ) …. …. asset_name = sanitize_asset_name(asset_name) If not asset_name: asset_name = sanitize_asset_name(name_with_catch_all_strategy(*args, **kwargs))  ## Problems with our first approach Readers with sharp eyes may have noticed several problems with the former implementation – it doesn’t look good. Some of the problems include: - The calling function is not clean and is not following the [Single Responsibility Principle](https://www.digitalocean.com/community/conceptual_articles/s-o-l-i-d-the-first-five-principles-of-object-oriented-design). It gets the name but also implements the policy to default to the catch-all. - The implementation is not trying the different strategies in order of preference as required. It selects a strategy and, if sanitization rejects it by returning empty, then it defaults awkwardly to the sanitized name from.name_with_catch_all_strategy - The name generated by the name_with_catch_all_strategy could also return empty after sanitization and continue unnoticed. ## How we fixed them We considered some implementations using the while loop to test different strategies. For example, we considered a simple DAG and a mini state machine; however, none of them looked cleaner or more decoupled, and none produced a simpler diff change than our final solution using Generators and generator expressions. To get the final version of our code, we modified the name candidates function to yield instead of return. Here’s what that looks like: python def get_candidate_asset_name(*args, **kwargs) -> Generator[str, None, str]: if conditions_A(*args, **kwargs): yield name_with_strategy_A(*args, **kwargs) if conditions_B(*args, **kwargs): yield name_with_strategy_B(*args, **kwargs) if conditions_C(*args, **kwargs): yield name_with_strategy_C(*args, **kwargs) yield name_with_catch_all_strategy(*args, **kwargs)  And we used it as follows: python asset_name = next( ( sanitize_asset_name(name) for name in get_candidate_asset_name(*args, **kwargs) if sanitize_asset_name(name) ), "NAME NOT FOUND", )  So, what is the code actually doing? First, a generator expression is created from the non-empty sanitized name candidates and advanced to the first yield using next. Then, it gets the first non-empty sanitized name. In case none of the strategies, which are tried in order of preference sequentially, produce a non-empty sanitized name the default value “NAME NOT FOUND” is returned by the next function when StopIteration is reached. Cool, huh? Let us know if you can find other clever uses for this pattern!

Recent posts: