rule_matcher
Rule matching utilities for fuzzy matching and code hash lookup.
Classes
RuleMatchingEngine
Matches rules by name similarity and code hash.
Used to find timing data for renamed rules or similar rules when exact matches aren't available.
Defined in snakesee/estimation/rule_matcher.py.
Functions
__init__
__init__(max_distance: int = 3)
Initialize the matcher.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `max_distance` | `int` | Maximum Levenshtein distance for fuzzy matches. | `3` |
find_best_match
find_best_match(rule: str, code_hash_to_rules: dict[str, set[str]], rule_stats: dict[str, RuleTimingStats], max_distance: int | None = None) -> tuple[str, RuleTimingStats] | None
Find the best matching rule: try a code-hash lookup first, then fall back to fuzzy name matching.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `rule` | `str` | Rule name to match. | required |
| `code_hash_to_rules` | `dict[str, set[str]]` | Mapping of code hashes to rule sets. | required |
| `rule_stats` | `dict[str, RuleTimingStats]` | Available rule statistics. | required |
| `max_distance` | `int \| None` | Maximum Levenshtein distance. | `None` |
Returns:
| Type | Description |
|---|---|
| `tuple[str, RuleTimingStats] \| None` | Tuple of `(matched_rule, stats)`, or `None` if no match. |
find_by_code_hash
find_by_code_hash(rule: str, code_hash_to_rules: dict[str, set[str]], known_rules: set[str]) -> str | None
Find a rule with a matching code hash.
When multiple rules share the same code hash and are in known_rules, returns the lexicographically smallest rule name for deterministic behavior.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `rule` | `str` | Rule name to look up. | required |
| `code_hash_to_rules` | `dict[str, set[str]]` | Mapping of code hashes to rule sets. | required |
| `known_rules` | `set[str]` | Set of rules with available stats. | required |
Returns:
| Type | Description |
|---|---|
| `str \| None` | Name of the matching rule (lexicographically smallest if multiple), or `None` if not found. |
find_similar
find_similar(rule: str, known_rules: set[str], max_distance: int | None = None) -> tuple[str, int] | None
Find the most similar rule by Levenshtein distance.
When multiple rules have the same distance, returns the lexicographically smallest one for deterministic behavior.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `rule` | `str` | Rule name to match. | required |
| `known_rules` | `set[str]` | Set of rules to search. | required |
| `max_distance` | `int \| None` | Maximum distance (uses instance default if `None`). | `None` |
Returns:
| Type | Description |
|---|---|
| `tuple[str, int] \| None` | Tuple of `(matched_rule, distance)`, or `None` if no match. |
Functions
levenshtein_distance (cached)
levenshtein_distance(s1: str, s2: str) -> int
Calculate the Levenshtein distance between two strings.
Results are cached for efficiency when comparing the same rule names multiple times during estimation.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `s1` | `str` | First string. | required |
| `s2` | `str` | Second string. | required |
Returns:
| Type | Description |
|---|---|
| `int` | The minimum number of edits (insertions, deletions, substitutions) needed to transform `s1` into `s2`. |
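The function body is not shown above. A minimal sketch of the standard two-row dynamic-programming algorithm follows; `functools.lru_cache` stands in for the "cached" behaviour, and which caching decorator snakesee actually uses is an assumption.

```python
from functools import lru_cache


@lru_cache(maxsize=None)
def levenshtein_distance(s1: str, s2: str) -> int:
    """Minimum single-character insertions, deletions, and substitutions
    needed to turn s1 into s2."""
    # Distance is symmetric, so put the shorter string in the inner loop
    # to keep each row as short as possible.
    if len(s1) < len(s2):
        s1, s2 = s2, s1
    # previous[j] is the distance between the processed prefix of s1 and s2[:j].
    previous = list(range(len(s2) + 1))
    for i, c1 in enumerate(s1, start=1):
        current = [i]
        for j, c2 in enumerate(s2, start=1):
            insert_cost = current[j - 1] + 1
            delete_cost = previous[j] + 1
            substitute_cost = previous[j - 1] + (c1 != c2)
            current.append(min(insert_cost, delete_cost, substitute_cost))
        previous = current
    return previous[-1]


print(levenshtein_distance("kitten", "sitting"))  # 3
```

Only two rows are kept, so memory is proportional to the shorter string while time remains proportional to the product of the two lengths. The cache pays off when the same pair of rule names is compared repeatedly during estimation.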