metadata
Metadata file parsing for Snakemake workflows.
This module parses the files under .snakemake/metadata/, which record information about completed jobs, including timing and code. Wildcards are not stored yet; see the note below.
Note: Currently (Snakemake <= 8.x), metadata files do NOT store wildcards. Wildcards are only available from live log events during the current session. This means combination-based estimates (wildcard+threads) only work for jobs that ran in the current session.
TODO: Once https://github.com/snakemake/snakemake/pull/3888 is merged and released, metadata files will include wildcards, enabling historical combination-based estimates across sessions.
Classes
MetadataRecord (dataclass)
Parsed data from a single metadata file, for efficient single-pass collection.
Contains all fields needed by various collection functions so we only read each metadata file once.
Source code in snakesee/parser/metadata.py
Functions
calculate_metadata_input_size
Calculate total input size from file list.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| input_files | list[str] \| None | List of input file paths from metadata. | required |
Returns:
| Type | Description |
|---|---|
| int \| None | Total size in bytes, or None if not a valid list or any file is missing. |
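The contract above (a total in bytes, or None when the argument is not a list or a file is missing) can be sketched as follows. This is a minimal illustration, not the code in snakesee/parser/metadata.py: the name total_input_size is hypothetical, and treating an empty list as a total of 0 is an assumption.

```python
from __future__ import annotations

import os


def total_input_size(input_files: list[str] | None) -> int | None:
    """Sum the on-disk sizes of input_files, or return None if that is impossible.

    Hypothetical stand-in for calculate_metadata_input_size; the real
    implementation may differ in its edge cases.
    """
    # Anything other than a list (including None) cannot be sized.
    if not isinstance(input_files, list):
        return None
    total = 0
    for path in input_files:
        try:
            total += os.path.getsize(path)
        except (OSError, TypeError):
            # A missing or unreadable file, or a non-path entry, invalidates the total.
            return None
    return total
```

Returning None instead of a partial sum keeps a missing file from silently producing an underestimate.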
collect_rule_code_hashes
collect_rule_code_hashes(metadata_dir: Path, progress_callback: ProgressCallback | None = None) -> dict[str, set[str]]
Collect code hashes for each rule from metadata files.
This enables detection of renamed rules by matching their shell code. If two rules have the same code hash, they are likely the same rule that was renamed.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| metadata_dir | Path | Path to .snakemake/metadata/ directory. | required |
| progress_callback | ProgressCallback \| None | Optional callback(current, total) for progress reporting. | None |
Returns:
| Type | Description |
|---|---|
| dict[str, set[str]] | Dictionary mapping code_hash -> set of rule names that use that code. |
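To illustrate how the returned mapping supports rename detection, the sketch below picks out every hash shared by more than one rule name. The helper renamed_rule_candidates and the example hashes and rule names are made up for illustration; only the dict[str, set[str]] shape comes from the signature above.

```python
from __future__ import annotations


def renamed_rule_candidates(hash_to_rules: dict[str, set[str]]) -> list[list[str]]:
    """Return groups of rule names that share one code hash, sorted for stable output."""
    groups = [sorted(rules) for rules in hash_to_rules.values() if len(rules) > 1]
    return sorted(groups)


# Made-up hashes and rule names, shaped like the documented return value.
example = {
    "a1b2": {"align", "align_reads"},
    "c3d4": {"sort_bam"},
}
print(renamed_rule_candidates(example))  # [['align', 'align_reads']]
```

A shared hash is only a hint: two distinct rules can legitimately run identical shell code.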
parse_metadata_files
parse_metadata_files(metadata_dir: Path, progress_callback: ProgressCallback | None = None) -> Iterator[JobInfo]
Parse completed job information from Snakemake metadata files.
Reads JSON metadata files from .snakemake/metadata/ to extract timing information for completed jobs, including input file sizes.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| metadata_dir | Path | Path to .snakemake/metadata/ directory. | required |
| progress_callback | ProgressCallback \| None | Optional callback(current, total) for progress reporting. | None |
Yields:
| Type | Description |
|---|---|
| JobInfo | JobInfo instances for each completed job found. |
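As a rough picture of this kind of parsing, the sketch below walks a directory of JSON files, reports progress through a callback(current, total), and yields one (rule, duration) pair per usable file. It is not the snakesee implementation: iter_rule_durations is a hypothetical name, it yields plain tuples rather than JobInfo, and the JSON keys "rule", "starttime", and "endtime" are assumptions about the metadata layout.

```python
from __future__ import annotations

import json
from pathlib import Path
from typing import Callable, Iterator, Optional


def iter_rule_durations(
    metadata_dir: Path,
    progress_callback: Optional[Callable[[int, int], None]] = None,
) -> Iterator[tuple[str, float]]:
    """Yield (rule, seconds) for each readable metadata file under metadata_dir."""
    files = sorted(p for p in metadata_dir.rglob("*") if p.is_file())
    total = len(files)
    for current, path in enumerate(files, start=1):
        if progress_callback is not None:
            progress_callback(current, total)
        try:
            data = json.loads(path.read_text())
        except (OSError, ValueError):
            # Unreadable or non-JSON files are skipped rather than aborting the scan.
            continue
        if not isinstance(data, dict):
            continue
        # Assumed key names; adjust to the actual metadata schema.
        rule = data.get("rule")
        start = data.get("starttime")
        end = data.get("endtime")
        if not isinstance(rule, str):
            continue
        if not isinstance(start, (int, float)) or not isinstance(end, (int, float)):
            continue
        yield rule, float(end) - float(start)
```

Because the function is a generator, files are read lazily as the caller iterates, which keeps memory use flat for large metadata directories.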
parse_metadata_files_full
parse_metadata_files_full(metadata_dir: Path, progress_callback: ProgressCallback | None = None) -> Iterator[MetadataRecord]
Parse all metadata from Snakemake metadata files in a single pass.
This is more efficient than calling parse_metadata_files and collect_rule_code_hashes separately, as it reads each file only once.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| metadata_dir | Path | Path to .snakemake/metadata/ directory. | required |
| progress_callback | ProgressCallback \| None | Optional callback(current, total) for progress reporting. | None |
Yields:
| Type | Description |
|---|---|
| MetadataRecord | MetadataRecord instances containing timing and code hash data. |
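The single-pass idea can be sketched as one loop that feeds both a timing table and a code-hash table from the same stream of records. The Record class below is a hypothetical stand-in for MetadataRecord (its real fields are not listed here), summarize is an illustrative helper, and SHA-256 is an assumed hash choice.

```python
from __future__ import annotations

import hashlib
from collections import defaultdict
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass(frozen=True)
class Record:
    """Hypothetical stand-in for MetadataRecord; the real fields may differ."""

    rule: str
    duration: Optional[float]
    code: Optional[str]


def summarize(
    records: Iterable[Record],
) -> tuple[dict[str, list[float]], dict[str, set[str]]]:
    """Build per-rule durations and a code_hash -> rule names map in one pass."""
    durations: dict[str, list[float]] = defaultdict(list)
    hash_to_rules: dict[str, set[str]] = defaultdict(set)
    for rec in records:
        if rec.duration is not None:
            durations[rec.rule].append(rec.duration)
        if rec.code is not None:
            # Assumed hashing scheme, used here only to group identical code.
            digest = hashlib.sha256(rec.code.encode("utf-8")).hexdigest()
            hash_to_rules[digest].add(rec.rule)
    return dict(durations), dict(hash_to_rules)
```

Since the input is consumed exactly once, it can be a generator such as the one returned by parse_metadata_files_full, and no file is read twice.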