utils
Shared utility functions for snakesee.
This module consolidates common utilities used across multiple modules to avoid duplication and ensure consistent behavior.
Classes
MetadataCache
Thread-safe cache for parsed metadata files.
Tracks each file's mtime and inode to skip re-reading unchanged files.
Source code in snakesee/utils.py
Functions
__init__
__len__
clear
get
Get cached data if file hasn't changed.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `path` | `Path` | Path to the metadata file. | required |
| `mtime` | `float` | Current file modification time. | required |
| `inode` | `int` | Current file inode. | required |
Returns:
| Type | Description |
|---|---|
| `dict[str, Any] \| None` | Cached data if valid, None if cache miss or stale. |
put
Store parsed data in cache.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `path` | `Path` | Path to the metadata file. | required |
| `mtime` | `float` | File modification time. | required |
| `inode` | `int` | File inode. | required |
| `data` | `dict[str, Any]` | Parsed JSON data. | required |
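The `get`/`put` contract above can be illustrated with a minimal sketch. `MetadataCacheSketch` is a hypothetical stand-in written for this page, not the code in snakesee/utils.py; it assumes a single lock around a plain dict and treats any mismatch in mtime or inode as a stale entry.

```python
import threading
from pathlib import Path
from typing import Any, Optional


class MetadataCacheSketch:
    """Hypothetical stand-in for MetadataCache (not snakesee's actual code)."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        # Maps path -> (mtime, inode, parsed data) recorded at put() time.
        self._entries: dict[Path, tuple[float, int, dict[str, Any]]] = {}

    def __len__(self) -> int:
        with self._lock:
            return len(self._entries)

    def clear(self) -> None:
        with self._lock:
            self._entries.clear()

    def get(self, path: Path, mtime: float, inode: int) -> Optional[dict[str, Any]]:
        with self._lock:
            entry = self._entries.get(path)
        if entry is None:
            return None  # cache miss
        cached_mtime, cached_inode, data = entry
        if cached_mtime != mtime or cached_inode != inode:
            return None  # stale: the file was modified or replaced
        return data

    def put(self, path: Path, mtime: float, inode: int, data: dict[str, Any]) -> None:
        with self._lock:
            self._entries[path] = (mtime, inode, data)


cache = MetadataCacheSketch()
example = Path("example.json")  # illustrative path; the file is never opened
cache.put(example, mtime=100.0, inode=42, data={"rule": "align"})
hit = cache.get(example, mtime=100.0, inode=42)    # same mtime and inode: hit
stale = cache.get(example, mtime=200.0, inode=42)  # newer mtime: treated as stale
```

Comparing the inode as well as the mtime lets the sketch notice a file that was replaced by another file with an identical timestamp.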
Functions
get_metadata_cache
get_metadata_cache() -> MetadataCache
get_scan_cache
iterate_metadata_files
iterate_metadata_files(metadata_dir: Path, progress_callback: ProgressCallback | None = None, *, sort_by_mtime: bool = True, newest_first: bool = True, use_cache: bool = True, use_parallel: bool = True, max_workers: int = DEFAULT_METADATA_WORKERS) -> Iterator[tuple[Path, dict[str, Any]]]
Iterate metadata files with optional progress reporting.
Iterates over all files in the metadata directory, parsing each as JSON. Invalid files (non-JSON or unreadable) are skipped without raising; each skip is logged at debug level.
Performance optimizations:
- Uses `os.scandir` instead of `rglob` (6-7x faster directory iteration)
- Sorts by mtime to process newest files first (better for recent data)
- Caches parsed files to skip re-reading unchanged files
- Uses parallel I/O for very large directories (>=1000 files)
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `metadata_dir` | `Path` | Path to .snakemake/metadata/ directory. | required |
| `progress_callback` | `ProgressCallback \| None` | Optional callback(current, total) for progress reporting. | `None` |
| `sort_by_mtime` | `bool` | Sort files by modification time. | `True` |
| `newest_first` | `bool` | If sorting, put newest files first. | `True` |
| `use_cache` | `bool` | Use global cache to skip unchanged files. | `True` |
| `use_parallel` | `bool` | Use parallel I/O for large directories. | `True` |
| `max_workers` | `int` | Maximum number of parallel workers. | `DEFAULT_METADATA_WORKERS` |
Yields:
| Type | Description |
|---|---|
| `tuple[Path, dict[str, Any]]` | Tuples of (file_path, parsed_json_data) for each valid metadata file. |
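The scan-then-sort approach described for `iterate_metadata_files` might look roughly like the following. This is a simplified sketch under stated assumptions, not the library's implementation: `iterate_json_files_sketch` is a hypothetical name, it recurses into subdirectories (assumed from the comparison with `rglob`), uses the standard `json` module, and omits the cache, parallel I/O, progress callback, and debug logging.

```python
import json
import os
from collections.abc import Iterator
from pathlib import Path
from typing import Any


def iterate_json_files_sketch(
    metadata_dir: Path, *, newest_first: bool = True
) -> Iterator[tuple[Path, dict[str, Any]]]:
    """Hypothetical, simplified variant: no cache, no parallelism, no callback."""
    found: list[tuple[float, Path]] = []
    pending = [metadata_dir]
    while pending:
        current = pending.pop()
        try:
            with os.scandir(current) as entries:
                for entry in entries:
                    if entry.is_dir(follow_symlinks=False):
                        pending.append(Path(entry.path))
                    elif entry.is_file():
                        try:
                            found.append((entry.stat().st_mtime, Path(entry.path)))
                        except OSError:
                            pass  # file vanished between listing and stat
        except OSError:
            continue  # directory vanished or is unreadable

    # Sort once by the mtimes collected during the scan.
    found.sort(key=lambda item: item[0], reverse=newest_first)

    for _mtime, path in found:
        try:
            data = json.loads(path.read_bytes())
        except (OSError, ValueError):
            continue  # unreadable or not valid JSON: skip the file
        if isinstance(data, dict):
            yield path, data
```

Collecting the mtime during the scan avoids a second `stat` call per file when sorting.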
json_loads
Parse JSON using orjson for better performance.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `data` | `str \| bytes` | JSON string or bytes to parse. | required |
Returns:
| Type | Description |
|---|---|
| `Any` | Parsed JSON data. |
Raises:
| Type | Description |
|---|---|
| `JSONDecodeError` | If the data is not valid JSON. |
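A thin wrapper of this kind can be sketched as below. `json_loads_sketch` is a hypothetical name, and the fallback to the standard `json` module when orjson is not installed is an assumption made so the example runs anywhere; the source only states that orjson is used.

```python
import json
from typing import Any, Union

try:
    import orjson  # third-party; treated as optional in this sketch only
except ImportError:
    orjson = None


def json_loads_sketch(data: Union[str, bytes]) -> Any:
    """Hypothetical wrapper: prefer orjson, fall back to the stdlib parser."""
    if orjson is not None:
        return orjson.loads(data)
    return json.loads(data)


parsed = json_loads_sketch(b'{"rule": "align", "threads": 4}')

try:
    json_loads_sketch("{not valid")
except json.JSONDecodeError:
    # orjson.JSONDecodeError subclasses json.JSONDecodeError,
    # so this handler covers both parsers.
    parsed_invalid = None
```

Because both parsers raise a `json.JSONDecodeError` (or a subclass of it), callers can use one `except` clause regardless of which backend is active.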
safe_file_size
Safely get file size in bytes, returning 0 on error.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `path` | `Path` | Path to the file. | required |
Returns:
| Type | Description |
|---|---|
| `int` | File size in bytes, or 0 if file doesn't exist or can't be accessed. |
safe_mtime
Get file modification time, returning 0.0 if file doesn't exist.
This handles the common race condition where a file may be deleted between checking for existence and reading its mtime.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `path` | `Path` | Path to the file. | required |
Returns:
| Type | Description |
|---|---|
| `float` | The file's modification time as a Unix timestamp, or 0.0 if the file doesn't exist. |
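The race condition mentioned above is usually avoided by calling `stat` directly and handling the failure, rather than checking for existence first. A minimal sketch follows; `safe_mtime_sketch` is a hypothetical name, and catching the broader `OSError` (of which `FileNotFoundError` is a subclass) is an assumption.

```python
from pathlib import Path


def safe_mtime_sketch(path: Path) -> float:
    """Hypothetical equivalent: a single stat call, with 0.0 on failure."""
    # Racy alternative to avoid:
    #     if path.exists():              # the file may be deleted right here
    #         return path.stat().st_mtime
    try:
        return path.stat().st_mtime
    except OSError:
        return 0.0


missing = safe_mtime_sketch(Path("no-such-dir") / "no-such-file")  # no exception
```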
safe_read_json
Safely read and parse JSON from a file.
Handles file access errors and JSON parse errors gracefully.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `path` | `Path` | Path to the JSON file. | required |
| `default` | `dict[str, Any] \| None` | Value to return if file cannot be read or parsed. | `None` |
Returns:
| Type | Description |
|---|---|
| `dict[str, Any] \| None` | Parsed JSON as dict, or default if reading/parsing fails. |
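One way to get this behaviour is to funnel both failure modes into the same `default`. The sketch below is hypothetical (`safe_read_json_sketch` is not the library function); it uses the standard `json` module, and returning `default` for valid JSON that is not an object is an assumption.

```python
import json
from pathlib import Path
from typing import Any, Optional


def safe_read_json_sketch(
    path: Path, default: Optional[dict[str, Any]] = None
) -> Optional[dict[str, Any]]:
    """Hypothetical equivalent: never raises for I/O or parse failures."""
    try:
        data = json.loads(path.read_bytes())
    except (OSError, ValueError):
        # OSError: missing or unreadable file.
        # ValueError: invalid JSON (JSONDecodeError) or undecodable bytes.
        return default
    # Assumption: a non-object payload (list, number, ...) also yields default.
    return data if isinstance(data, dict) else default


fallback = safe_read_json_sketch(Path("no-such-file.json"), default={})
```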
safe_read_text
Safely read text from a file, returning default on error.
Handles common race conditions and encoding issues gracefully.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `path` | `Path` | Path to the file. | required |
| `default` | `str` | Value to return if file cannot be read. | `''` |
| `errors` | `str` | How to handle encoding errors (passed to read_text). | `'ignore'` |
Returns:
| Type | Description |
|---|---|
| `str` | File contents as string, or default if reading fails. |
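The effect of the `errors` parameter can be seen in a short sketch. `safe_read_text_sketch` is a hypothetical name; decoding as UTF-8 and also returning `default` when a strict decode fails are assumptions not stated in the source.

```python
import tempfile
from pathlib import Path


def safe_read_text_sketch(path: Path, default: str = "", errors: str = "ignore") -> str:
    """Hypothetical equivalent: return `default` instead of raising."""
    try:
        # `errors` is forwarded to Path.read_text; "ignore" drops undecodable bytes.
        return path.read_text(encoding="utf-8", errors=errors)
    except (OSError, UnicodeError):
        return default


with tempfile.TemporaryDirectory() as tmp:
    log_file = Path(tmp) / "job.log"
    log_file.write_bytes(b"ok\xff!")  # \xff is not valid UTF-8
    lenient = safe_read_text_sketch(log_file)                    # invalid byte dropped
    strict = safe_read_text_sketch(log_file, default="<unreadable>", errors="strict")
    absent = safe_read_text_sketch(Path(tmp) / "missing.log", default="<none>")
```

With `errors="ignore"` a partially corrupt log still yields its readable portion, which suits files that may be read while another process is still writing them.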