simet.services.feature_cache
FeatureCacheService
FeatureCacheService(cache_dir=Path('cache/features'))
Disk cache for feature matrices computed from DataLoaders.
Stores/loads precomputed feature arrays (e.g., (N, D) floats) keyed by a
deterministic hash derived from:
- dataset identity (root path or dataset type/length),
- subset membership (sorted indices for torch.utils.data.Subset),
- feature-extractor suffix (e.g., "inception_v3"),
- loader parameters (e.g., batch_size).
Features are serialized via pickle under cache_dir/<md5>.pkl.
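The key derivation described above can be illustrated with a minimal sketch. The helper names `sketch_cache_key` and `sketch_cache_path`, the field order, and the separator are assumptions made for illustration only; the actual `_generate_cache_key` may combine these components differently.

```python
import hashlib
from pathlib import Path
from typing import Optional, Sequence


def sketch_cache_key(
    dataset_id: str,
    batch_size: int,
    suffix: str = "",
    subset_indices: Optional[Sequence[int]] = None,
) -> str:
    """Hypothetical stand-in for the key derivation; not the library's code."""
    parts = [dataset_id, f"bs={batch_size}"]
    if subset_indices is not None:
        # Sort first so the same subset yields the same key regardless of order.
        joined = ",".join(str(i) for i in sorted(subset_indices))
        parts.append("subset=" + hashlib.md5(joined.encode("utf-8")).hexdigest())
    if suffix:
        parts.append(suffix)
    return hashlib.md5("|".join(parts).encode("utf-8")).hexdigest()


def sketch_cache_path(cache_dir: Path, key: str) -> Path:
    # Mirrors the documented layout: cache_dir/<md5>.pkl
    return cache_dir / f"{key}.pkl"


# Same subset in a different order maps to the same key.
key_a = sketch_cache_key("data/train", 32, "inception_v3", subset_indices=[3, 1, 2])
key_b = sketch_cache_key("data/train", 32, "inception_v3", subset_indices=[1, 2, 3])
# Changing one component (here the batch size) yields a different key.
key_c = sketch_cache_key("data/train", 64, "inception_v3", subset_indices=[1, 2, 3])
```

Hashing the sorted indices separately keeps the final key input short even for very large subsets.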
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| cache_dir | Path \| str | Directory where cached feature files are written/read. Created if it does not exist. | Path('cache/features') |
Example
    >>> svc = FeatureCacheService("cache/features")
    >>> feats = svc.get_or_compute(
    ...     loader=my_loader,
    ...     compute_fn=my_extractor_fn,  # def f(loader) -> np.ndarray
    ...     cache_key_suffix="inception_v3",
    ... )
    >>> feats.shape  # doctest: +SKIP
    (N, D)
Notes
- Security: pickle is not safe for untrusted inputs. Only load files created by this application.
- Subsets: For Subset datasets, the cache key includes a hash of the sorted indices, so different subsets cache to different files.
- Invalidation: Changing any key component (dataset path/size, suffix, batch size) produces a different cache key and thus a cache miss.
Initialize the cache directory and logger.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| cache_dir | Path | Directory to store cache files. | Path('cache/features') |
Source code in simet/services/feature_cache.py
get_or_compute
get_or_compute(loader, compute_fn, cache_key_suffix='', force_recompute=False)
Return cached features for loader or compute and cache them.
Builds a stable cache key (see _generate_cache_key) and attempts to
load the feature array. If missing or force_recompute=True, runs
compute_fn(loader), saves the result, and returns it.
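The load-or-compute flow can be sketched as follows. This is a simplified illustration, not the library's implementation: `sketch_get_or_compute` and `fake_extractor` are hypothetical names, the cache path is passed in directly instead of being derived from a loader, and a plain nested list stands in for the NumPy feature matrix.

```python
import pickle
import tempfile
from pathlib import Path
from typing import Any, Callable


def sketch_get_or_compute(
    cache_path: Path,
    compute_fn: Callable[[], Any],
    force_recompute: bool = False,
) -> Any:
    """Load-or-compute pattern; a simplified stand-in for the real method."""
    if cache_path.exists() and not force_recompute:
        # Cache hit: only unpickle files this application wrote itself.
        with cache_path.open("rb") as fh:
            return pickle.load(fh)
    # Cache miss or forced recompute: compute, then overwrite the entry.
    result = compute_fn()
    cache_path.parent.mkdir(parents=True, exist_ok=True)
    with cache_path.open("wb") as fh:
        pickle.dump(result, fh)
    return result


calls = []  # records how often the extractor actually runs


def fake_extractor():
    calls.append(1)
    return [[0.1, 0.2], [0.3, 0.4]]  # stand-in for an (N, D) array


with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "features" / "abc123.pkl"
    first = sketch_get_or_compute(path, fake_extractor)   # miss: computes
    second = sketch_get_or_compute(path, fake_extractor)  # hit: loads from disk
    third = sketch_get_or_compute(path, fake_extractor, force_recompute=True)
```

In this sketch the extractor runs twice: once on the initial miss and once for the forced recompute. The second call is served from disk.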
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| loader | DataLoader[VisionDataset] | DataLoader providing samples for feature extraction. | required |
| compute_fn | Callable[[DataLoader[VisionDataset]], ndarray] | Callable that computes the feature matrix from the loader. | required |
| cache_key_suffix | str | Disambiguator for different extractors/configs. | '' |
| force_recompute | bool | If True, bypass cache and recompute, then overwrite the cache entry. | False |
Returns:
| Type | Description |
|---|---|
| ndarray | The feature matrix for all samples in the loader. |
Logging
Emits INFO on cache hits/misses and DEBUG with the resolved cache path.
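To see these messages during a run, logging has to be configured by the caller. The sketch below assumes the service logs through Python's standard `logging` module and that its logger is named after the module (`simet.services.feature_cache`). Neither detail is confirmed here, so adjust the logger name if the DEBUG lines do not appear.

```python
import logging

# Show INFO and above from all loggers (cache hit/miss messages included).
logging.basicConfig(
    level=logging.INFO,
    format="%(asctime)s %(name)s %(levelname)s %(message)s",
    force=True,  # replace any handlers configured earlier (Python 3.8+)
)

# Assumption: the service's logger follows the module-name convention.
# Lowering only this logger to DEBUG surfaces the resolved cache path
# without making every other library verbose.
cache_logger = logging.getLogger("simet.services.feature_cache")
cache_logger.setLevel(logging.DEBUG)
```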
Source code in simet/services/feature_cache.py