simet.metrics.precision_recall
PrecisionRecall
PrecisionRecall(
*,
metric="l2",
index_type="flat",
nlist=1024,
use_gpu=True,
num_gpus=None,
batch_size=None,
use_fp16=False,
random_state=1234,
)
Bases: Metric[tuple[float, float]]
Precision/Recall between real and synthetic feature sets via FAISS k-NN.
Computes:
- Precision: fraction of synthetic samples whose nearest real neighbor lies within that real sample's k-th nearest-neighbor radius (measured within the real set).
- Recall: fraction of real samples whose nearest synthetic neighbor lies within that synthetic sample's k-th nearest-neighbor radius (measured within the synthetic set).
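Read together with Steps 3-5 of `compute`, the two definitions can be written as follows. This is a sketch of one reading of the text, not a formula taken from the source: the symbols $X=\{x_i\}_{i=1}^{N}$ (real), $Y=\{y_j\}_{j=1}^{M}$ (synthetic), $r_k$ and $\mathrm{nn}$ are notation introduced here, $d$ is the squared L2 distance or $1-\cos$, and the non-strict inequality $\le$ is an assumption.

$$
\begin{aligned}
r_k^{X}(x) &= \text{distance from } x \text{ to its } k\text{-th nearest neighbor in } X \setminus \{x\},\\
\mathrm{nn}_X(y) &= \arg\min_{x \in X} d(y, x),\\
\text{Precision} &= \frac{1}{M} \sum_{j=1}^{M} \mathbf{1}\!\left[ d\big(y_j, \mathrm{nn}_X(y_j)\big) \le r_k^{X}\big(\mathrm{nn}_X(y_j)\big) \right],\\
\text{Recall} &= \frac{1}{N} \sum_{i=1}^{N} \mathbf{1}\!\left[ d\big(x_i, \mathrm{nn}_Y(x_i)\big) \le r_k^{Y}\big(\mathrm{nn}_Y(x_i)\big) \right].
\end{aligned}
$$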
Distance & index backends:
- metric="l2": uses squared L2 distances.
- metric="cosine": uses inner-product search on L2-normalized vectors (we L2-normalize both sets in place) and converts similarities to cosine distances as 1 - sim.
- index_type="flat": exact search (IndexFlatL2 / IndexFlatIP).
- index_type="ivf": coarse-quantized IVF (IndexIVFFlat), requires training.
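A minimal pure-Python sketch of the cosine conversion described above, assuming plain lists instead of NumPy/FAISS arrays. The helper names `l2_normalize`, `cosine_distance` and `squared_l2` are hypothetical, and unlike the class this sketch returns new lists rather than normalizing in place.

```python
import math


def l2_normalize(v: list[float]) -> list[float]:
    """Return v scaled to unit L2 norm; zero vectors are returned unchanged."""
    norm = math.sqrt(sum(x * x for x in v))
    return list(v) if norm == 0.0 else [x / norm for x in v]


def cosine_distance(a: list[float], b: list[float]) -> float:
    """Inner product of the normalized vectors is the cosine similarity; distance = 1 - sim."""
    ua, ub = l2_normalize(a), l2_normalize(b)
    sim = sum(x * y for x, y in zip(ua, ub))
    return 1.0 - sim


def squared_l2(a: list[float], b: list[float]) -> float:
    """Squared Euclidean distance, the quantity used when metric="l2"."""
    return sum((x - y) ** 2 for x, y in zip(a, b))
```

For unit vectors, $\lVert a-b\rVert^2 = 2 - 2\,\mathrm{sim} = 2(1-\mathrm{sim})$, so once both sets are normalized the two metrics rank neighbors identically; the inner-product index simply avoids computing the differences.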
GPU:
- If `use_gpu=True` and GPUs are available, the FAISS index is cloned to the GPU.
- With `num_gpus > 1`, a sharded multi-GPU index is used.
- `use_fp16=True` stores and searches in fp16 on the GPU (memory/speed trade-offs).
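A sketch of the placement logic described above, written against the Python FAISS API (`faiss.GpuMultipleClonerOptions`, `faiss.index_cpu_to_all_gpus`). The helper names `plan_gpu_placement` and `to_gpu`, and the reading of `num_gpus=None` as "use every visible GPU", are assumptions rather than the library's code; FAISS is imported lazily so the planning helper runs without it.

```python
from __future__ import annotations


def plan_gpu_placement(use_gpu: bool, available: int, num_gpus: int | None) -> tuple[str, int]:
    """Hypothetical helper: keep the index on CPU, clone it to one GPU, or shard it."""
    if not use_gpu or available <= 0:
        return ("cpu", 0)
    # Assumption: num_gpus=None means "all visible GPUs"; larger requests are capped.
    n = available if num_gpus is None else min(num_gpus, available)
    if n <= 0:
        return ("cpu", 0)
    return ("single", 1) if n == 1 else ("sharded", n)


def to_gpu(index, use_gpu: bool = True, num_gpus: int | None = None, use_fp16: bool = False):
    """Hypothetical helper: clone a CPU FAISS index according to plan_gpu_placement."""
    import faiss  # lazy import: only needed when an index is actually moved

    available = faiss.get_num_gpus() if hasattr(faiss, "get_num_gpus") else 0
    mode, n = plan_gpu_placement(use_gpu, available, num_gpus)
    if mode == "cpu":
        return index
    co = faiss.GpuMultipleClonerOptions()
    co.shard = mode == "sharded"  # split the database across GPUs instead of replicating it
    co.useFloat16 = use_fp16      # fp16 storage/search on the GPU
    return faiss.index_cpu_to_all_gpus(index, co=co, ngpu=n)
```

With two or more GPUs visible, `to_gpu(faiss.IndexFlatL2(128), num_gpus=2)` would shard a flat index over the first two devices; with none, the CPU index is returned unchanged.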
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `metric` | `Literal['l2', 'cosine']` | Distance/similarity type. Defaults to `"l2"`. | `'l2'` |
| `index_type` | `Literal['flat', 'ivf']` | FAISS index type. Defaults to `"flat"`. | `'flat'` |
| `nlist` | `int` | Number of IVF lists if … | `1024` |
| `use_gpu` | `bool` | Enable GPU indices when available. Defaults to `True`. | `True` |
| `num_gpus` | `int \| None` | Number of GPUs to use (… | `None` |
| `batch_size` | `int \| None` | Query batch size for FAISS search (… | `None` |
| `use_fp16` | `bool` | Use fp16 storage/search on GPU. Defaults to `False`. | `False` |
| `random_state` | `int \| None` | Seed for FAISS training (IVF). Defaults to 1234. | `1234` |
Notes
- If either feature set is empty, returns `(0.0, 0.0)` with a warning.
- If `k` ≥ min(|real|, |synth|), `k` is clamped down to avoid degeneracy.
- For `metric="cosine"`, features are L2-normalized in place.
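The two guard rules in the Notes can be sketched as a small helper. `effective_k` is hypothetical (not part of `simet`); the clamp target `min(n) - 1` comes from the Notes on `compute`, and the `- 1` is presumably there because each set is also searched against itself, where the first hit is the query point, so a k-th neighbor radius needs k + 1 ≤ n points.

```python
from __future__ import annotations

import warnings


def effective_k(k: int, n_real: int, n_synth: int) -> int | None:
    """Hypothetical helper: the k actually used, or None when a set is empty."""
    if n_real == 0 or n_synth == 0:
        # The caller is expected to return (0.0, 0.0) in this case.
        warnings.warn("empty feature set; precision/recall set to (0.0, 0.0)")
        return None
    limit = min(n_real, n_synth) - 1
    if k > limit:  # equivalent to k >= min(n_real, n_synth)
        warnings.warn(f"k={k} clamped to {limit}")
        return limit
    return k
```

For example, `effective_k(5, 1000, 3)` warns and returns 2, while `effective_k(5, 1000, 300)` returns 5 unchanged.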
Source code in simet/metrics/precision_recall.py
compute
compute(loader, k=5)
Compute (precision, recall) between real/synth feature sets.
Steps
1) Validate shapes: both 2D and same D.
2) (Cosine only) L2-normalize real and synth in place.
3) Build k-th neighbor radii per-domain (real→real, synth→synth).
4) Precision: 1-NN of synth in real vs real radii.
5) Recall: 1-NN of real in synth vs synth radii.
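The five steps can be mirrored with a brute-force, pure-Python sketch, useful as a small reference to compare against on toy data. It is an assumption-laden stand-in for the FAISS-backed method: exact O(N·M) search replaces the index, plain lists replace arrays, the names `precision_recall`, `kth_radii` and `coverage` are hypothetical, and ties count as inside the radius (`<=`).

```python
from __future__ import annotations

import math
from typing import Callable

Dist = Callable[[list[float], list[float]], float]


def _sq_l2(a: list[float], b: list[float]) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))


def _cos_dist(a: list[float], b: list[float]) -> float:
    # Assumes a and b are already L2-normalized, so the dot product is the cosine.
    return 1.0 - sum(x * y for x, y in zip(a, b))


def _normalize(rows: list[list[float]]) -> list[list[float]]:
    out = []
    for r in rows:
        n = math.sqrt(sum(x * x for x in r))
        out.append([x / n for x in r] if n > 0 else list(r))
    return out


def kth_radii(points: list[list[float]], k: int, dist: Dist) -> list[float]:
    """Step 3: distance from each point to its k-th nearest neighbor in the same set, self excluded."""
    radii = []
    for i, p in enumerate(points):
        d = sorted(dist(p, q) for j, q in enumerate(points) if j != i)
        radii.append(d[k - 1])
    return radii


def coverage(queries, db, db_radii, dist: Dist) -> float:
    """Steps 4/5: fraction of queries whose 1-NN in db lies within that neighbor's radius."""
    hits = 0
    for q in queries:
        j = min(range(len(db)), key=lambda idx: dist(q, db[idx]))
        if dist(q, db[j]) <= db_radii[j]:
            hits += 1
    return hits / len(queries)


def precision_recall(real, synth, k: int = 5, metric: str = "l2") -> tuple[float, float]:
    if not real or not synth:
        return (0.0, 0.0)  # the class also emits a warning here
    # Step 1: 2D inputs with a shared feature dimension D and finite values.
    dims = {len(r) for r in real} | {len(r) for r in synth}
    if len(dims) != 1:
        raise ValueError("real and synth must be 2D with the same feature dimension")
    if not all(math.isfinite(x) for r in real + synth for x in r):
        raise ValueError("features contain NaN/Inf")
    k = min(k, min(len(real), len(synth)) - 1)  # clamp as described in the Notes
    if k < 1:
        raise ValueError("need at least 2 samples per set")
    # Step 2 (cosine only): normalize; copies here, in place in the class.
    if metric == "cosine":
        real, synth, dist = _normalize(real), _normalize(synth), _cos_dist
    elif metric == "l2":
        dist = _sq_l2  # squared L2, as documented
    else:
        raise ValueError(f"unknown metric {metric!r}")
    real_radii = kth_radii(real, k, dist)    # Step 3: real -> real
    synth_radii = kth_radii(synth, k, dist)  # Step 3: synth -> synth
    precision = coverage(synth, real, real_radii, dist)  # Step 4
    recall = coverage(real, synth, synth_radii, dist)    # Step 5
    return precision, recall
```

On four unit-square corners as `real` and `synth = [[0.1, 0.0], [5.0, 5.0]]`, `k=5` is clamped to 1; the far synthetic point falls outside every real radius, giving `(0.5, 1.0)`.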
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `loader` | `DatasetLoader` | Provides … | *required* |
| `k` | `int` | Neighborhood size for radii (k-th neighbor). Defaults to 5. | `5` |
Returns:

| Type | Description |
|---|---|
| `Tuple[float, float]` | `Tuple[float, float]`: … |
Raises:

| Type | Description |
|---|---|
| `ValueError` | On invalid shapes, NaN/Inf, or invalid … |
Notes
- If `k >= min(len(real), len(synth))`, `k` will be clamped to `min(n) - 1`.
- For IVF indices, training is performed on the same domain used to build the radii/database.
Source code in simet/metrics/precision_recall.py