simet.services.roc_auc

RocAucService

Utilities for feature standardization used in ROC-AUC workflows.

standardize_train `staticmethod`

standardize_train(X)

Fit standardization parameters on X and return the standardized data.

Computes per-feature mean and standard deviation over the batch and returns the standardized tensor along with the fitted parameters.

Parameters:

Name	Type	Description	Default
`X`	`Tensor`	Input features of shape `(n_samples, n_features)`. Can be any floating dtype; output matches `X.dtype`.	required

Returns:

Type	Description
`tuple[Tensor, Tensor, Tensor]`	A tuple `(X_std, mu, sigma)`. `X_std`: standardized features, shape `(n_samples, n_features)`. `mu`: per-feature mean, shape `(1, n_features)`. `sigma`: per-feature std (clipped to >= 1e-6), shape `(1, n_features)`.

Notes

- Uses `sigma = std.clamp_min(1e-6)` to avoid division by zero.
- Statistics are computed along `dim=0` with `keepdim=True` so they broadcast correctly when standardizing.
- For reproducible pipelines, persist `mu` and `sigma` for use on validation/test sets.
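The fit-then-reuse pattern described in the notes can be sketched as follows. This is a minimal, standalone illustration rather than the library's own code path: `standardize_train` here is a local copy of the function body listed on this page, `apply_standardization` is a hypothetical helper that assumes the fitted statistics are applied as `(X - mu) / sigma`, and the random data is made up for the example.

```python
import torch


def standardize_train(X: torch.Tensor):
    # Local copy of the logic listed on this page: per-feature statistics along dim=0.
    mu = X.mean(dim=0, keepdim=True)
    sigma = X.std(dim=0, keepdim=True).clamp_min(1e-6)
    return (X - mu) / sigma, mu, sigma


def apply_standardization(X: torch.Tensor, mu: torch.Tensor, sigma: torch.Tensor):
    # Hypothetical helper (assumption): reuse the training statistics unchanged.
    return (X - mu) / sigma


torch.manual_seed(0)
X_train = torch.randn(100, 3) * 5.0 + 2.0  # made-up training features
X_test = torch.randn(20, 3) * 5.0 + 2.0    # made-up held-out features

# Fit on the training split only, then apply the same mu/sigma to the test split.
X_train_std, mu, sigma = standardize_train(X_train)
X_test_std = apply_standardization(X_test, mu, sigma)

print(mu.shape, sigma.shape)        # both have shape (1, n_features)
print(X_train_std.mean(dim=0))      # approximately zero per feature
print(X_train_std.std(dim=0))       # approximately one per feature
```

Only the training split ends up with exactly zero mean and unit variance; the test split is shifted and scaled by the training statistics, which is the point of persisting `mu` and `sigma`.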

Source code in simet/services/roc_auc.py

@staticmethod
def standardize_train(
    X: torch.Tensor,
) -> tuple[torch.Tensor, torch.Tensor, torch.Tensor]:
    """Fit standardization parameters on `X` and return the standardized data.

    Computes per-feature mean and standard deviation over the **batch** and
    returns the standardized tensor along with the fitted parameters.

    Args:
        X (torch.Tensor): Input features of shape ``(n_samples, n_features)``.
            Can be any floating dtype; output matches `X.dtype`.

    Returns:
        tuple[torch.Tensor, torch.Tensor, torch.Tensor]:
            - ``X_std``: Standardized features, shape ``(n_samples, n_features)``.
            - ``mu``: Per-feature mean, shape ``(1, n_features)``.
            - ``sigma``: Per-feature std (clipped to >= 1e-6), shape ``(1, n_features)``.

    Notes:
        - Uses ``sigma = std.clamp_min(1e-6)`` to avoid division by zero.
        - Statistics are computed along ``dim=0`` with ``keepdim=True`` so they
          broadcast correctly when standardizing.
        - For reproducible pipelines, persist ``mu`` and ``sigma`` for use on
          validation/test sets.
    """
    mu = X.mean(dim=0, keepdim=True)
    sigma = X.std(dim=0, keepdim=True).clamp_min(1e-6)
    return (X - mu) / sigma, mu, sigma
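To see what the `clamp_min(1e-6)` guard does in practice, the small sketch below feeds a constant (zero-variance) feature through the same three lines. The function is re-declared locally so the snippet runs on its own, and the input values are invented for illustration. Note that `Tensor.std` uses the Bessel-corrected sample estimate by default.

```python
import torch


def standardize_train(X: torch.Tensor):
    # Same three lines as the listing above, copied so the example is standalone.
    mu = X.mean(dim=0, keepdim=True)
    sigma = X.std(dim=0, keepdim=True).clamp_min(1e-6)
    return (X - mu) / sigma, mu, sigma


# Column 0 varies; column 1 is constant, so its raw standard deviation is 0.
X = torch.tensor([[1.0, 5.0],
                  [2.0, 5.0],
                  [3.0, 5.0]])

X_std, mu, sigma = standardize_train(X)

print(sigma)  # the constant column's std is raised to the 1e-6 floor
print(X_std)  # the constant column standardizes to zeros instead of NaN
```

Without the clamp, the constant column would compute `0 / 0` and produce NaN values; with it, the numerator is still zero, so the column maps to zeros.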

standardize_with `staticmethod`

standardize_with(X, mu, sigma)

Standardize `X` using the provided per-feature mean and std.