Version | Date          | Notes
0.1.29  | February 2026 | Sampling & throughput control; logits/probabilities capture; GPT attention hooks. See details below.
0.1.0   | December 2025 | Initial release: SDK documentation scaffolding (MkDocs Material), request/response instrumentation, Dockerized docs site.

0.1.29 (February 2026)

  • Sampling & throughput control
    • sample_rate (0.0–1.0): fraction of requests to export (e.g. 0.1 = 10%); the sampling decision is made per request, keyed on request_id.
    • max_requests_per_minute: cap on batch export requests per 60-second sliding window to avoid overloading the platform.
    • Both settings can be passed at the top level or inside the config object; see Configuration. A sketch of per-request sampling and the sliding-window cap appears after this list.
  • Logits / probabilities capture
    • capture_logits and capture_probabilities (default False): optional capture of model logits and softmax probabilities for Uncertainty, Calibration, and Confidence assessments.
    • logits_sample_size: maximum number of logit/probability values captured per request (default 64), to limit payload size.
    • When enabled, the main output field is serialized without the raw logits tensor; logits and probabilities go into dedicated fields. See Configuration. An illustrative capture sketch follows this list.
  • GPT attention hooks
    • Attention instrumentation for GPT-2–style models: hooks on transformer.h[i].attn with the same attention_metrics as BERT/ViT (entropy per head, max/mean attention, head agreement).
    • Requires calling the model with output_attentions=True for attention weights to be captured.
    • Enables the Attention Health assessment for GPT; see the Integration Guide. An example of computing the captured metrics follows this list.
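
The SDK's internal implementation is not shown in these notes; the sketch below is one plausible reading of the two sampling controls, assuming deterministic per-request sampling keyed on request_id and a 60-second sliding window for the export cap. The names should_export and SlidingWindowLimiter are hypothetical, not part of the SDK API.

```python
import hashlib
import time
from collections import deque


def should_export(request_id: str, sample_rate: float) -> bool:
    """Hypothetical per-request sampling: hash the request_id into [0, 1)
    so the same request is always consistently kept or dropped."""
    if sample_rate >= 1.0:
        return True
    if sample_rate <= 0.0:
        return False
    digest = hashlib.sha256(request_id.encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:8], "big") / 2**64  # uniform in [0, 1)
    return bucket < sample_rate


class SlidingWindowLimiter:
    """Hypothetical cap on export requests within a 60-second sliding window."""

    def __init__(self, max_requests_per_minute: int):
        self.max_requests = max_requests_per_minute
        self.timestamps = deque()

    def allow(self) -> bool:
        now = time.monotonic()
        # Drop timestamps that fell out of the 60-second window.
        while self.timestamps and now - self.timestamps[0] > 60.0:
            self.timestamps.popleft()
        if len(self.timestamps) < self.max_requests:
            self.timestamps.append(now)
            return True
        return False
```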
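For the logits/probabilities options, a minimal sketch of bounded capture is shown below, assuming the sample keeps the largest last-position logits together with their softmax probabilities; the helper sample_logits and that top-k choice are illustrative assumptions, not the SDK's documented behaviour.

```python
import torch
import torch.nn.functional as F


def sample_logits(logits: torch.Tensor, sample_size: int = 64) -> dict:
    """Hypothetical helper: keep at most `sample_size` logit values (here the
    largest ones) and their softmax probabilities so the exported payload stays
    small. `logits` is a 1-D tensor of vocabulary scores for one output position."""
    probs = F.softmax(logits, dim=-1)
    k = min(sample_size, logits.numel())
    top_vals, top_idx = torch.topk(logits, k)
    return {
        "token_ids": top_idx.tolist(),
        "logits": top_vals.tolist(),
        "probabilities": probs[top_idx].tolist(),
    }


# Example: a 50k-entry vocabulary, but only 64 values end up in the payload.
payload = sample_logits(torch.randn(50_000))
```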
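The snippet below shows the kind of attention_metrics listed above, computed directly from a Hugging Face GPT-2 model called with output_attentions=True. It is a standalone illustration: the SDK captures these values via hooks on transformer.h[i].attn, which this sketch does not reproduce, and the metric formulas here (mean entropy over query positions, pairwise cosine similarity for head agreement) are assumptions about how such metrics are typically defined.

```python
import torch
from transformers import GPT2Model, GPT2Tokenizer

tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2Model.from_pretrained("gpt2")

inputs = tokenizer("Attention health check for GPT-2", return_tensors="pt")
# output_attentions=True is required; otherwise no attention weights are returned.
outputs = model(**inputs, output_attentions=True)

# Tuple of per-layer tensors, each (batch, heads, query_len, key_len).
attn = torch.stack(outputs.attentions)  # (layers, batch, heads, q, k)

# Entropy per head: -sum(p * log p) over keys, averaged over query positions.
entropy_per_head = -(attn * torch.log(attn + 1e-9)).sum(-1).mean(-1)

# Max / mean attention weight per head.
max_attention = attn.amax(dim=(-1, -2))
mean_attention = attn.mean(dim=(-1, -2))

# Head agreement: mean pairwise cosine similarity of flattened attention maps.
flat = torch.nn.functional.normalize(attn.flatten(-2), dim=-1)
agreement = torch.einsum("lbhd,lbgd->lbhg", flat, flat).mean(dim=(-1, -2))
```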