| Version | Date | Notes |
|---|---|---|
| 0.1.29 | February 2026 | Sampling & throughput control; logits/probabilities capture; GPT attention hooks. See details below. |
| 0.1.0 | December 2025 | Initial release: SDK documentation scaffolding (MkDocs Material), request/response instrumentation, Dockerized docs site. |
## 0.1.29 (February 2026)

- **Sampling & throughput control**
    - `sample_rate` (0.0–1.0): fraction of requests to export (e.g. `0.1` = 10%); sampling is decided per request from its `request_id`.
    - `max_requests_per_minute`: cap on batch export requests per 60-second sliding window, to avoid overloading the platform.
    - Config: top-level or in `config`; see Configuration.
- **Logits / probabilities capture**
    - `capture_logits` and `capture_probabilities` (default `False`): optionally capture model logits and softmax probabilities for the Uncertainty, Calibration, and Confidence assessments.
    - `logits_sample_size`: maximum number of values kept in the sample (default 64), to limit payload size.
    - When enabled, the main `output` field is serialized without the raw logits tensor; logits/probabilities go in dedicated fields. See Configuration.
- **GPT attention hooks**
    - Attention instrumentation for GPT-2-style models: hooks on `transformer.h[i].attn` with the same `attention_metrics` as BERT/ViT (entropy per head, max/mean attention, head agreement).
    - Requires calling the model with `output_attentions=True` for attention weights to be captured.
    - Enables the Attention Health assessment for GPT; see Integration Guide.
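Per-request sampling keyed on `request_id` can be sketched as follows. This is an illustration of the technique, not the SDK's actual implementation; the function name `should_sample` is hypothetical.

```python
import hashlib

def should_sample(request_id: str, sample_rate: float) -> bool:
    """Deterministically decide whether to export a request.

    Hashing the request_id maps it to a stable value in [0, 1), so the
    same request is always kept or always dropped, in every process.
    (Hypothetical sketch; the SDK's internals may differ.)
    """
    if sample_rate >= 1.0:
        return True
    if sample_rate <= 0.0:
        return False
    digest = hashlib.sha256(request_id.encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:8], "big") / 2**64
    return bucket < sample_rate
```

With `sample_rate=0.1`, roughly 10% of distinct `request_id` values pass, and repeated calls for the same id always agree.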
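The logits-capture behavior (truncate to `logits_sample_size`, optionally add softmax probabilities) can be sketched like this. The helper `capture_logit_sample` is a hypothetical name for illustration; only the config field names above come from the SDK.

```python
import math

def capture_logit_sample(logits, sample_size=64, capture_probabilities=True):
    """Keep at most sample_size logits, plus optional softmax probabilities.

    Hypothetical sketch of the capture step: the softmax is computed over
    the full distribution, then truncated to the same sample length.
    """
    record = {"logits": list(logits[:sample_size])}
    if capture_probabilities:
        m = max(logits)  # shift for numerical stability
        exps = [math.exp(x - m) for x in logits]
        total = sum(exps)
        record["probabilities"] = [e / total for e in exps][:sample_size]
    return record
```

Truncating after the softmax keeps each retained probability correct with respect to the full vocabulary, at the cost of the sample not summing to 1.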
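The attention metrics named above (entropy per head, max/mean attention, head agreement) can be computed from the per-layer attention weights that a GPT-2-style model returns when called with `output_attentions=True`. A minimal sketch with NumPy, assuming a single layer's weights of shape `(heads, seq, seq)`; the exact definitions used by the SDK's `attention_metrics` may differ (here head agreement is mean pairwise cosine similarity between flattened head maps).

```python
import numpy as np

def attention_metrics(attn):
    """Summarize one layer's attention weights (heads, seq, seq).

    Each row of attn is a distribution over key positions (sums to 1).
    Hypothetical sketch of the metrics listed in the changelog.
    """
    eps = 1e-12
    # Entropy of each head's distribution, averaged over query positions.
    entropy = -(attn * np.log(attn + eps)).sum(axis=-1).mean(axis=-1)
    # Head agreement: mean pairwise cosine similarity of flattened heads.
    flat = attn.reshape(attn.shape[0], -1)
    unit = flat / np.linalg.norm(flat, axis=1, keepdims=True)
    sim = unit @ unit.T
    n = attn.shape[0]
    agreement = (sim.sum() - n) / (n * (n - 1)) if n > 1 else 1.0
    return {
        "entropy_per_head": entropy.tolist(),
        "max_attention": float(attn.max()),
        "mean_attention": float(attn.mean()),
        "head_agreement": float(agreement),
    }
```

In an instrumented run, a forward hook on `transformer.h[i].attn` would feed each layer's weights to a function like this; with uniform attention over `seq` positions, entropy per head is `log(seq)` and agreement is 1.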
