Version | Date          | Notes
0.1.29  | February 2026 | Sampling & throughput control; logits/probabilities capture; GPT attention hooks. See details below.
0.1.0   | December 2025 | Initial release: SDK documentation scaffolding (MkDocs Material), request/response instrumentation, Dockerized docs site.

0.1.29 (February 2026)

  • Sampling & throughput control
    • sample_rate (0.0–1.0): fraction of requests to export (e.g. 0.1 = 10%); the sampling decision is made per request, keyed on request_id.
    • max_requests_per_minute: cap on batch export requests per 60-second sliding window to avoid overloading the platform.
    • Both settings can be passed at the top level or inside the config object; see Configuration. A sketch of per-request sampling and the sliding-window cap appears after this list.
  • Logits / probabilities capture
    • capture_logits and capture_probabilities (default False): optional capture of model logits and softmax probabilities for Uncertainty, Calibration, and Confidence assessments.
    • logits_sample_size: maximum number of logit/probability values captured per request (default 64), to limit payload size.
    • When enabled, the main output field is serialized without the raw logits tensor; logits and probabilities go into dedicated fields. See Configuration. An illustrative capture sketch follows this list.
  • GPT attention hooks
    • Attention instrumentation for GPT-2–style models: hooks on transformer.h[i].attn with the same attention_metrics as BERT/ViT (entropy per head, max/mean attention, head agreement).
    • Requires calling the model with output_attentions=True for attention weights to be captured.
    • Enables the Attention Health assessment for GPT; see the Integration Guide. An example of computing the captured metrics follows this list.
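
The SDK's internal implementation is not shown in these notes; the sketch below is one plausible reading of the two sampling controls, assuming deterministic per-request sampling keyed on request_id and a 60-second sliding window for the export cap. The names should_export and SlidingWindowLimiter are hypothetical, not part of the SDK API.

```python
import hashlib
import time
from collections import deque


def should_export(request_id: str, sample_rate: float) -> bool:
    """Hypothetical per-request sampling: hash the request_id into [0, 1)
    so the same request is always consistently kept or dropped."""
    if sample_rate >= 1.0:
        return True
    if sample_rate <= 0.0:
        return False
    digest = hashlib.sha256(request_id.encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:8], "big") / 2**64  # uniform in [0, 1)
    return bucket < sample_rate


class SlidingWindowLimiter:
    """Hypothetical cap on export requests within a 60-second sliding window."""

    def __init__(self, max_requests_per_minute: int):
        self.max_requests = max_requests_per_minute
        self.timestamps = deque()

    def allow(self) -> bool:
        now = time.monotonic()
        # Drop timestamps that fell out of the 60-second window.
        while self.timestamps and now - self.timestamps[0] > 60.0:
            self.timestamps.popleft()
        if len(self.timestamps) < self.max_requests:
            self.timestamps.append(now)
            return True
        return False
```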
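For the logits/probabilities options, a minimal sketch of bounded capture is shown below, assuming the sample keeps the largest last-position logits together with their softmax probabilities; the helper sample_logits and that top-k choice are illustrative assumptions, not the SDK's documented behaviour.

```python
import torch
import torch.nn.functional as F


def sample_logits(logits: torch.Tensor, sample_size: int = 64) -> dict:
    """Hypothetical helper: keep at most `sample_size` logit values (here the
    largest ones) and their softmax probabilities so the exported payload stays
    small. `logits` is a 1-D tensor of vocabulary scores for one output position."""
    probs = F.softmax(logits, dim=-1)
    k = min(sample_size, logits.numel())
    top_vals, top_idx = torch.topk(logits, k)
    return {
        "token_ids": top_idx.tolist(),
        "logits": top_vals.tolist(),
        "probabilities": probs[top_idx].tolist(),
    }


# Example: a 50k-entry vocabulary, but only 64 values end up in the payload.
payload = sample_logits(torch.randn(50_000))
```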
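The snippet below shows the kind of attention_metrics listed above, computed directly from a Hugging Face GPT-2 model called with output_attentions=True. It is a standalone illustration: the SDK captures these values via hooks on transformer.h[i].attn, which this sketch does not reproduce, and the metric formulas here (mean entropy over query positions, pairwise cosine similarity for head agreement) are assumptions about how such metrics are typically defined.

```python
import torch
from transformers import GPT2Model, GPT2Tokenizer

tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2Model.from_pretrained("gpt2")

inputs = tokenizer("Attention health check for GPT-2", return_tensors="pt")
# output_attentions=True is required; otherwise no attention weights are returned.
outputs = model(**inputs, output_attentions=True)

# Tuple of per-layer tensors, each (batch, heads, query_len, key_len).
attn = torch.stack(outputs.attentions)  # (layers, batch, heads, q, k)

# Entropy per head: -sum(p * log p) over keys, averaged over query positions.
entropy_per_head = -(attn * torch.log(attn + 1e-9)).sum(-1).mean(-1)

# Max / mean attention weight per head.
max_attention = attn.amax(dim=(-1, -2))
mean_attention = attn.mean(dim=(-1, -2))

# Head agreement: mean pairwise cosine similarity of flattened attention maps.
flat = torch.nn.functional.normalize(attn.flatten(-2), dim=-1)
agreement = torch.einsum("lbhd,lbgd->lbhg", flat, flat).mean(dim=(-1, -2))
```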