This section explains how to integrate the SDK into different environments, handle request/response capture, and customize exported data.

Core Concepts

  • OneXMonitor: central object that configures the exporter and framework-specific adapters.
  • Adapters: attach framework-specific hooks (PyTorch, TensorFlow, JAX).
  • Exporter: batches signals and sends them to ingestion endpoints asynchronously.
  • Request context: optional helper to capture raw request payloads and final application responses.

Typical Flow

  1. Instantiate OneXMonitor.
  2. Call monitor.watch(model) for each model you want to instrument.
  3. (Optional) Wrap incoming requests with monitor.request_context(...) to tag raw input + application response.
  4. Call monitor.stop() when your application shuts down.

Basic Example (PyTorch)

from flask import Flask, request, jsonify
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer
from onex import OneXMonitor

app = Flask(__name__)

monitor = OneXMonitor(
    api_key="your-api-key",  # Retrieve from https://dashboard.observability.getonex.ai
    endpoint="onex-ingestion-endpoint",  # Same dashboard provides the ingestion URL
    config={
        "payload_sample_items": 5,
        "payload_tensor_sample": 32,
        "request_metadata": {"app": "bert-api"},
    },
)

MODEL_NAME = "nlptown/bert-base-multilingual-uncased-sentiment"
tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForSequenceClassification.from_pretrained(
    MODEL_NAME,
    output_hidden_states=True,
    output_attentions=True,
)
model.eval()
model = monitor.watch(model)

@app.route("/predict", methods=["POST"])
def predict():
    payload = request.get_json(silent=True) or {}  # silent=True avoids a 415 on non-JSON bodies
    text = payload.get("text", "")

    with monitor.request_context({"text": text}, metadata={"route": "/predict"}) as ctx:
        inputs = tokenizer(text, return_tensors="pt", truncation=True, max_length=512)
        with torch.no_grad():
            outputs = model(**inputs)

        probs = torch.softmax(outputs.logits, dim=-1)
        rating = int(torch.argmax(probs).item() + 1)
        confidence = float(torch.max(probs).item())

        api_response = {"rating": rating, "confidence": confidence, "text": text}
        ctx.record_response(api_response)

    return jsonify(api_response)

if __name__ == "__main__":
    app.run()
For all config options and OneXMonitor parameters, see the Configuration Reference.

Why the request context?

  • Adds a raw block to the request payload event (your original JSON, not just tensors)
  • Emits an application-response event alongside the automatic model-output event
  • Reuses the same request_id for all neural signals, request payload, and response records
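The real event schema is internal to the SDK, but the correlation idea can be sketched in plain Python (the field names below are illustrative assumptions, not the SDK's actual wire format):

```python
import uuid

def build_request_events(raw_payload, model_output, app_response):
    """Sketch: tag related events with one shared request_id.

    These dicts only illustrate the correlation the request
    context provides; the SDK's actual event schema differs.
    """
    request_id = str(uuid.uuid4())
    return [
        {"type": "request_payload", "request_id": request_id, "raw": raw_payload},
        {"type": "model_output", "request_id": request_id, "output": model_output},
        {"type": "application_response", "request_id": request_id, "response": app_response},
    ]

events = build_request_events({"text": "great!"}, {"logits": [0.1, 0.9]}, {"rating": 5})
assert len({e["request_id"] for e in events}) == 1  # all three share one id
```

On the platform side, that shared id is what lets a raw request, its neural signals, and the final application response be viewed as one trace.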

Attention metrics (BERT, GPT, ViT)

To capture attention signals (entropy per head, max/mean attention, head agreement) for Attention Health assessment, make sure the model returns attention weights: either load it with output_attentions=True or pass the flag on each forward call:
# BERT / ViT
model = AutoModel.from_pretrained(..., output_attentions=True)
outputs = model(**inputs)  # attention weights are passed to SDK hooks

# GPT-2 style
outputs = model(**inputs, output_attentions=True)
GPT models use hooks on transformer.h[i].attn; attention weights are only returned when the model forward receives output_attentions=True.
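The SDK computes these metrics internally from the returned attention weights. As a rough sketch of what "entropy per head" means, assuming attention of shape (num_heads, seq_len, seq_len) where each row is a probability distribution over keys:

```python
import numpy as np

def attention_entropy_per_head(attn: np.ndarray) -> np.ndarray:
    """Mean Shannon entropy of each head's attention distributions.

    attn: (num_heads, seq_len, seq_len); each row sums to 1.
    Returns one value per head, averaged over query positions.
    """
    eps = 1e-12  # avoid log(0) for one-hot rows
    ent = -(attn * np.log(attn + eps)).sum(axis=-1)  # (num_heads, seq_len)
    return ent.mean(axis=-1)

# Uniform attention maximizes entropy at log(seq_len);
# sharply focused (one-hot) attention drives it toward 0.
heads, seq = 4, 8
uniform = np.full((heads, seq, seq), 1.0 / seq)
print(attention_entropy_per_head(uniform))  # ~log(8) for every head
```

Low entropy means a head attends to few positions; uniformly high entropy can indicate a head that is not discriminating at all, which is why per-head entropy is a useful health signal.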

Enabling or disabling request/response capture

By default, the SDK sends both request payloads (model inputs) and response payloads (model outputs, application responses) to the platform, in addition to neural signals. You can disable either:
monitor = OneXMonitor(
    api_key="your-api-key",
    endpoint="onex-ingestion-endpoint",
    config={
        "capture_request_payload": False,   # Do not send request payloads
        "capture_response_payload": False,  # Do not send response payloads
    },
)
Neural signals are always sent to /api/signals/batch regardless of these settings. Use this when you want to reduce data volume or avoid sending sensitive input/output to the platform.

Manual Instrumentation

If you can’t call monitor.request_context, you can drive the adapter manually:
adapter = monitor.adapter
request_id = adapter.start_request_context(payload={"text": text})
try:
    outputs = model(**inputs)
    adapter.export_manual_response(
        request_id,
        response_payload={"rating": rating, "confidence": confidence},
        success=True,
    )
finally:
    adapter.end_request_context()
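If you drive the adapter manually in several places, the start/response/end sequence can be wrapped in a small context manager so the failure path gets reported too. This is a sketch, not part of the SDK; the stub adapter below stands in for monitor.adapter and only records the calls made:

```python
from contextlib import contextmanager

class StubAdapter:
    """Stands in for monitor.adapter; records the call sequence."""
    def __init__(self):
        self.calls = []
    def start_request_context(self, payload):
        self.calls.append("start")
        return "req-1"
    def export_manual_response(self, request_id, response_payload, success):
        self.calls.append(("response", success))
    def end_request_context(self):
        self.calls.append("end")

@contextmanager
def manual_request(adapter, payload):
    """Mirror the manual start/response/end sequence, reporting failures."""
    request_id = adapter.start_request_context(payload=payload)
    try:
        yield request_id
    except Exception:
        # Report the failure before closing the context.
        adapter.export_manual_response(request_id, response_payload=None, success=False)
        raise
    finally:
        adapter.end_request_context()

adapter = StubAdapter()
with manual_request(adapter, {"text": "hi"}) as request_id:
    adapter.export_manual_response(request_id, response_payload={"ok": True}, success=True)
assert adapter.calls == ["start", ("response", True), "end"]
```

The same wrapper works against the real adapter, since it uses only the three calls shown in the manual example above.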

Graceful Shutdown

import signal

def shutdown(*args):
    monitor.stop()
    raise SystemExit(0)

signal.signal(signal.SIGTERM, shutdown)
signal.signal(signal.SIGINT, shutdown)
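A signal handler can fire alongside normal interpreter exit (for example when you also register an atexit hook), so it is worth making the shutdown idempotent. A sketch, assuming only that monitor.stop() should run exactly once; the StubMonitor stands in for OneXMonitor so the example is self-contained:

```python
import atexit
import signal

class StubMonitor:
    """Stands in for OneXMonitor; counts stop() calls."""
    def __init__(self):
        self.stop_calls = 0
    def stop(self):
        self.stop_calls += 1

monitor = StubMonitor()
_stopped = False

def shutdown(*args):
    """Idempotent: the signal handler and atexit may both fire."""
    global _stopped
    if _stopped:
        return
    _stopped = True
    monitor.stop()

def handle_sigterm(*args):
    shutdown()
    raise SystemExit(0)

signal.signal(signal.SIGTERM, handle_sigterm)
atexit.register(shutdown)

shutdown()  # simulate the first trigger
shutdown()  # a second trigger is a no-op
assert monitor.stop_calls == 1
```

This guards against flushing the exporter twice when both a SIGTERM and the normal interpreter exit run the same handler.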