Integration Guide

This section explains how to integrate the SDK into different environments, handle request/response capture, and customize exported data.

Core Concepts

  • OneXMonitor: central object that configures the exporter and framework-specific adapters.
  • Adapters: attach framework-specific hooks (PyTorch, TensorFlow, JAX).
  • Exporter: batches signals and sends them to ingestion endpoints asynchronously.
  • Request context: optional helper to capture raw request payloads and final application responses.

Typical Flow

  1. Instantiate OneXMonitor.
  2. Call monitor.watch(model) for each model you want to instrument.
  3. (Optional) Wrap incoming requests with monitor.request_context(...) to tag raw input + application response.
  4. Call monitor.stop() when your application shuts down.
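
In condensed form, the flow looks like the sketch below. The load_model and encode_inputs helpers are placeholders for your own model-loading and tokenization code; the full Flask example in the next section shows concrete equivalents.

from onex import OneXMonitor

# 1. Instantiate the monitor (api_key and endpoint come from your dashboard).
monitor = OneXMonitor(api_key="your-api-key", endpoint="https://your-ingestion-endpoint")

# 2. Instrument each model you want to observe.
#    load_model() / encode_inputs() stand in for your own loading and tokenization code.
model = monitor.watch(load_model())

# 3. Optionally wrap a request to capture the raw payload and the final response.
with monitor.request_context({"text": "Loved it!"}) as ctx:
    outputs = model(**encode_inputs("Loved it!"))
    ctx.record_response({"rating": 5})

# 4. Stop the monitor when the application shuts down.
monitor.stop()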

Basic Example (PyTorch)

from flask import Flask, request, jsonify
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer
from onex import OneXMonitor

app = Flask(__name__)

monitor = OneXMonitor(
    api_key="your-api-key",  # Retrieve from https://dashboard.observability.getonex.ai
    endpoint="https://your-ingestion-endpoint",  # Same dashboard provides the ingestion URL
    config={
        "payload_sample_items": 5,
        "payload_tensor_sample": 32,
        "request_metadata": {"app": "bert-api"},
    },
)

MODEL_NAME = "nlptown/bert-base-multilingual-uncased-sentiment"
tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForSequenceClassification.from_pretrained(
    MODEL_NAME,
    output_hidden_states=True,
    output_attentions=True,
)
model.eval()
model = monitor.watch(model)  # attach the SDK's PyTorch adapter hooks to the model

@app.route("/predict", methods=["POST"])
def predict():
    payload = request.get_json(silent=True) or {}  # tolerate missing or non-JSON bodies
    text = payload.get("text", "")

    # Tag the raw request payload; everything inside this block shares one request_id.
    with monitor.request_context({"text": text}, metadata={"route": "/predict"}) as ctx:
        inputs = tokenizer(text, return_tensors="pt", truncation=True, max_length=512)
        with torch.no_grad():
            outputs = model(**inputs)

        probs = torch.softmax(outputs.logits, dim=-1)
        rating = int(torch.argmax(probs).item() + 1)
        confidence = float(torch.max(probs).item())

        api_response = {"rating": rating, "confidence": confidence, "text": text}
        ctx.record_response(api_response)  # exported as the application-response event

    return jsonify(api_response)

if __name__ == "__main__":
    app.run()
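
To exercise the instrumented endpoint locally (host and port below are Flask's development defaults; adjust to your deployment):

import requests

resp = requests.post(
    "http://127.0.0.1:5000/predict",
    json={"text": "Loved it, would buy again."},
)
print(resp.json())  # e.g. {"rating": 5, "confidence": 0.87, "text": "Loved it, would buy again."}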

Why the request context?

  • Adds a raw block to the request payload event (your original JSON, not just tensors).
  • Emits an application-response event alongside the automatic model-output event.
  • Reuses the same request_id for all neural signals, request payload, and response records.

Manual Instrumentation

If you can’t call monitor.request_context, you can drive the adapter manually:

# `model`, `tokenizer`, `inputs`, and `text` are prepared exactly as in the Flask
# example above; only the monitoring calls differ.
adapter = monitor.adapter
request_id = adapter.start_request_context(payload={"text": text})
try:
    outputs = model(**inputs)
    probs = torch.softmax(outputs.logits, dim=-1)
    rating = int(torch.argmax(probs).item() + 1)
    confidence = float(torch.max(probs).item())
    adapter.export_manual_response(
        request_id,
        response_payload={"rating": rating, "confidence": confidence},
        success=True,
    )
finally:
    adapter.end_request_context()
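
If inference can fail, you may want to report the failure on the same request before releasing the context. The sketch below assumes export_manual_response also accepts success=False with an error payload; only the success path is shown above, so treat this as an illustration rather than a confirmed contract.

request_id = adapter.start_request_context(payload={"text": text})
try:
    outputs = model(**inputs)
except Exception as exc:
    # Assumption: success=False flags the request as failed; the error payload shape is illustrative.
    adapter.export_manual_response(
        request_id,
        response_payload={"error": str(exc)},
        success=False,
    )
    raise
finally:
    adapter.end_request_context()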

Graceful Shutdown

import signal

def shutdown(*args):
    # Step 4 of the typical flow: stop the monitor before the process exits.
    monitor.stop()
    raise SystemExit(0)

signal.signal(signal.SIGTERM, shutdown)
signal.signal(signal.SIGINT, shutdown)
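
If your serving environment owns process signals (some WSGI servers install their own SIGTERM/SIGINT handlers), registering the stop call with atexit is a reasonable fallback. This is a sketch of that alternative, not an official SDK pattern.

import atexit

# Runs monitor.stop() at interpreter shutdown even if no signal handler fires.
atexit.register(monitor.stop)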