Scalable Runtime

Deployment & Inference

Deploy AI models to edge, on-prem, or cloud, with ONNX/TensorRT optimization, multi-stream inference, a rule engine, and event output pipelines.

Deployment Targets

  • Edge devices (NVIDIA Jetson, Intel NCS, Coral)
  • On-premises GPU servers (multi-GPU clusters)
  • Cloud deployment (AWS, GCP, Azure, private cloud)
  • Hybrid edge-cloud topologies
  • Offline mode with local inference capability
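Across these targets, a runtime typically probes the hardware it lands on and falls back gracefully, e.g. from an accelerated backend on a GPU server down to CPU-only inference in offline mode. A minimal sketch of that selection logic (the backend names and `select_backend` helper are illustrative, not a real API):

```python
# Preference order for inference backends, fastest first.
# Names are hypothetical labels, not actual runtime identifiers.
PREFERRED = ["TensorRT", "ONNXRuntime-GPU", "ONNXRuntime-CPU"]

def select_backend(available):
    """Pick the highest-priority backend present on this deployment target."""
    for backend in PREFERRED:
        if backend in available:
            return backend
    raise RuntimeError("no supported inference backend on this host")
```

On a Jetson-class edge device the probe might return only the first and last entries; on a CPU-only offline node, just the last, so inference still runs locally.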

Inference Optimization

  • ONNX Runtime optimization for cross-platform deployment
  • TensorRT acceleration for NVIDIA GPUs
  • INT8/FP16 quantization for edge performance
  • Multi-stream parallel inference pipelines
  • Dynamic batching for throughput optimization
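Dynamic batching trades a small amount of latency for throughput: requests queue up until either a maximum batch size is reached or a timeout expires, then the whole batch runs through the model in one pass. A minimal stdlib-only sketch of that flush policy (a simplification; a production batcher would also handle per-request deadlines and padding):

```python
import queue
import time

def dynamic_batches(q, max_batch=8, timeout_s=0.01):
    """Yield batches from q: flush when max_batch items are collected
    or timeout_s elapses since the first item of the batch arrived."""
    while True:
        try:
            first = q.get(timeout=timeout_s)
        except queue.Empty:
            return  # no more work
        batch = [first]
        deadline = time.monotonic() + timeout_s
        while len(batch) < max_batch:
            remaining = deadline - time.monotonic()
            if remaining <= 0:
                break  # timeout: flush a partial batch
            try:
                batch.append(q.get(timeout=remaining))
            except queue.Empty:
                break
        yield batch
```

Under heavy load this yields full batches (maximizing GPU utilization); under light load the timeout bounds how long any single request waits.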

Scene-Aware Configuration

  • Per-camera detection zone & ROI configuration
  • Confidence threshold tuning per class & scene
  • Schedule-based model switching (day/night)
  • Cascading model pipelines (detect → classify → act)
  • Scene-specific post-processing rules
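Scene-aware configuration boils down to per-camera data: each camera carries its own ROI, class thresholds, and a time-of-day schedule that selects which model to run. A minimal sketch, assuming a hypothetical config layout (the camera ID, model filenames, and `model_for` helper are illustrative):

```python
from datetime import time as clock

# Hypothetical per-camera scene configuration.
SCENES = {
    "lobby-cam-01": {
        "roi": (0, 120, 1280, 600),                 # x, y, w, h detection zone
        "thresholds": {"person": 0.45, "vehicle": 0.60},
        "schedule": [                               # (start, end, model)
            (clock(6, 0), clock(20, 0), "det_day.onnx"),
            (clock(20, 0), clock(23, 59), "det_lowlight.onnx"),
        ],
    },
}

def model_for(camera, now):
    """Schedule-based model switching: pick the model active at time `now`."""
    schedule = SCENES[camera]["schedule"]
    for start, end, model in schedule:
        if start <= now < end:
            return model
    # Outside all windows (e.g. after midnight): keep the last model.
    return schedule[-1][2]
```

The same lookup pattern extends naturally to cascading pipelines: the scheduled detector's outputs feed a per-scene classifier, whose thresholds come from the same config entry.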

Event Output Pipeline

  • REST API event dispatch with retry logic
  • MQTT publishing for IoT integration
  • Webhook triggers for third-party systems
  • gRPC streaming for real-time consumers
  • Event buffering & deduplication controls
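Regardless of transport (REST, MQTT, webhook, or gRPC), the dispatch path combines the same two controls: deduplication, so a flapping detection does not spam downstream consumers, and retry with backoff, so transient failures do not drop events. A stdlib-only sketch of that core loop (the `EventDispatcher` class and its parameters are illustrative, with the transport abstracted behind a `send` callable):

```python
import hashlib
import json
import time

class EventDispatcher:
    """Dedup events by content fingerprint, retry sends with backoff."""

    def __init__(self, send, retries=3, dedup_window=100):
        self.send = send            # callable(event) -> bool, e.g. a REST POST
        self.retries = retries
        self.window = dedup_window  # how many recent fingerprints to remember
        self.seen = []

    def _fingerprint(self, event):
        blob = json.dumps(event, sort_keys=True).encode()
        return hashlib.sha256(blob).hexdigest()

    def dispatch(self, event):
        fp = self._fingerprint(event)
        if fp in self.seen:
            return False            # duplicate suppressed, no send attempted
        for attempt in range(self.retries):
            if self.send(event):
                self.seen.append(fp)
                del self.seen[:-self.window]   # bound dedup memory
                return True
            time.sleep((2 ** attempt) * 0.01)  # exponential backoff
        return False                # exhausted retries; caller may buffer
```

A real pipeline would persist the buffer of failed events for redelivery; the sketch returns `False` to signal that handoff point.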