Tony Lo · February 17, 2026 · 14 min read

Building a Production-Grade AI Inspection Pipeline: From PatchCore to Factory Floor

Edge AI · Computer Vision · Manufacturing · Anomaly Detection · NVIDIA Jetson · PatchCore

Most vision AI companies talk about "detecting defects." Very few explain how to build a system that actually runs in a factory—on real hardware, at real speed, with real operator workflows.

This article explains how IntelFactor engineers a production-grade AI inspection platform: from patch-based anomaly detection to deterministic edge control, continuous learning, and enterprise governance. We compare product-level inspection with process monitoring, detail the technical architecture, and provide a phased buildout plan with deployment guidance on NVIDIA Jetson Orin Nano hardware.

Product Inspection vs Process Monitoring

Not all factory vision is the same. There are two fundamentally different approaches:

  • Process-Level Monitoring watches whether the production process (conveyor cycles, machinery rhythm) is running normally using overhead cameras. It alerts on jams, spills, missing items, or line-level anomalies. Strength: broad situational awareness. Limitation: coarse insight, not tuned for individual part defects.
  • Product-Level Inspection examines each unit at the region-of-interest level (surface, dimensions, assembly) using controlled industrial cameras. It learns "good" product appearance and flags deviations. Strength: precise defect detection (scratches, dents, misalignments). Limitation: narrower scope (specific station), but that is intentional.

In practice, a factory might use both: process monitoring ensures the line runs, product inspection ensures what comes off the line meets quality. IntelFactor's value lies in fine-grained quality control and root-cause insights.

Technical Architecture

IntelFactor's pipeline runs entirely on edge hardware. Here's how the components connect:

Camera → Edge Station (Jetson Orin Nano) → ROI Cropping & Preprocessing → CNN Backbone (ResNet-18/ViT) → Patch Embeddings

From embeddings, two parallel paths:

  • Anomaly Model (PatchCore/PaDiM) — detects unknown defects by comparing against a "normal" embedding distribution
  • Supervised Detector (YOLO) — classifies known defect types once enough labeled examples exist

Both outputs feed into a Deterministic PASS/FAIL Gate — a rule layer that produces a hard verdict. That verdict drives:

  • Actuator Output (GPIO/Modbus) for reject mechanisms
  • Local Evidence Buffer for frames, metadata, and operator dispositions

The evidence buffer syncs asynchronously to the Cloud Dashboard for model registry, retraining jobs, drift monitoring, and the advisory RCA assistant.

Key Design Principles

  • CNN Backbone → Embeddings: A pretrained CNN (ResNet, EfficientNet, or ViT) extracts mid-level features for each image patch. These embeddings capture textures and shapes relevant to defects.
  • Anomaly Model: Stores the "normal" distribution of embeddings. PatchCore uses a memory bank with nearest-neighbor scoring. PaDiM fits a Gaussian per patch position.
  • Supervised Detector: Optional YOLO model to classify common defects once enough labels exist. Complements the anomaly detector (which catches unknown defects).
  • Deterministic Gate: Takes the anomaly score and supervised outputs to produce PASS or FAIL. Enforces minimum anomaly area, temporal persistence, and SOP thresholds.
  • Edge Station: All inference and gating happen on the Jetson. No cloud in the loop. The PASS/FAIL signal goes directly to the PLC or reject mechanism.
  • Local Buffer: Raw frames and metrics are buffered on-device (48h default). They sync to cloud when connectivity is available.
  • Cloud Services: Model registry, retraining jobs, drift analysis, and reporting. The cloud does not make real-time decisions—it provides dashboards and orchestrates updates.
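To make the deterministic gate concrete, here is a minimal sketch of the rule layer described above. The class names, field names, and default values are illustrative assumptions, not IntelFactor's actual implementation; the point is that the verdict comes from explicit, configurable rules (score threshold, minimum anomaly area, temporal persistence) rather than from the model alone.

```python
from dataclasses import dataclass
from collections import deque

@dataclass
class GateConfig:
    score_threshold: float = 2.5   # anomaly score above which a frame counts as suspect
    min_anomaly_area: int = 50     # minimum anomalous pixels to count as a real defect
    persistence: int = 2           # consecutive suspect frames required before FAIL

class DeterministicGate:
    """Turns model outputs into a hard PASS/FAIL verdict."""

    def __init__(self, cfg: GateConfig):
        self.cfg = cfg
        self.history = deque(maxlen=cfg.persistence)

    def decide(self, anomaly_score: float, anomaly_area: int) -> str:
        frame_is_bad = (anomaly_score >= self.cfg.score_threshold
                        and anomaly_area >= self.cfg.min_anomaly_area)
        self.history.append(frame_is_bad)
        # FAIL only when the defect persists across N consecutive frames,
        # which filters out single-frame noise (glare, motion blur)
        if len(self.history) == self.cfg.persistence and all(self.history):
            return "FAIL"
        return "PASS"
```

Because the gate is deterministic, the same inputs always produce the same verdict, which is what makes the downstream actuator behavior auditable.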

This architecture prioritizes edge-first intelligence and deterministic action.

Phase-by-Phase Buildout Plan (1–16 Weeks)

Phase 1 (Weeks 1–4): Anomaly Engine Deployment

  • Integrate camera stream and ROI cropping on the Jetson
  • Export CNN backbone (e.g. ResNet-18) to TensorRT for FP16 inference
  • Implement PatchCore (memory bank) and/or PaDiM (Gaussian) enrollment from good images
  • Compute anomaly heatmap and score each frame
  • Target: Inference latency <30ms on Jetson Orin Nano
  • Done when: On-device demo where known-good images yield PASS and injected defects yield FAIL, with heatmap visualization
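The enrollment and scoring steps above can be sketched in a few lines. This is a simplified stand-in for PatchCore, not the full algorithm: real PatchCore uses greedy coreset selection and a locally-aware reweighting of the nearest-neighbor distance, while this sketch uses random subsampling and a plain nearest-neighbor score.

```python
import numpy as np

class PatchMemoryBank:
    """Minimal PatchCore-style scorer: store embeddings of normal
    patches, score new patches by nearest-neighbor distance."""

    def __init__(self, coreset_size: int = 1000, seed: int = 0):
        self.coreset_size = coreset_size
        self.rng = np.random.default_rng(seed)
        self.bank = None

    def enroll(self, normal_embeddings: np.ndarray) -> None:
        # Random subsampling stands in for greedy coreset selection here
        n = min(self.coreset_size, len(normal_embeddings))
        idx = self.rng.choice(len(normal_embeddings), size=n, replace=False)
        self.bank = normal_embeddings[idx]

    def score(self, patch_embeddings: np.ndarray) -> np.ndarray:
        # Distance from each patch to its closest "normal" neighbor;
        # large distance = the patch looks unlike anything seen at enrollment
        d = np.linalg.norm(
            patch_embeddings[:, None, :] - self.bank[None, :, :], axis=-1)
        return d.min(axis=1)
```

Reshaping the per-patch scores back onto the image grid gives the anomaly heatmap; the frame score is typically the max over patches.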

Phase 2 (Weeks 4–6): Stability & Drift Monitoring

  • Log statistics: anomaly-score quantiles, feature centroid shifts, FP overrides, lighting levels
  • Compute a single "Stability Index" that triggers when the normal distribution shifts
  • Implement guided recalibration: prompt user to recapture normal data and rebuild baseline
  • Done when: Dashboard shows stability index; alert appears after significant scene change; recalibration workflow updates the model

Phase 3 (Weeks 6–10): Hybrid Learning Loop

  • Queue top anomalies for operator review (borderline and novel scores)
  • Operators choose: "Confirm Defect", "False Positive", or "Assign Defect Class"
  • Automatically label confirmed anomalies and add to defect classes
  • Schedule YOLO retraining when 20+ examples accumulate per class
  • Done when: New defect class can be labeled and trained in <1 day; updated YOLO model deployed to station after retrain
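The retraining trigger from the list above is deliberately simple: count confirmed labels per class and schedule a job once a class crosses the threshold. The function name and the 20-example threshold below mirror the plan but are otherwise illustrative.

```python
from collections import Counter

MIN_EXAMPLES_PER_CLASS = 20  # threshold from the buildout plan

def classes_ready_for_training(labels):
    """Return the defect classes that have accumulated enough
    operator-confirmed examples to schedule a YOLO fine-tuning job."""
    counts = Counter(labels)
    return sorted(c for c, n in counts.items() if n >= MIN_EXAMPLES_PER_CLASS)
```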

Phase 4 (Weeks 8–12): Deterministic Control & Metrics

  • Implement PASS/FAIL logic with configurable delay and pulse width
  • Expose GPIO or Modbus outputs to the PLC or reject system
  • Log action latencies (edge inference vs PLC response) and frame drops
  • Integrate TensorBoard/Grafana for latency KPIs
  • Target: End-to-end decision latency (camera→GPIO) <50ms; cold-start to decision <100ms
  • Done when: On FAIL, the hardware actuator triggers reliably at the configured timing; logs show latency metrics
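The "configurable delay and pulse width" logic is worth spelling out, since getting the timing wrong means rejecting the wrong part. A minimal sketch, with the actual GPIO or Modbus write abstracted behind a hypothetical `set_output` callback so the timing logic stays hardware-agnostic and testable:

```python
import time

def fire_reject_pulse(set_output, delay_ms: float, pulse_ms: float) -> None:
    """Drive a reject actuator on FAIL: wait for the part to travel from
    the camera to the ejector, hold the output high for the pulse width
    the valve/solenoid needs, then release.

    `set_output` is a hypothetical callback wrapping the GPIO/Modbus write.
    """
    time.sleep(delay_ms / 1000.0)   # camera-to-ejector travel time
    set_output(True)
    time.sleep(pulse_ms / 1000.0)   # pulse width for the actuator
    set_output(False)
```

In production this would run on a dedicated thread or timer so the inference loop never blocks on actuator timing.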

Phase 5 (Weeks 12–14): Temporal Layer (Optional)

  • Maintain a sliding window of embedding statistics or scores
  • Detect periodicity breaks (using auto-correlation) or missing items
  • Done when: An alert is generated whenever the production rhythm breaks (e.g. the belt holds a bottle for >N seconds)
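Periodicity-break detection via autocorrelation can be sketched as follows. This is one plausible formulation, assuming the per-frame signal (e.g. a rolling anomaly score or motion metric) is sampled at a fixed frame rate; the tolerance parameter and the flat-signal rule are illustrative choices.

```python
import numpy as np

def detect_rhythm_break(signal, expected_period, tolerance=0.3):
    """Return True if a sliding window of per-frame values no longer
    shows the expected production periodicity (in frames)."""
    x = np.asarray(signal, dtype=float)
    x = x - x.mean()
    # Autocorrelation at lags 0..n-1
    ac = np.correlate(x, x, mode="full")[len(x) - 1:]
    if ac[0] == 0:
        return True  # perfectly flat signal: no rhythm at all
    ac = ac / ac[0]
    # Dominant lag = strongest autocorrelation peak past lag 0
    lag = 1 + int(np.argmax(ac[1:]))
    return abs(lag - expected_period) > tolerance * expected_period
```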

Phase 6 (Weeks 10–14): Governance & Hardening

  • Implement a Model Registry: track dataset hash, model binary version, threshold config, deployed stations
  • Offline resilience: local buffer (48h), sync queue, and UI indicator
  • Explainability: show heatmaps, nearest-neighbor examples, top-3 normal patches
  • Done when: System passes internal security review; offline buffer performs under network cuts
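A registry record ties each deployed model to the exact data and configuration that produced it. A minimal sketch of what such a record might contain, with the field names as assumptions (the actual registry schema is not specified in this article):

```python
import hashlib
import time

def register_model(dataset_files, model_version, threshold_config, stations):
    """Build a model-registry record.

    dataset_files: {filename: bytes} mapping of the training set, hashed
    in sorted order so the same data always yields the same hash.
    """
    h = hashlib.sha256()
    for path, blob in sorted(dataset_files.items()):
        h.update(path.encode())
        h.update(blob)
    return {
        "dataset_hash": h.hexdigest(),
        "model_version": model_version,
        "threshold_config": threshold_config,
        "stations": stations,
        "registered_at": time.time(),
    }
```

The dataset hash is what makes "which data produced this model?" answerable during an audit, even months later.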

Phase 7 (Weeks 14–16+): UX & Performance Polishing

  • Refine operator interface (single-key review, touchscreen ready)
  • Performance tune: quantify Jetson utilization, optimize TensorRT FP16 engines
  • Write automated deployment scripts and integration tests
  • Target: Operator review <5 seconds; CPU/GPU utilization <80% at target FPS
  • Done when: Operators can onboard a new station in <2 hours following the setup guide

Implementation Guidance: Jetson Orin Nano

Hardware Overview

The 8GB Jetson Orin Nano Super features a 1024-core Ampere GPU and delivers up to 67 TOPS of AI performance. In practice, it runs two medium CNNs (e.g. YOLO + ResNet-18) in parallel at real-time speed. YOLOv11n inference in TensorRT FP16 runs at ~4–10ms on Orin Nano. Combined with the patch model (~5ms), total pipeline latency stays <30ms.

Backbone Choices: ResNet-18 (~11.7M params) and EfficientNet-B0 (~5.3M) are great starting points: a few milliseconds of latency and proven transfer learning. For stronger pretrained features, self-supervised ViTs such as DINOv2 achieve state-of-the-art embeddings, but converting a ViT to TensorRT on Jetson typically requires extra tooling (e.g. NVIDIA's TAO workflow). We recommend ResNet-18 for initial deployment.

TensorRT Export

  • ONNX Export: Use PyTorch's torch.onnx.export to dump the backbone (including intermediate patch outputs) to ONNX
  • TensorRT Build: Convert ONNX to a TRT engine on the Jetson (trtexec with --fp16). Ensure workspace size is large enough
  • Verification: Benchmark latency with trtexec. Expect ~5–10ms for the CNN backbone alone
  • Optimization Tips: Use INT8 calibration only if needed (many patch models degrade). FP16 usually suffices on Orin. Profile to avoid unnecessary ops

PatchCore vs PaDiM vs EfficientAD

  • PatchCore: Highest accuracy (~99.6% image-level AUROC on MVTec AD), handles most anomalies with no supervised labels. Requires storing patch descriptors (keep memory in check with coreset sampling). Implementation: use Anomalib as a reference pipeline
  • PaDiM: Gaussian per-patch is faster and low-memory. Good if Jetson RAM is limited. ~99% performance on benchmarks
  • EfficientAD (Teacher-Student): Fastest inference (millisecond-level), but adds training complexity. Consider in later phases if latency is critical
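To make the PaDiM option concrete: for each patch position, fit a Gaussian over the normal embeddings and score new embeddings by Mahalanobis distance. This sketch covers a single patch position; the regularization constant follows the spirit of the PaDiM paper but the exact value is an assumption.

```python
import numpy as np

class PaDiMPatch:
    """Minimal PaDiM-style scorer for one patch position: fit a Gaussian
    over normal embeddings, score new embeddings by Mahalanobis distance."""

    def fit(self, embeddings: np.ndarray) -> None:
        self.mean = embeddings.mean(axis=0)
        cov = np.cov(embeddings, rowvar=False)
        # Small ridge term keeps the covariance invertible
        cov += 0.01 * np.eye(cov.shape[0])
        self.cov_inv = np.linalg.inv(cov)

    def score(self, e: np.ndarray) -> float:
        d = e - self.mean
        return float(np.sqrt(d @ self.cov_inv @ d))
```

Compared with PatchCore's memory bank, this stores only a mean and covariance per patch position, which is why it is the low-memory option on a RAM-constrained Jetson.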

Model Governance & Drift Handling

Drift Detection Metrics

  • Feature Shift: Distance between incoming embedding centroids and baseline centroids
  • Score Drift: Rolling percentile of anomaly scores (p95 drift signals issues)
  • False-Positive Rate: Count FP overrides in reviews (rising FPR suggests drift)
  • Lighting/Color: Mean luminance and color temperature changes

Stability Score & Alerts

Combine metrics into a Stability Score. If it exceeds a threshold, flag the station as needing attention.
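One simple way to combine the four drift metrics above is a weighted sum over normalized inputs. The weights and the 0.5 alert threshold are illustrative assumptions to be tuned per station; the only structural requirement is that each metric is pre-scaled to [0, 1] before combining.

```python
def stability_score(centroid_shift, p95_drift, fp_rate, luminance_shift,
                    weights=(0.35, 0.35, 0.2, 0.1)):
    """Combine normalized drift metrics (each pre-scaled to [0, 1]) into
    one score; higher means less stable."""
    metrics = (centroid_shift, p95_drift, fp_rate, luminance_shift)
    clipped = [min(max(m, 0.0), 1.0) for m in metrics]  # guard against bad inputs
    return sum(w * m for w, m in zip(weights, clipped))

def needs_attention(score, threshold=0.5):
    """Flag the station when the stability score crosses the threshold."""
    return score >= threshold
```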

Guided Recalibration

When drift is detected:

  • Show a prompt or email alert to the technician
  • Guide them to capture ~10 minutes of current normal production via the UI
  • Automatically rebuild the patch model using both old + new normals (or incremental update)
  • Show comparison: histograms of anomaly scores before and after
  • Log the event in the registry (who, when, how much data)

The Hybrid Anomaly→Supervised Loop

IntelFactor uses human-in-the-loop to expand the model over time:

  • Review Queue: Show top-K anomalous crops for operator labeling. Only borderline cases are sent (uncertainty sampling)
  • Feedback Actions: For each flagged image, the operator can Confirm Defect (new positive label), False Alarm (add to normals), or Assign Defect Class (if known)
  • Data Accumulation: Collect images/patches of each defect class with labels until enough examples exist
  • Retraining: Schedule a cloud-based training job to fine-tune or train a YOLO model with new data
  • Deployment: Test and deploy new model artifacts via the registry
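The uncertainty-sampling step of the review queue can be sketched as below: only frames whose scores fall in a band around the PASS/FAIL threshold are queued, sorted so the most ambiguous (and hence most informative) cases come first. The band width and top-K default are illustrative.

```python
def review_queue(frames, threshold, band=0.2, top_k=5):
    """Select borderline frames for operator review.

    frames: list of (frame_id, anomaly_score) tuples.
    Only scores within +/- band of the threshold are candidates;
    clear PASSes and clear FAILs need no human label.
    """
    lo, hi = threshold * (1 - band), threshold * (1 + band)
    borderline = [(fid, s) for fid, s in frames if lo <= s <= hi]
    # Closest to the threshold first: the most ambiguous labels
    # teach the model the most
    borderline.sort(key=lambda t: abs(t[1] - threshold))
    return borderline[:top_k]
```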

Operator UX and Setup Checklist

Review UX

A minimal interface with large PASS/FAIL indicators. Review mode shows 3–5 "suspect" thumbnails with hotkeys (P=pass, F=fail) and quick tagging. Target: <5 seconds per review decision.

Station Setup Checklist

  • Camera Setup: Mount camera, verify focus and lighting
  • ROI Selection: Draw region(s) covering product. Mask irrelevant areas (background, conveyor)
  • Baseline Capture: Collect 50–200 normal images under current conditions
  • Initial Enrollment: Build baseline model and set a conservative anomaly threshold
  • Signal Test: Run a known defect through; ensure FAIL is triggered
  • Drift Safeguards: Enable stability monitoring. Confirm offline buffer works
  • Operator Training: Show staff how to interpret PASS/FAIL and use the review page
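For the "conservative anomaly threshold" step in the checklist, one common heuristic is to take a high percentile of the enrollment scores and add a safety margin, so that nearly all known-good product passes on day one. The percentile and margin values below are illustrative defaults, not prescribed by the article.

```python
import numpy as np

def initial_threshold(normal_scores, percentile=99.0, margin=1.2):
    """Set a conservative anomaly threshold from enrollment scores:
    a high percentile of the normal scores times a safety margin."""
    return float(np.percentile(normal_scores, percentile)) * margin
```

Starting conservative biases the station toward false negatives rather than nuisance rejects; the threshold is then tightened as operator feedback accumulates.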

Enterprise Hardening

  • Audit Trail: All decisions (scores, user overrides, deployments) are logged with timestamps and user IDs
  • Version Control: The model registry logs which data and code made each model
  • Offline First: The system never "turns off" if cloud is unreachable
  • Security & Privacy: Camera feed stays local. Data at rest is encrypted

Conclusion

IntelFactor is building the Datadog for Vision QC. By focusing on product-level anomalies, enforcing deterministic edge decisions, and enabling continuous human-in-the-loop learning, it addresses key pain points that process-only monitoring does not.

The ultimate goal: A deterministic QC station that learns from your line, not another disconnected dashboard.


Book a Demo to see how edge-first inspection works for your production line.