Self-Learning Loop
ONDA Self-Learning Loop
The self-learning loop is how Plugy improves your bot over time. Each iteration follows a fixed eight-step pipeline: baseline benchmark, target selection, snapshot, proposal, safety check, re-benchmark, Guardian gate, and approve or revert.
Iteration Pipeline
Step 1 — Baseline Benchmark
Run test dialogs against the current bot configuration. To reduce noise from AI nondeterminism, the benchmark runs multiple times and the B-scores are averaged.
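The averaging step can be sketched as follows. This is a minimal illustration, assuming each benchmark run yields a dictionary of B-score components on a 0 to 1 scale; the function name, the `accuracy` component, and the number of runs are hypothetical and not taken from Plugy itself.

```python
from statistics import mean

def average_b_scores(runs):
    """Average each B-score component across repeated benchmark runs."""
    if not runs:
        raise ValueError("at least one benchmark run is required")
    # Assumption: every run reports the same set of components.
    components = runs[0].keys()
    return {name: mean(run[name] for run in runs) for name in components}

# Three hypothetical runs of the same test dialogs.
runs = [
    {"empathy": 0.60, "accuracy": 0.80},
    {"empathy": 0.70, "accuracy": 0.90},
    {"empathy": 0.65, "accuracy": 0.85},
]
baseline = average_b_scores(runs)
```

Averaging per component, rather than averaging a single overall number, keeps the per-component detail that the next step needs to pick a target.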
Step 2 — Identify Target
The weakest B-score component is selected as the optimization target.
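Selecting the target amounts to taking the minimum over the averaged components. The sketch below assumes B-scores are held in a dictionary; the component names and the alphabetical tie-breaking rule are illustrative choices, not documented Plugy behavior.

```python
def pick_target(b_scores):
    """Return the name of the lowest-scoring B-score component."""
    if not b_scores:
        raise ValueError("no B-score components to choose from")
    # Sort names first so ties resolve deterministically (alphabetical order).
    return min(sorted(b_scores), key=b_scores.get)

target = pick_target({"empathy": 0.65, "accuracy": 0.85, "tone": 0.72})
print(target)  # empathy
```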
Step 3 — Snapshot Current State
The current configuration is saved as a rollback point.
Step 4 — Propose Improvement
An AI model analyzes the current B-scores and proposes a targeted improvement (e.g., adjusting the persona text to improve empathy).
Step 5 — Safety Check
Critical invariants are validated:
- Operator escalation rules must remain intact
- Knowledge base structure must be preserved
- Configuration parameters must stay within valid ranges
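One way to picture these checks is a function that compares the proposed configuration against the current one and returns a list of violations. The configuration layout below (`escalation_rules`, `knowledge_base`, `params`), the `temperature` parameter, and the reading of "structure preserved" as "same top-level knowledge base sections" are all assumptions made for illustration.

```python
def check_invariants(current, proposed, ranges):
    """Return a list of invariant violations; an empty list means the proposal passes."""
    violations = []

    # Operator escalation rules must remain intact.
    if proposed.get("escalation_rules") != current.get("escalation_rules"):
        violations.append("operator escalation rules were modified")

    # Assumption: "structure" means the set of top-level knowledge base sections.
    if set(proposed.get("knowledge_base", {})) != set(current.get("knowledge_base", {})):
        violations.append("knowledge base structure changed")

    # Every ranged parameter must be present and inside its valid range.
    params = proposed.get("params", {})
    for name, (low, high) in ranges.items():
        value = params.get(name)
        if value is None or not (low <= value <= high):
            violations.append(f"parameter {name!r} out of range: {value!r}")

    return violations

current = {
    "escalation_rules": ["handoff_on_request"],
    "knowledge_base": {"faq": [], "pricing": []},
    "params": {"temperature": 0.4},
}
proposed = {
    "escalation_rules": ["handoff_on_request"],
    "knowledge_base": {"faq": [], "pricing": []},
    "params": {"temperature": 1.8},  # outside the hypothetical valid range
}
ranges = {"temperature": (0.0, 1.0)}
violations = check_invariants(current, proposed, ranges)
```

Returning every violation instead of stopping at the first one makes the rejection easier to explain afterwards.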
Step 6 — Apply and Re-Benchmark
The proposed change is applied and the benchmark runs again to measure the impact.
Step 7 — Guardian Gate
The Guardian system determines whether the improvement is safe to keep:
- Quality must not decrease overall
- Minimum quality thresholds must be maintained
- No individual component can drop below a floor value
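The three conditions above can be expressed as a single gate function. This is only a sketch: it assumes overall quality is the mean of the B-score components, and the threshold values (`min_overall`, `component_floor`) are placeholders, since the actual Guardian thresholds are not stated here.

```python
from statistics import mean

def guardian_gate(before, after, min_overall=0.7, component_floor=0.5):
    """Return (approved, reasons) for a re-benchmarked change."""
    reasons = []
    overall_before = mean(before.values())
    overall_after = mean(after.values())

    # Quality must not decrease overall.
    if overall_after < overall_before:
        reasons.append("overall quality decreased")

    # Minimum quality threshold must be maintained.
    if overall_after < min_overall:
        reasons.append("overall quality below minimum threshold")

    # No individual component may drop below the floor.
    for name, score in after.items():
        if score < component_floor:
            reasons.append(f"component {name!r} below floor")

    return (not reasons, reasons)

before = {"empathy": 0.60, "accuracy": 0.90}
approved, reasons = guardian_gate(before, {"empathy": 0.70, "accuracy": 0.90})
rejected, why = guardian_gate(before, {"empathy": 0.40, "accuracy": 0.90})
```

Keeping the list of reasons alongside the boolean decision gives the dashboard something to display for each Guardian decision.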
Step 8 — Approve or Revert
Approved: The improvement is kept and the bot immediately benefits.
Rejected: The configuration is reverted to the snapshot from Step 3. After 3 consecutive rejections, the system pauses automatically.
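The approve-or-revert logic, together with the automatic pause, might look like the sketch below. The `LearningLoop` class and the `propose` and `gate` callables are hypothetical stand-ins for the proposal step and the Guardian gate; only the limit of 3 consecutive rejections comes from the text above.

```python
import copy

class LearningLoop:
    MAX_CONSECUTIVE_REJECTIONS = 3

    def __init__(self, config):
        self.config = config
        self.consecutive_rejections = 0
        self.paused = False

    def run_iteration(self, propose, gate):
        """Apply one proposed change; keep it if the gate approves, else roll back."""
        if self.paused:
            raise RuntimeError("self-learning is paused")

        snapshot = copy.deepcopy(self.config)               # rollback point
        self.config = propose(copy.deepcopy(self.config))   # apply the change

        if gate(self.config):
            self.consecutive_rejections = 0
            return True

        self.config = snapshot                               # revert to the snapshot
        self.consecutive_rejections += 1
        if self.consecutive_rejections >= self.MAX_CONSECUTIVE_REJECTIONS:
            self.paused = True
        return False

def propose_new_persona(config):
    config["persona"] = "v2"
    return config

loop = LearningLoop({"persona": "v1"})
for _ in range(3):
    loop.run_iteration(propose_new_persona, gate=lambda cfg: False)
```

After the three rejected iterations in this example, `loop.paused` is `True` and `loop.config` is back to the original snapshot. An approved iteration resets the rejection counter, so only an unbroken run of rejections triggers the pause.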
Convergence Detection
The system monitors improvement over a sliding window. When improvements become negligible, the current optimization level is considered converged and the system advances to the next level.
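A sliding-window convergence check can be sketched with a bounded deque. The window size, the `epsilon` threshold, and the rule "every gain in a full window is below `epsilon`" are assumptions chosen for illustration; the actual criterion may differ.

```python
from collections import deque

class ConvergenceDetector:
    def __init__(self, window=5, epsilon=0.01):
        # Only the most recent `window` gains are kept.
        self.gains = deque(maxlen=window)
        self.epsilon = epsilon

    def record(self, score_before, score_after):
        """Store the gain from one iteration."""
        self.gains.append(score_after - score_before)

    def converged(self):
        """True once the window is full and every gain in it is negligible."""
        window_full = len(self.gains) == self.gains.maxlen
        return window_full and max(self.gains) < self.epsilon

detector = ConvergenceDetector(window=3, epsilon=0.01)
history = [(0.700, 0.750), (0.750, 0.752), (0.752, 0.753), (0.753, 0.754)]
states = []
for before, after in history:
    detector.record(before, after)
    states.append(detector.converged())
```

In this example the detector reports convergence only after the fourth iteration, once the early 0.05 gain has slid out of the three-iteration window.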
Benchmark Rotation
To prevent overfitting to static test dialogs:
- A core set of golden test dialogs is always included
- Recent real production conversations are periodically added
- A holdout set is reserved for overfitting detection
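The rotation rules above could be combined into a single function that assembles the dialog set for one benchmark run. This sketch assumes each dialog is a dictionary with a unique `id`, and that holdout dialogs are identified by id and never enter the optimization set; the function name and parameters are hypothetical.

```python
import random

def build_benchmark(golden, production, holdout_ids, sample_size, seed=None):
    """Combine the golden set with a sample of recent production dialogs."""
    rng = random.Random(seed)
    # Holdout dialogs are reserved for overfitting detection, so exclude them.
    eligible = [d for d in production if d["id"] not in holdout_ids]
    sampled = rng.sample(eligible, min(sample_size, len(eligible)))
    # The golden set is always included in full.
    return list(golden) + sampled

golden = [{"id": "g1"}, {"id": "g2"}]
production = [{"id": f"p{i}"} for i in range(10)]
holdout_ids = {"p0", "p1", "p2"}
benchmark = build_benchmark(golden, production, holdout_ids, sample_size=4, seed=1)
```

Scoring the holdout dialogs separately and comparing them with the benchmark scores is one way to notice overfitting: a widening gap suggests the bot is being tuned to the test set rather than improving in general.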
Monitoring
Track self-learning progress in the dashboard:
- B-score trends (before/after each iteration)
- Iteration outcomes (approved vs rejected)
- Current ONDA level
- Guardian decisions and reasoning
See Monitoring & Analytics for more details.