
Self-Learning Loop

The self-learning loop is how Plugy improves your bot over time. Each iteration follows a fixed pipeline: benchmark, analyze, mutate, re-benchmark, safety check, and finally approve or revert.

Step 1: Run test dialogs against the current bot configuration. To reduce noise from AI nondeterminism, the benchmark runs multiple times and the B-scores are averaged.
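The averaging step can be sketched as follows. This is a minimal illustration, not Plugy's actual API; the component names (`empathy`, `accuracy`) and the dict-of-scores shape are assumptions.

```python
from statistics import mean

def average_b_scores(runs):
    """Average per-component B-scores across repeated benchmark runs.

    `runs` is a list of dicts mapping component name -> score.
    Component names here are illustrative, not Plugy's real schema.
    """
    components = runs[0].keys()
    return {c: mean(run[c] for run in runs) for c in components}

# Three noisy runs of the same benchmark, averaged per component.
runs = [
    {"empathy": 0.70, "accuracy": 0.90},
    {"empathy": 0.74, "accuracy": 0.86},
    {"empathy": 0.72, "accuracy": 0.88},
]
averaged = average_b_scores(runs)
```

Averaging per component (rather than averaging a single overall score) preserves the information the next step needs to pick a target.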

Step 2: The weakest averaged B-score component is selected as the optimization target.
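Target selection reduces to an argmin over the averaged components. A minimal sketch (the function name and score values are illustrative):

```python
def pick_target(averaged_scores):
    # The lowest-scoring component becomes the optimization target.
    return min(averaged_scores, key=averaged_scores.get)

target = pick_target({"empathy": 0.62, "accuracy": 0.88, "tone": 0.75})
```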

Step 3: The current configuration is saved as a rollback point.

Step 4: An AI model analyzes the current B-scores and proposes a targeted improvement (for example, adjusting the persona text to improve empathy).

Step 5: Critical invariants are validated before the change is applied:

  • Operator escalation rules must remain intact
  • Knowledge base structure must be preserved
  • Configuration parameters must stay within valid ranges
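The three invariant checks above might look like this in code. The configuration field names (`escalation_rules`, `kb_sections`, `temperature`) and the valid range are assumptions for illustration, not Plugy's real schema:

```python
def validate_invariants(proposed, baseline):
    """Validate critical invariants before applying a proposed change.

    Field names and the 0.0-2.0 range are illustrative assumptions.
    """
    return (
        # Operator escalation rules must remain intact.
        proposed["escalation_rules"] == baseline["escalation_rules"]
        # Knowledge base structure must be preserved.
        and list(proposed["kb_sections"]) == list(baseline["kb_sections"])
        # Configuration parameters must stay within valid ranges.
        and 0.0 <= proposed["temperature"] <= 2.0
    )

baseline = {"escalation_rules": ["human-on-anger"],
            "kb_sections": ["faq", "pricing"], "temperature": 0.7}
proposed = {"escalation_rules": ["human-on-anger"],
            "kb_sections": ["faq", "pricing"], "temperature": 0.9}
ok = validate_invariants(proposed, baseline)
```

A proposal that fails any check is discarded before it ever reaches the benchmark.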

Step 6: The proposed change is applied and the benchmark runs again to measure its impact.

Step 7: The Guardian system determines whether the improvement is safe to keep:

  1. Quality must not decrease overall
  2. Minimum quality thresholds must be maintained
  3. No individual component can drop below a floor value
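The three Guardian rules can be expressed as a short decision function. The threshold values (`overall_min`, `component_floor`) are illustrative placeholders, not Plugy's actual defaults:

```python
def guardian_decision(before, after, overall_min=0.6, component_floor=0.4):
    """Apply the three Guardian rules to per-component B-scores.

    `before` and `after` map component name -> averaged score.
    Thresholds are illustrative placeholders.
    """
    overall_before = sum(before.values()) / len(before)
    overall_after = sum(after.values()) / len(after)
    if overall_after < overall_before:    # Rule 1: quality must not decrease overall
        return "rejected"
    if overall_after < overall_min:       # Rule 2: minimum quality threshold
        return "rejected"
    if any(v < component_floor for v in after.values()):  # Rule 3: per-component floor
        return "rejected"
    return "approved"

decision = guardian_decision(
    before={"empathy": 0.60, "accuracy": 0.80},
    after={"empathy": 0.70, "accuracy": 0.80},
)
```

Note that rule 3 blocks the common failure mode where one component improves at the expense of another collapsing.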

Step 8: The improvement is approved or reverted.

Approved: The improvement is kept and the bot immediately benefits.

Rejected: The configuration is reverted to the snapshot from Step 3. After three consecutive rejections, the system pauses automatically.
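The snapshot/revert/pause bookkeeping can be sketched as a small class. This is a hypothetical illustration of the mechanics; the class and attribute names are not Plugy's API, and the pause limit of 3 comes from the text above:

```python
import copy

MAX_CONSECUTIVE_REJECTIONS = 3  # per the docs: pause after 3 rejections in a row

class LearningLoop:
    """Minimal sketch of snapshot / approve / revert bookkeeping."""

    def __init__(self, config):
        self.config = config
        self.rejection_streak = 0
        self.paused = False

    def snapshot(self):
        # Step 3: save a rollback point before mutating the config.
        self._rollback = copy.deepcopy(self.config)

    def approve(self):
        # An approved iteration resets the rejection streak.
        self.rejection_streak = 0

    def reject(self):
        # Revert to the Step 3 snapshot and count the rejection.
        self.config = self._rollback
        self.rejection_streak += 1
        if self.rejection_streak >= MAX_CONSECUTIVE_REJECTIONS:
            self.paused = True

loop = LearningLoop({"persona": "v1"})
loop.snapshot()
loop.config = {"persona": "v2"}  # a mutation that Guardian rejects
loop.reject()
```

`copy.deepcopy` matters here: a shallow copy would let a nested mutation leak into the rollback point.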

The system monitors improvement over a sliding window. When improvements become negligible, the current optimization level is considered converged and the system advances to the next level.
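A sliding-window convergence check of this kind might look as follows. The window size and epsilon are illustrative placeholders; Plugy's actual thresholds are internal:

```python
def has_converged(overall_scores, window=5, epsilon=0.005):
    """Return True when the net gain over the last `window` iterations
    is negligible. `window` and `epsilon` are illustrative values."""
    if len(overall_scores) < window:
        return False  # not enough history yet
    recent = overall_scores[-window:]
    return recent[-1] - recent[0] < epsilon
```

When this returns True, the current optimization level is considered converged and the system advances to the next level.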

To prevent overfitting to static test dialogs:

  • A core set of golden test dialogs is always included
  • Recent real production conversations are periodically added
  • A holdout set is reserved for overfitting detection
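Assembling one iteration's benchmark set from these three pools can be sketched as below. The function name, parameters, and dialog identifiers are hypothetical, not Plugy's API:

```python
import random

def build_benchmark_set(golden, production_pool, holdout, sample_size=10):
    """Assemble one iteration's benchmark set (illustrative sketch).

    Golden dialogs are always included; a rotating sample of recent
    production conversations adds variety; holdout dialogs are excluded
    so they stay unseen and can detect overfitting.
    """
    recent = random.sample(production_pool, min(sample_size, len(production_pool)))
    benchmark = golden + recent
    assert not set(benchmark) & set(holdout), "holdout must stay unseen"
    return benchmark

benchmark = build_benchmark_set(
    golden=["refund-request", "angry-customer"],
    production_pool=["conv-101", "conv-102", "conv-103"],
    holdout=["holdout-1"],
    sample_size=2,
)
```

Because the optimizer never sees the holdout dialogs, a growing gap between benchmark and holdout scores signals overfitting to the test set.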

Track self-learning progress in the dashboard:

  • B-score trends (before/after each iteration)
  • Iteration outcomes (approved vs rejected)
  • Current ONDA level
  • Guardian decisions and reasoning

See Monitoring & Analytics for more details.