
Self-Learning Loop

The self-learning loop is how Plugy improves your bot over time. Each iteration follows a fixed pipeline: benchmark, analyze, mutate, re-benchmark, safety check, and finally approve or revert.

Step 1: Run test dialogs against the current bot configuration. To reduce noise from AI nondeterminism, the benchmark runs multiple times and the B-scores are averaged.
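The averaging step can be sketched as follows. This is a minimal illustration, not Plugy's actual API; the component names (`empathy`, `accuracy`) and the dict-of-scores shape are assumptions.

```python
from statistics import mean

def average_b_scores(runs):
    """Average per-component B-scores across repeated benchmark runs.

    `runs` is a list of dicts mapping component name -> score.
    Component names here are illustrative, not Plugy's real schema.
    """
    components = runs[0].keys()
    return {c: mean(run[c] for run in runs) for c in components}

# Three noisy runs of the same benchmark, averaged per component.
runs = [
    {"empathy": 0.70, "accuracy": 0.90},
    {"empathy": 0.74, "accuracy": 0.86},
    {"empathy": 0.72, "accuracy": 0.88},
]
averaged = average_b_scores(runs)
```

Averaging per component (rather than averaging a single overall score) preserves the information the next step needs to pick a target.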

Step 2: The weakest averaged B-score component is selected as the optimization target.
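Target selection reduces to an argmin over the averaged components. A minimal sketch (the function name and score values are illustrative):

```python
def pick_target(averaged_scores):
    # The lowest-scoring component becomes the optimization target.
    return min(averaged_scores, key=averaged_scores.get)

target = pick_target({"empathy": 0.62, "accuracy": 0.88, "tone": 0.75})
```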

Step 3: The current configuration is saved as a rollback point.

Step 4: An AI model analyzes the current B-scores and proposes a targeted improvement (for example, adjusting the persona text to improve empathy).

Step 5: Critical invariants are validated before the change is applied:

  • Operator escalation rules must remain intact
  • Knowledge base structure must be preserved
  • Configuration parameters must stay within valid ranges
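The three invariant checks above might look like this in code. The configuration field names (`escalation_rules`, `kb_sections`, `temperature`) and the valid range are assumptions for illustration, not Plugy's real schema:

```python
def validate_invariants(proposed, baseline):
    """Validate critical invariants before applying a proposed change.

    Field names and the 0.0-2.0 range are illustrative assumptions.
    """
    return (
        # Operator escalation rules must remain intact.
        proposed["escalation_rules"] == baseline["escalation_rules"]
        # Knowledge base structure must be preserved.
        and list(proposed["kb_sections"]) == list(baseline["kb_sections"])
        # Configuration parameters must stay within valid ranges.
        and 0.0 <= proposed["temperature"] <= 2.0
    )

baseline = {"escalation_rules": ["human-on-anger"],
            "kb_sections": ["faq", "pricing"], "temperature": 0.7}
proposed = {"escalation_rules": ["human-on-anger"],
            "kb_sections": ["faq", "pricing"], "temperature": 0.9}
ok = validate_invariants(proposed, baseline)
```

A proposal that fails any check is discarded before it ever reaches the benchmark.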

Step 6: The proposed change is applied and the benchmark runs again to measure its impact.

Step 7: The Guardian system determines whether the improvement is safe to keep:

  1. Quality must not decrease overall
  2. Minimum quality thresholds must be maintained
  3. No individual component can drop below a floor value
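The three Guardian rules can be expressed as a short decision function. The threshold values (`overall_min`, `component_floor`) are illustrative placeholders, not Plugy's actual defaults:

```python
def guardian_decision(before, after, overall_min=0.6, component_floor=0.4):
    """Apply the three Guardian rules to per-component B-scores.

    `before` and `after` map component name -> averaged score.
    Thresholds are illustrative placeholders.
    """
    overall_before = sum(before.values()) / len(before)
    overall_after = sum(after.values()) / len(after)
    if overall_after < overall_before:    # Rule 1: quality must not decrease overall
        return "rejected"
    if overall_after < overall_min:       # Rule 2: minimum quality threshold
        return "rejected"
    if any(v < component_floor for v in after.values()):  # Rule 3: per-component floor
        return "rejected"
    return "approved"

decision = guardian_decision(
    before={"empathy": 0.60, "accuracy": 0.80},
    after={"empathy": 0.70, "accuracy": 0.80},
)
```

Note that rule 3 blocks the common failure mode where one component improves at the expense of another collapsing.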

Step 8: The improvement is approved or reverted.

Approved: The improvement is kept and the bot immediately benefits.

Rejected: The configuration is reverted to the snapshot from Step 3. After three consecutive rejections, the system pauses automatically.
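The snapshot/revert/pause bookkeeping can be sketched as a small class. This is a hypothetical illustration of the mechanics; the class and attribute names are not Plugy's API, and the pause limit of 3 comes from the text above:

```python
import copy

MAX_CONSECUTIVE_REJECTIONS = 3  # per the docs: pause after 3 rejections in a row

class LearningLoop:
    """Minimal sketch of snapshot / approve / revert bookkeeping."""

    def __init__(self, config):
        self.config = config
        self.rejection_streak = 0
        self.paused = False

    def snapshot(self):
        # Step 3: save a rollback point before mutating the config.
        self._rollback = copy.deepcopy(self.config)

    def approve(self):
        # An approved iteration resets the rejection streak.
        self.rejection_streak = 0

    def reject(self):
        # Revert to the Step 3 snapshot and count the rejection.
        self.config = self._rollback
        self.rejection_streak += 1
        if self.rejection_streak >= MAX_CONSECUTIVE_REJECTIONS:
            self.paused = True

loop = LearningLoop({"persona": "v1"})
loop.snapshot()
loop.config = {"persona": "v2"}  # a mutation that Guardian rejects
loop.reject()
```

`copy.deepcopy` matters here: a shallow copy would let a nested mutation leak into the rollback point.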

The system monitors improvement over a sliding window. When improvements become negligible, the current optimization level is considered converged and the system advances to the next level.
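A sliding-window convergence check of this kind might look as follows. The window size and epsilon are illustrative placeholders; Plugy's actual thresholds are internal:

```python
def has_converged(overall_scores, window=5, epsilon=0.005):
    """Return True when the net gain over the last `window` iterations
    is negligible. `window` and `epsilon` are illustrative values."""
    if len(overall_scores) < window:
        return False  # not enough history yet
    recent = overall_scores[-window:]
    return recent[-1] - recent[0] < epsilon
```

When this returns True, the current optimization level is considered converged and the system advances to the next level.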

To prevent overfitting to static test dialogs:

  • A core set of golden test dialogs is always included
  • Recent real production conversations are periodically added
  • A holdout set is reserved for overfitting detection
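Assembling one iteration's benchmark set from these three pools can be sketched as below. The function name, parameters, and dialog identifiers are hypothetical, not Plugy's API:

```python
import random

def build_benchmark_set(golden, production_pool, holdout, sample_size=10):
    """Assemble one iteration's benchmark set (illustrative sketch).

    Golden dialogs are always included; a rotating sample of recent
    production conversations adds variety; holdout dialogs are excluded
    so they stay unseen and can detect overfitting.
    """
    recent = random.sample(production_pool, min(sample_size, len(production_pool)))
    benchmark = golden + recent
    assert not set(benchmark) & set(holdout), "holdout must stay unseen"
    return benchmark

benchmark = build_benchmark_set(
    golden=["refund-request", "angry-customer"],
    production_pool=["conv-101", "conv-102", "conv-103"],
    holdout=["holdout-1"],
    sample_size=2,
)
```

Because the optimizer never sees the holdout dialogs, a growing gap between benchmark and holdout scores signals overfitting to the test set.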

Track self-learning progress in the dashboard:

  • B-score trends (before/after each iteration)
  • Iteration outcomes (approved vs rejected)
  • Current ONDA level
  • Guardian decisions and reasoning

See Monitoring & Analytics for more details.