ai-test-gen
AI-Powered Test Case Generation Framework
| Time Saved | Cost / Test Case | Test Cases | First-Pass Quality |
| --- | --- | --- | --- |
| 92% | $0.002 | 870 | 73% |
Overview
ai-test-gen is a multi-agent LLM orchestration system for test generation. It reads structured requirements (Azure DevOps / Jira), passes them through a hybrid rule-engine + LLM pipeline, enforces acceptance-criteria coverage with automated feedback loops, and exports deterministic, structured test suites ready for import into ADO Test Plans.
Instead of a single prompt, the system decomposes work into specialised stages: ingestion and NLP parsing, deterministic rule-based generation, RAG-powered semantic matching with ChromaDB, LLM correction with JSON-schema enforcement, coverage validation, and finally multi-format export (CSV, JSON, Playwright scripts).
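The staged decomposition can be pictured as a plain sequential orchestrator. This is an illustrative sketch only: the `Suite` container, stage names, and signatures below are assumptions for the example, not the project's actual API.

```python
from dataclasses import dataclass, field
from typing import Callable, List

@dataclass
class Suite:
    """Hypothetical container handed from one pipeline stage to the next."""
    story_id: str
    tests: list = field(default_factory=list)
    notes: list = field(default_factory=list)

def run_pipeline(suite: Suite, stages: List[Callable[[Suite], Suite]]) -> Suite:
    # Each stage transforms the suite and passes it on; the real system
    # interleaves deterministic stages with LLM-backed ones in this way.
    for stage in stages:
        suite = stage(suite)
    return suite

# Two toy stages standing in for the deterministic and LLM steps.
def rule_based_generation(s: Suite) -> Suite:
    s.tests.append({"title": "PRE-REQ: launch application", "source": "rules"})
    return s

def llm_correction(s: Suite) -> Suite:
    s.notes.append("LLM pass would refine wording here")
    return s

suite = run_pipeline(Suite("123456"), [rule_based_generation, llm_correction])
```

The point of the design is that stages are interchangeable: a rule-based stage and an LLM-backed stage share the same shape, so either can be swapped or disabled per project.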
In an MSc research study on a production CAD application (55 user stories, 870 manual tests), ai-test-gen generated 870 structured test cases with a 92% reduction in authoring time, 94.4% acceptance-criteria coverage, and a first-pass structural quality of 72.9% at a total LLM cost of $1.74 (~$0.002 per test case).
System Architecture
1. Ingestion & Parsing — Adapters pull stories from Azure DevOps/Jira and normalise them into domain models. spaCy-based NLP extracts acceptance criteria, UI surfaces, and feature types.
2. Deterministic Generation — A rule engine with 70+ QA rules expands scenarios, generates structural scaffolds (PRE-REQ, launch, close, negative paths), and guarantees minimal quality without any LLM calls.
3. RAG: Semantic Matching — ChromaDB stores previous steps as embeddings. For new stories, semantically similar steps are retrieved as few-shot context to enforce consistent language and patterns.
4. LLM Correction — A provider-agnostic LLM layer (OpenAI / Gemini / Anthropic / Ollama) refines wording, fills edge cases, and produces JSON-structured output that matches a strict schema.
5. Validation & Feedback — Coverage validators check that every acceptance criterion is represented. Gaps trigger targeted LLM calls to generate missing tests; quality gates enforce structure, forbidden-language rules, and accessibility requirements.
6. Export & Integration — Final suites are exported to ADO-compatible CSVs, JSON, and Playwright scripts, with workflows to upload directly into Azure DevOps Test Plans and other tooling.
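The retrieval step in stage 3 can be illustrated with a toy cosine-similarity lookup. The real pipeline uses ChromaDB embeddings; the hand-written vectors and `retrieve` helper below are invented purely for illustration.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)

# Toy "embedding store": past test steps with pre-computed vectors.
# In the actual system these live in a ChromaDB collection.
store = {
    "Click the Save button and verify the confirmation toast": [0.9, 0.1, 0.0],
    "Open the CAD viewport and rotate the model": [0.1, 0.8, 0.3],
}

def retrieve(query_vec, k=1):
    # Rank stored steps by similarity and return the top k as
    # few-shot context for the LLM correction stage.
    ranked = sorted(store, key=lambda s: cosine(store[s], query_vec), reverse=True)
    return ranked[:k]

few_shot = retrieve([0.85, 0.15, 0.05])
```

Retrieved steps are injected into the LLM prompt, which is what keeps phrasing consistent with previously approved tests rather than drifting run to run.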
Why This Architecture?
A single LLM prompt can hallucinate steps, miss edge cases, and drift in wording between runs. ai-test-gen instead pushes as much as possible into deterministic rules, then uses LLMs only where they add real value — language quality, gap filling, and semantic alignment.
- Hybrid rules + LLM keeps 70% of logic deterministic, reducing hallucination and giving predictable structure across projects.
- RAG with ChromaDB reuses high-quality reference steps so new stories read like they were written by the same senior QA engineer.
- Coverage validation loops ensure every acceptance criterion is covered at least once, turning ACs into an explicit quality contract.
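The coverage loop in the last bullet reduces to a gap check between acceptance criteria and the tests that claim to cover them. The criterion IDs, `covers` field, and regeneration comment below are hypothetical names for this sketch.

```python
def find_uncovered(acceptance_criteria, test_cases):
    """Return acceptance-criterion IDs not referenced by any test case."""
    covered = {ac_id for tc in test_cases for ac_id in tc.get("covers", [])}
    return [ac for ac in acceptance_criteria if ac not in covered]

acs = ["AC-1", "AC-2", "AC-3"]
tests = [
    {"title": "Save file, happy path", "covers": ["AC-1"]},
    {"title": "Save file, invalid name", "covers": ["AC-1", "AC-3"]},
]

gaps = find_uncovered(acs, tests)
# In the real pipeline, each gap would trigger a targeted LLM call that
# generates the missing test before the quality gates run again.
```

Because the check is deterministic, the loop terminates cleanly: either every criterion is covered or the remaining gaps are surfaced to the user instead of being silently dropped.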
Quick Start (Local)
- Clone the repo: `git clone https://github.com/Gulzhasm/ai_test_gen.git`
- Create a Python 3.10 venv and install deps: `pip install -r requirements.txt`
- Configure `.env` with ADO + LLM keys.
- Run your first generation: `python workflows.py generate --story-id 123456`
Full Docker flow, CLI reference, and MCP integration are documented in the project README on GitHub.