ai-test-gen

AI-Powered Test Case Generation Framework

Key results: 92% time saved · $0.002 cost per test case · 870 test cases · 73% first-pass quality

Overview

ai-test-gen is a multi-agent LLM orchestration system for test generation. It reads structured requirements (Azure DevOps / Jira), passes them through a hybrid rule-engine + LLM pipeline, enforces acceptance-criteria coverage with automated feedback loops, and exports deterministic, structured test suites ready for import into ADO Test Plans.

Instead of a single prompt, the system decomposes work into specialised stages: ingestion and NLP parsing, deterministic rule-based generation, RAG-powered semantic matching with ChromaDB, LLM correction with JSON-schema enforcement, coverage validation, and finally multi-format export (CSV, JSON, Playwright scripts).
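The staged decomposition can be sketched as a simple function pipeline. This is a minimal illustration only; the `Story` model, the stage names, and the test-case dict shape are hypothetical, not the project's actual API:

```python
from dataclasses import dataclass, field
from typing import Callable, List

@dataclass
class Story:
    """A normalised user story flowing through the pipeline (illustrative model)."""
    story_id: str
    acceptance_criteria: List[str]
    test_cases: List[dict] = field(default_factory=list)

def run_pipeline(story: Story, stages: List[Callable[[Story], Story]]) -> Story:
    """Apply each stage in order; every stage returns the enriched story."""
    for stage in stages:
        story = stage(story)
    return story

# One illustrative stage; the real ones would call spaCy, the rule engine,
# ChromaDB retrieval, the LLM correction layer, and the validators.
def rule_based_generation(story: Story) -> Story:
    for i, ac in enumerate(story.acceptance_criteria, 1):
        story.test_cases.append({"id": f"TC-{i}", "covers": ac, "steps": ["PRE-REQ", "Launch app"]})
    return story

result = run_pipeline(
    Story("123456", ["User can open a drawing", "User can save a drawing"]),
    [rule_based_generation],
)
print(len(result.test_cases))  # → 2
```

The point of the stage-as-function shape is that any stage (LLM correction, validation) can be swapped or re-run independently, which is what makes the feedback loops in later stages cheap to wire up.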

In an MSc research study on a production CAD application (55 user stories, 870 manual tests), ai-test-gen generated 870 structured test cases with a 92% reduction in authoring time, 94.4% acceptance-criteria coverage, and a first-pass structural quality of 72.9% at a total LLM cost of $1.74 (~$0.002 per test case).

System Architecture

01. Ingestion & Parsing — Adapters pull stories from Azure DevOps/Jira and normalise them into domain models. spaCy-based NLP extracts acceptance criteria, UI surfaces, and feature types.
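The extraction idea behind this stage can be shown without the NLP machinery. The real pipeline uses spaCy; this dependency-free regex sketch, and the bullet conventions it assumes, are illustrative only:

```python
import re
from typing import Dict, List

def parse_story(raw: str) -> Dict[str, List[str]]:
    """Pull acceptance-criteria bullets out of a raw story description.

    Sketch only: matches 'AC1:'-style prefixes and '-'/'*' bullets; the
    real system uses spaCy parsing rather than a regex.
    """
    criteria = re.findall(r"^\s*(?:AC\d*[:.)-]|\-|\*)\s*(.+)$", raw, flags=re.MULTILINE)
    return {"acceptance_criteria": [c.strip() for c in criteria]}

story_text = """As a user I want to export drawings.
- Export dialog opens from the File menu
- Exported file matches the selected format
"""
parsed = parse_story(story_text)
print(parsed["acceptance_criteria"])  # → two criteria, narrative line ignored
```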

02. Deterministic Generation — A rule engine with 70+ QA rules expands scenarios, generates structural scaffolds (PRE-REQ, launch, close, negative paths), and guarantees minimal quality without any LLM calls.
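A rule engine of this kind can be sketched as predicate/template pairs: each rule decides whether it applies to a story, then expands into scaffold test cases. The rule names and story fields below are hypothetical:

```python
from typing import Dict, List

# Each rule pairs an applicability predicate with a deterministic expansion.
RULES: List[Dict] = [
    {
        "name": "launch-close",
        "applies": lambda story: True,  # structural scaffold for every story
        "expand": lambda story: [
            {"title": f"Launch {story['feature']}",
             "steps": ["PRE-REQ: app installed", "Launch the app"]},
            {"title": f"Close {story['feature']}",
             "steps": ["Close the app", "Verify clean shutdown"]},
        ],
    },
    {
        "name": "negative-input",
        "applies": lambda story: story.get("has_input", False),
        "expand": lambda story: [
            {"title": f"{story['feature']}: reject invalid input",
             "steps": ["Enter invalid data", "Verify error message"]},
        ],
    },
]

def apply_rules(story: dict) -> List[dict]:
    """Deterministically expand a story into scaffold cases — no LLM calls."""
    cases: List[dict] = []
    for rule in RULES:
        if rule["applies"](story):
            cases.extend(rule["expand"](story))
    return cases

print(len(apply_rules({"feature": "Export dialog", "has_input": True})))  # → 3
```

Because the expansion is pure data-to-data, the same story always yields the same scaffold, which is what gives the pipeline its predictable baseline quality.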

03. RAG: Semantic Matching — ChromaDB stores previous steps as embeddings. For new stories, semantically similar steps are retrieved as few-shot context to enforce consistent language and patterns.
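The retrieval step can be illustrated with a toy bag-of-words similarity in place of ChromaDB's dense embeddings. Everything below — the corpus, the scoring, the function names — is a simplified stand-in for the real vector store:

```python
from collections import Counter
from math import sqrt
from typing import List

def embed(text: str) -> Counter:
    """Toy bag-of-words 'embedding' (ChromaDB would store dense vectors)."""
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    na = sqrt(sum(v * v for v in a.values()))
    nb = sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def retrieve_few_shot(query: str, corpus: List[str], k: int = 2) -> List[str]:
    """Return the k most similar previously written steps as few-shot context."""
    ranked = sorted(corpus, key=lambda step: cosine(embed(query), embed(step)),
                    reverse=True)
    return ranked[:k]

reference_steps = [
    "Click the Save button and verify the file is written",
    "Open the Export dialog from the File menu",
    "Resize the viewport and verify layout reflows",
]
print(retrieve_few_shot("verify export dialog opens from file menu",
                        reference_steps, k=1))
```

The retrieved steps are then injected into the LLM prompt, which is what keeps new stories reading like the existing reference suite.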

04. LLM Correction — A provider-agnostic LLM layer (OpenAI / Gemini / Anthropic / Ollama) refines wording, fills edge cases, and produces JSON-structured output that matches a strict schema.
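Schema enforcement on the model's output can be sketched with stdlib-only validation. The real system enforces a full JSON schema; the field names here are assumptions:

```python
import json

# Minimal required shape for one generated test case (illustrative field names).
REQUIRED_FIELDS = {"title": str, "steps": list, "expected_result": str}

def validate_test_case(raw: str) -> dict:
    """Parse the LLM's JSON output and enforce the expected structure.

    A failure here would trigger a retry or correction call rather than
    letting a malformed case reach the export stage.
    """
    case = json.loads(raw)
    for name, typ in REQUIRED_FIELDS.items():
        if not isinstance(case.get(name), typ):
            raise ValueError(f"field {name!r} missing or wrong type")
    return case

llm_output = ('{"title": "Export PNG", '
              '"steps": ["Open Export dialog", "Choose PNG"], '
              '"expected_result": "PNG file is created"}')
case = validate_test_case(llm_output)
print(case["title"])  # → Export PNG
```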

05. Validation & Feedback — Coverage validators check that every acceptance criterion is represented. Gaps trigger targeted LLM calls to generate missing tests; quality gates enforce structure, forbidden-language rules, and accessibility requirements.
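A minimal coverage validator might look like this. The `covers` field linking a test case back to its acceptance criterion is an illustrative convention, not the project's exact schema:

```python
from typing import Dict, List

def coverage_gaps(criteria: List[str], test_cases: List[Dict]) -> List[str]:
    """Return acceptance criteria not referenced by any generated test case.

    A non-empty result would drive a targeted follow-up LLM call for just
    the missing criteria, rather than regenerating the whole suite.
    """
    covered = {tc.get("covers") for tc in test_cases}
    return [ac for ac in criteria if ac not in covered]

acs = ["AC1: dialog opens", "AC2: file saved", "AC3: error on bad path"]
suite = [{"covers": "AC1: dialog opens"}, {"covers": "AC3: error on bad path"}]
print(coverage_gaps(acs, suite))  # → ['AC2: file saved']
```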

06. Export & Integration — Final suites are exported to ADO-compatible CSVs, JSON, and Playwright scripts, with workflows to upload directly into Azure DevOps Test Plans and other tooling.
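The CSV export can be sketched as follows. The column headers follow a commonly used Azure DevOps Test Plans import layout; treat them as an assumption and check against your organisation's template before uploading:

```python
import csv
import io

# Assumed ADO-style import columns: one title row per case, one row per step.
HEADERS = ["ID", "Work Item Type", "Title", "Test Step", "Step Action", "Step Expected"]

def export_ado_csv(test_cases):
    """Serialise test cases to an ADO-style CSV string."""
    buf = io.StringIO()
    writer = csv.writer(buf)
    writer.writerow(HEADERS)
    for case in test_cases:
        writer.writerow(["", "Test Case", case["title"], "", "", ""])
        for i, (action, expected) in enumerate(case["steps"], 1):
            writer.writerow(["", "", "", str(i), action, expected])
    return buf.getvalue()

suite = [{
    "title": "Export PNG",
    "steps": [("Open Export dialog", "Dialog is shown"),
              ("Choose PNG and confirm", "PNG file is created")],
}]
print(export_ado_csv(suite).splitlines()[1])  # → ,Test Case,Export PNG,,,
```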

Why This Architecture?

A single LLM prompt can hallucinate steps, miss edge cases, and drift in wording between runs. ai-test-gen instead pushes as much as possible into deterministic rules, then uses LLMs only where they add real value — language quality, gap filling, and semantic alignment.

  • Hybrid rules + LLM keeps 70% of logic deterministic, reducing hallucination and giving predictable structure across projects.
  • RAG with ChromaDB reuses high-quality reference steps so new stories read like they were written by the same senior QA engineer.
  • Coverage validation loops ensure every acceptance criterion is covered at least once, turning ACs into an explicit quality contract.

Quick Start (Local)

  1. Clone the repo: git clone https://github.com/Gulzhasm/ai_test_gen.git
  2. Create a Python 3.10 venv and install deps: pip install -r requirements.txt
  3. Configure .env with ADO + LLM keys.
  4. Run your first generation: python workflows.py generate --story-id 123456

Full Docker flow, CLI reference, and MCP integration are documented in the project README on GitHub.

Tech Stack

Python · Gemini 2.5 Flash · ChromaDB · spaCy · Azure DevOps API · Clean Architecture · Docker · MCP Server · python-docx