Overview
A minimal customer-support RAG backend with production-minded guardrails: strict grounding, fail-fast refusal when retrieval returns zero hits, a Redis hot cache with smart TTLs, and dev/prod profiles for realistic deployment.
Project Background
Customer support queries in music streaming apps (e.g., membership renewal, pricing) are repetitive and high-volume.
This project demonstrates a minimal RAG pipeline with production-minded engineering choices:
- Predictable control flow: synchronous end-to-end pipeline for stability
- Hallucination containment: refusal gate when retrieval has no hit
- Performance optimization: Redis caching for hot queries (with TTL and graceful degradation)
- Environment separation: H2 (dev) vs MySQL + Redis (prod simulation) via Spring Profiles
- Vendor-agnostic LLM integration: OpenAI-compatible protocol (DashScope/Qwen by default)
Key Features
Strict Grounding Policy (Fail-Fast)
- Hits == 0: return the fixed refusal immediately (no LLM call)
- Hits > 0: inject "Known Info" and answer only from the retrieved context
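The two-branch policy can be sketched in plain Java. This is a minimal illustration, not the project's actual code; the class and method names (`RefusalGate`, `answer`) are assumptions.

```java
import java.util.List;
import java.util.function.Function;

// Minimal sketch of the fail-fast refusal gate (names are illustrative,
// not the project's actual classes).
public class RefusalGate {
    // Fixed refusal returned verbatim when retrieval finds nothing.
    static final String REFUSAL = "抱歉,小云暂时还没学会这个问题";

    // Zero hits short-circuits with the refusal and never touches the LLM;
    // otherwise the caller proceeds to grounded generation.
    public static String answer(List<String> hits, Function<List<String>, String> llm) {
        if (hits.isEmpty()) {
            return REFUSAL;          // no LLM call, no token cost
        }
        return llm.apply(hits);      // grounded generation path
    }

    public static void main(String[] args) {
        System.out.println(answer(List.of(), h -> "grounded answer"));
        System.out.println(answer(List.of("doc"), h -> "grounded answer"));
    }
}
```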
Dual-Profile Support (Dev vs Prod Simulation)
- dev (default): H2 in-memory, zero infrastructure required
- prod: MySQL persistence + Redis caching (Docker Compose), closer to real-world deployment
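The profile split is typically wired in `application.yml`. The fragment below is a sketch using standard Spring Boot keys (Boot 3 style, e.g. `spring.data.redis.*`); the datasource URLs and credentials are illustrative, not the project's actual values.

```yaml
# application.yml (sketch; values are illustrative)
spring:
  profiles:
    active: dev          # dev is the default profile
---
spring:
  config:
    activate:
      on-profile: dev
  datasource:
    url: jdbc:h2:mem:ragdb       # in-memory, zero infrastructure
    driver-class-name: org.h2.Driver
  jpa:
    hibernate:
      ddl-auto: create-drop
---
spring:
  config:
    activate:
      on-profile: prod
  datasource:
    url: jdbc:mysql://localhost:3306/ragdb
    username: rag
    password: ${DB_PASSWORD}
  data:
    redis:
      host: localhost
      port: 6379
```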
Redis Caching (Hot Query Optimization)
- Cache-First Strategy: Checks Redis before triggering retrieval or LLM inference to reduce latency and token costs.
- Smart TTL:
- Standard Answer: Long TTL (e.g., 10 min) for high cache hit rate.
- Refusal (Hits=0): Short TTL (e.g., 30s) to prevent "stale refusals" after KnowledgeBase updates.
- Stability: Redis failures are logged as warnings; the service automatically falls back to DB + LLM without breaking the user request.
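The smart-TTL and degradation rules above can be sketched as follows. Durations mirror the README's examples (10 min / 30 s); `TtlPolicy`, `CacheClient`, and `safeGet` are illustrative names, not the project's actual API.

```java
import java.time.Duration;
import java.util.Optional;

// Sketch of the smart-TTL write-back policy and graceful degradation.
public class TtlPolicy {
    static final String REFUSAL = "抱歉,小云暂时还没学会这个问题";

    // Stand-in for the Redis client used by the project.
    interface CacheClient {
        String get(String key);
    }

    // Refusals get a short TTL so a knowledge-base update is picked up
    // quickly; normal answers get a long TTL for a high hit rate.
    public static Duration ttlFor(String answer) {
        return REFUSAL.equals(answer) ? Duration.ofSeconds(30) : Duration.ofMinutes(10);
    }

    // Graceful degradation: any Redis failure is swallowed (logged as a
    // warning in the real service) and the caller falls through to DB + LLM.
    public static Optional<String> safeGet(CacheClient cache, String key) {
        try {
            return Optional.ofNullable(cache.get(key));
        } catch (RuntimeException e) {
            return Optional.empty();   // cache miss semantics, request survives
        }
    }
}
```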
Minimal Retrieval Baseline (Top-K)
- Top-K lexical retrieval (K=5) over KnowledgeBase (Spring Data JPA)
- Optional query normalization + retry to improve recall on noisy inputs
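The retrieval step can be illustrated with an in-memory Top-K sketch. In the project this sits behind Spring Data JPA; the containment scoring below is an assumption that only shows the shape of the step, and `TopKRetriever` is not a real class in the repository.

```java
import java.util.List;
import java.util.stream.Collectors;

// Minimal Top-K lexical retrieval sketch (K = 5).
public class TopKRetriever {
    static final int K = 5;

    // Score = number of query tokens contained in the document.
    static long score(String doc, String[] tokens) {
        long s = 0;
        for (String t : tokens) if (doc.contains(t)) s++;
        return s;
    }

    public static List<String> retrieve(List<String> docs, String query) {
        String[] tokens = query.trim().toLowerCase().split("\\s+"); // simple normalization
        return docs.stream()
                .filter(d -> score(d.toLowerCase(), tokens) > 0)    // keep hits only
                .sorted((a, b) -> Long.compare(
                        score(b.toLowerCase(), tokens),
                        score(a.toLowerCase(), tokens)))            // best match first
                .limit(K)
                .collect(Collectors.toList());
    }
}
```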
Architecture
Data Flow (Fail-Fast + Cache + RAG)
- Input normalization (trim / simple cleanup)
- Redis cache lookup (hot query optimization)
- Top-K retrieval from KnowledgeBase (K=5)
- Refusal gate: if hits == 0, return refusal (no LLM)
- Prompt assembly: inject "Known Info"
- LLM inference (DashScope OpenAI-compatible endpoint)
- Write-back to Redis with TTL (Short TTL for refusals to avoid stale refusals)
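The steps above can be strung together in one sketch. All collaborators are stand-ins: a `Map` plays Redis (TTL elided), a `Function` plays the JPA retriever and the DashScope client; none of these names come from the project.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

// End-to-end sketch of the data flow: cache lookup → Top-K retrieval
// → refusal gate → prompt assembly → LLM → write-back.
public class RagPipeline {
    static final String REFUSAL = "抱歉,小云暂时还没学会这个问题";
    final Map<String, String> cache = new HashMap<>();    // stand-in for Redis
    final Function<String, List<String>> retriever;       // stand-in for JPA Top-K
    final Function<String, String> llm;                   // stand-in for the LLM client

    RagPipeline(Function<String, List<String>> retriever, Function<String, String> llm) {
        this.retriever = retriever;
        this.llm = llm;
    }

    public String ask(String raw) {
        String q = raw.trim();                            // 1. normalization
        String cached = cache.get(q);                     // 2. cache lookup
        if (cached != null) return cached;                //    HIT: no retrieval, no LLM
        List<String> hits = retriever.apply(q);           // 3. Top-K retrieval
        String answer;
        if (hits.isEmpty()) {
            answer = REFUSAL;                             // 4. refusal gate, no LLM call
        } else {
            StringBuilder prompt = new StringBuilder("Known Info:\n");
            for (int i = 0; i < hits.size(); i++)         // 5. prompt assembly
                prompt.append('[').append(i + 1).append("] ").append(hits.get(i)).append('\n');
            prompt.append("User Question: ").append(q);
            answer = llm.apply(prompt.toString());        // 6. LLM inference
        }
        cache.put(q, answer);                             // 7. write-back (TTL elided)
        return answer;
    }
}
```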
Demo
- Swagger UI (endpoint visible)
- Example API response (answer + hits)
- Cache proof logs (MISS → LLM CALL → WRITE, then HIT with no LLM)
Tech Stack
| Component | Choice | Description |
|---|---|---|
| Language | Java 17 | Core development language |
| Framework | Spring Boot | Web MVC and dependency injection |
| ORM | Spring Data JPA | Repository abstraction over DB |
| Database (dev) | H2 | Zero-infra rapid development |
| Database (prod) | MySQL 8 | Persistence for production simulation |
| Cache (prod) | Redis 7 | Hot query caching with TTL |
| LLM Integration | OkHttp + Jackson | OpenAI-compatible chat completion client |
| API Docs | OpenAPI / Swagger UI | API exploration and testing |
| Deployment | Docker Compose | One-command infra startup |
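The LLM client's wire format is the OpenAI-compatible chat-completions payload. The project builds it with OkHttp + Jackson; the stdlib-only sketch below only shows the request shape, with an illustrative model name and a naive string substitution (a real client should JSON-escape the content, e.g. via Jackson).

```java
// Sketch of the OpenAI-compatible request payload the LLM client sends.
public class ChatRequest {
    public static String body(String model, String system, String user) {
        return """
                {"model": "%s",
                 "messages": [
                   {"role": "system", "content": "%s"},
                   {"role": "user", "content": "%s"}
                 ]}""".formatted(model, system, user);
    }

    public static void main(String[] args) {
        // DashScope exposes an OpenAI-compatible endpoint, e.g.
        // POST https://dashscope.aliyuncs.com/compatible-mode/v1/chat/completions
        System.out.println(body("qwen-plus", "Answer only from Known Info.", "How do I renew?"));
    }
}
```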
Prompt Policy
The system uses a rigid template to prevent the LLM from drawing on external knowledge.
[System Role]
Persona: NetEase Cloud Music customer support agent
Constraint: Answer ONLY using the provided "Known Info".
Failure Case: If the info is insufficient, reply exactly:
"抱歉,小云暂时还没学会这个问题" ("Sorry, XiaoYun hasn't learned this question yet")
No fabrication allowed.
[User Role]
Known Info:
[1] <retrieved_answer_1>
[2] <retrieved_answer_2>
...
User Question: <question>
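Rendering this template is a simple string-building step. The sketch below is illustrative (`PromptTemplate` is not a class in the project), and the system message paraphrases the policy above.

```java
import java.util.List;

// Renders the rigid template into the user message the client sends.
public class PromptTemplate {
    // Paraphrase of the [System Role] policy above.
    static final String SYSTEM = """
            You are a NetEase Cloud Music customer support agent.
            Answer ONLY using the provided "Known Info".
            If the info is insufficient, reply exactly:
            抱歉,小云暂时还没学会这个问题
            No fabrication allowed.""";

    // Numbered "Known Info" entries followed by the user's question.
    public static String userMessage(List<String> knownInfo, String question) {
        StringBuilder sb = new StringBuilder("Known Info:\n");
        for (int i = 0; i < knownInfo.size(); i++)
            sb.append('[').append(i + 1).append("] ").append(knownInfo.get(i)).append('\n');
        return sb.append("User Question: ").append(question).toString();
    }
}
```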
AI-Assisted Development (Vibe Coding)
This project was developed with AI assistance using Cursor (model: GPT-5.2), following a "Human-in-the-Loop" workflow:
- Scaffolding & Drafting: Rapid generation of Spring Boot boilerplate and configuration wiring.
- Documentation & Visualization: Iterative refinement of the README and Mermaid architecture diagrams.
- Debugging Support: Analyzing stack traces and resolving dependency conflicts.
Verification:
All AI-assisted changes were manually reviewed and adjusted. Key engineering patterns (cache degradation strategies and dev/prod profile isolation)
were validated through reproducible drills (cache hit/miss logs, Redis-down degradation drill).