Overview
A minimal customer-support RAG backend with production-minded guardrails: strict grounding, fail-fast refusal when retrieval returns zero hits, a Redis hot cache with smart TTLs, and dev/prod profiles for realistic deployment.
Project Background
Customer support queries in music streaming apps (e.g., membership renewal, pricing) are repetitive and high-volume.
This project demonstrates a minimal RAG pipeline with production-minded engineering choices:
- Predictable control flow: synchronous end-to-end pipeline for stability
- Hallucination containment: refusal gate when retrieval has no hit
- Performance optimization: Redis caching for hot queries (with TTL and graceful degradation)
- Environment separation: H2 (dev) vs MySQL + Redis (prod simulation) via Spring Profiles
- Vendor-agnostic LLM integration: OpenAI-compatible protocol (DashScope/Qwen by default)
Key Features
Strict Grounding Policy (Fail-Fast)
- Hits == 0: return the fixed refusal immediately (no LLM call)
- Hits > 0: inject "Known Info" and answer only from the retrieved context
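The two-branch policy can be sketched in plain Java. This is a minimal illustration, not the project's actual code; the class and method names (`RefusalGate`, `answer`) are assumptions.

```java
import java.util.List;
import java.util.function.Function;

// Minimal sketch of the fail-fast refusal gate (names are illustrative,
// not the project's actual classes).
public class RefusalGate {
    // Fixed refusal returned verbatim when retrieval finds nothing.
    static final String REFUSAL = "抱歉,小云暂时还没学会这个问题";

    // Zero hits short-circuits with the refusal and never touches the LLM;
    // otherwise the caller proceeds to grounded generation.
    public static String answer(List<String> hits, Function<List<String>, String> llm) {
        if (hits.isEmpty()) {
            return REFUSAL;          // no LLM call, no token cost
        }
        return llm.apply(hits);      // grounded generation path
    }

    public static void main(String[] args) {
        System.out.println(answer(List.of(), h -> "grounded answer"));
        System.out.println(answer(List.of("doc"), h -> "grounded answer"));
    }
}
```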
Dual-Profile Support (Dev vs Prod Simulation)
- dev (default): H2 in-memory, zero infrastructure required
- prod: MySQL persistence + Redis caching (Docker Compose), closer to real-world deployment
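The profile split is typically wired in `application.yml`. The fragment below is a sketch using standard Spring Boot keys (Boot 3 style, e.g. `spring.data.redis.*`); the datasource URLs and credentials are illustrative, not the project's actual values.

```yaml
# application.yml (sketch; values are illustrative)
spring:
  profiles:
    active: dev          # dev is the default profile
---
spring:
  config:
    activate:
      on-profile: dev
  datasource:
    url: jdbc:h2:mem:ragdb       # in-memory, zero infrastructure
    driver-class-name: org.h2.Driver
  jpa:
    hibernate:
      ddl-auto: create-drop
---
spring:
  config:
    activate:
      on-profile: prod
  datasource:
    url: jdbc:mysql://localhost:3306/ragdb
    username: rag
    password: ${DB_PASSWORD}
  data:
    redis:
      host: localhost
      port: 6379
```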
Redis Caching (Hot Query Optimization)
- Cache-First Strategy: Checks Redis before triggering retrieval or LLM inference to reduce latency and token costs.
- Smart TTL:
- Standard Answer: Long TTL (e.g., 10 min) for high cache hit rate.
- Refusal (Hits=0): Short TTL (e.g., 30s) to prevent "stale refusals" after KnowledgeBase updates.
- Stability: Redis failures are logged as warnings; the service automatically falls back to DB + LLM without breaking the user request.
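The smart-TTL and degradation rules above can be sketched as follows. Durations mirror the README's examples (10 min / 30 s); `TtlPolicy`, `CacheClient`, and `safeGet` are illustrative names, not the project's actual API.

```java
import java.time.Duration;
import java.util.Optional;

// Sketch of the smart-TTL write-back policy and graceful degradation.
public class TtlPolicy {
    static final String REFUSAL = "抱歉,小云暂时还没学会这个问题";

    // Stand-in for the Redis client used by the project.
    interface CacheClient {
        String get(String key);
    }

    // Refusals get a short TTL so a knowledge-base update is picked up
    // quickly; normal answers get a long TTL for a high hit rate.
    public static Duration ttlFor(String answer) {
        return REFUSAL.equals(answer) ? Duration.ofSeconds(30) : Duration.ofMinutes(10);
    }

    // Graceful degradation: any Redis failure is swallowed (logged as a
    // warning in the real service) and the caller falls through to DB + LLM.
    public static Optional<String> safeGet(CacheClient cache, String key) {
        try {
            return Optional.ofNullable(cache.get(key));
        } catch (RuntimeException e) {
            return Optional.empty();   // cache miss semantics, request survives
        }
    }
}
```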
Minimal Retrieval Baseline (Top-K)
- Top-K lexical retrieval (K=5) over KnowledgeBase (Spring Data JPA)
- Optional query normalization + retry to improve recall on noisy inputs
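The retrieval step can be illustrated with an in-memory Top-K sketch. In the project this sits behind Spring Data JPA; the containment scoring below is an assumption that only shows the shape of the step, and `TopKRetriever` is not a real class in the repository.

```java
import java.util.List;
import java.util.stream.Collectors;

// Minimal Top-K lexical retrieval sketch (K = 5).
public class TopKRetriever {
    static final int K = 5;

    // Score = number of query tokens contained in the document.
    static long score(String doc, String[] tokens) {
        long s = 0;
        for (String t : tokens) if (doc.contains(t)) s++;
        return s;
    }

    public static List<String> retrieve(List<String> docs, String query) {
        String[] tokens = query.trim().toLowerCase().split("\\s+"); // simple normalization
        return docs.stream()
                .filter(d -> score(d.toLowerCase(), tokens) > 0)    // keep hits only
                .sorted((a, b) -> Long.compare(
                        score(b.toLowerCase(), tokens),
                        score(a.toLowerCase(), tokens)))            // best match first
                .limit(K)
                .collect(Collectors.toList());
    }
}
```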
Architecture
Data Flow (Fail-Fast + Cache + RAG)
- Input normalization (trim / simple cleanup)
- Redis cache lookup (hot query optimization)
- Top-K retrieval from KnowledgeBase (K=5)
- Refusal gate: if hits == 0, return refusal (no LLM)
- Prompt assembly: inject "Known Info"
- LLM inference (DashScope OpenAI-compatible endpoint)
- Write-back to Redis with TTL (Short TTL for refusals to avoid stale refusals)
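The steps above can be strung together in one sketch. All collaborators are stand-ins: a `Map` plays Redis (TTL elided), a `Function` plays the JPA retriever and the DashScope client; none of these names come from the project.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

// End-to-end sketch of the data flow: cache lookup → Top-K retrieval
// → refusal gate → prompt assembly → LLM → write-back.
public class RagPipeline {
    static final String REFUSAL = "抱歉,小云暂时还没学会这个问题";
    final Map<String, String> cache = new HashMap<>();    // stand-in for Redis
    final Function<String, List<String>> retriever;       // stand-in for JPA Top-K
    final Function<String, String> llm;                   // stand-in for the LLM client

    RagPipeline(Function<String, List<String>> retriever, Function<String, String> llm) {
        this.retriever = retriever;
        this.llm = llm;
    }

    public String ask(String raw) {
        String q = raw.trim();                            // 1. normalization
        String cached = cache.get(q);                     // 2. cache lookup
        if (cached != null) return cached;                //    HIT: no retrieval, no LLM
        List<String> hits = retriever.apply(q);           // 3. Top-K retrieval
        String answer;
        if (hits.isEmpty()) {
            answer = REFUSAL;                             // 4. refusal gate, no LLM call
        } else {
            StringBuilder prompt = new StringBuilder("Known Info:\n");
            for (int i = 0; i < hits.size(); i++)         // 5. prompt assembly
                prompt.append('[').append(i + 1).append("] ").append(hits.get(i)).append('\n');
            prompt.append("User Question: ").append(q);
            answer = llm.apply(prompt.toString());        // 6. LLM inference
        }
        cache.put(q, answer);                             // 7. write-back (TTL elided)
        return answer;
    }
}
```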
Demo
- Swagger UI (endpoint visible)
- Example API response (answer + hits)
- Cache proof logs (MISS → LLM CALL → WRITE, then HIT with no LLM)
Tech Stack
| Component | Choice | Description |
|---|---|---|
| Language | Java 17 | Core development language |
| Framework | Spring Boot | Web MVC and dependency injection |
| ORM | Spring Data JPA | Repository abstraction over DB |
| Database (dev) | H2 | Zero-infra rapid development |
| Database (prod) | MySQL 8 | Persistence for production simulation |
| Cache (prod) | Redis 7 | Hot query caching with TTL |
| LLM Integration | OkHttp + Jackson | OpenAI-compatible chat completion client |
| API Docs | OpenAPI / Swagger UI | API exploration and testing |
| Deployment | Docker Compose | One-command infra startup |
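The LLM client's wire format is the OpenAI-compatible chat-completions payload. The project builds it with OkHttp + Jackson; the stdlib-only sketch below only shows the request shape, with an illustrative model name and a naive string substitution (a real client should JSON-escape the content, e.g. via Jackson).

```java
// Sketch of the OpenAI-compatible request payload the LLM client sends.
public class ChatRequest {
    public static String body(String model, String system, String user) {
        return """
                {"model": "%s",
                 "messages": [
                   {"role": "system", "content": "%s"},
                   {"role": "user", "content": "%s"}
                 ]}""".formatted(model, system, user);
    }

    public static void main(String[] args) {
        // DashScope exposes an OpenAI-compatible endpoint, e.g.
        // POST https://dashscope.aliyuncs.com/compatible-mode/v1/chat/completions
        System.out.println(body("qwen-plus", "Answer only from Known Info.", "How do I renew?"));
    }
}
```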
Prompt Policy
The system uses a rigid template to prevent the LLM from drawing on external knowledge.
[System Role]
Persona: NetEase Cloud Music customer support agent
Constraint: Answer ONLY using the provided "Known Info".
Failure Case: If the info is insufficient, reply exactly:
"抱歉,小云暂时还没学会这个问题" ("Sorry, XiaoYun hasn't learned this question yet")
No fabrication allowed.
[User Role]
Known Info:
[1] <retrieved_answer_1>
[2] <retrieved_answer_2>
...
User Question: <question>
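Rendering this template is a simple string-building step. The sketch below is illustrative (`PromptTemplate` is not a class in the project), and the system message paraphrases the policy above.

```java
import java.util.List;

// Renders the rigid template into the user message the client sends.
public class PromptTemplate {
    // Paraphrase of the [System Role] policy above.
    static final String SYSTEM = """
            You are a NetEase Cloud Music customer support agent.
            Answer ONLY using the provided "Known Info".
            If the info is insufficient, reply exactly:
            抱歉,小云暂时还没学会这个问题
            No fabrication allowed.""";

    // Numbered "Known Info" entries followed by the user's question.
    public static String userMessage(List<String> knownInfo, String question) {
        StringBuilder sb = new StringBuilder("Known Info:\n");
        for (int i = 0; i < knownInfo.size(); i++)
            sb.append('[').append(i + 1).append("] ").append(knownInfo.get(i)).append('\n');
        return sb.append("User Question: ").append(question).toString();
    }
}
```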
AI-Assisted Development (Vibe Coding)
This project was developed with AI assistance using Cursor (model: GPT-5.2), following a "Human-in-the-Loop" workflow:
- Scaffolding & Drafting: Rapid generation of Spring Boot boilerplate and configuration wiring.
- Documentation & Visualization: Iterative refinement of the README and Mermaid architecture diagrams.
- Debugging Support: Analyzing stack traces and resolving dependency conflicts.
Verification:
All AI-assisted changes were manually reviewed and adjusted. Key engineering patterns (cache degradation strategies and dev/prod profile isolation)
were validated through reproducible drills (cache hit/miss logs, Redis-down degradation drill).