NetEase Cloud Music CS Agent
(Minimal RAG)

Spring Boot · RAG · Redis Cache · Guardrails

Demo of the project

Overview

A minimal customer-support RAG backend with production-minded guardrails: strict grounding, fail-fast refusal when retrieval has zero hits, Redis hot-cache with smart TTL, and dev/prod profiles for realistic deployment.

Project Background

Customer support queries in music streaming apps (e.g., membership renewal, pricing) are repetitive and high-volume.

This project demonstrates a minimal RAG pipeline with production-minded engineering choices:

  • Predictable control flow: synchronous end-to-end pipeline for stability
  • Hallucination containment: refusal gate when retrieval has no hit
  • Performance optimization: Redis caching for hot queries (with TTL and graceful degradation)
  • Environment separation: H2 (dev) vs MySQL + Redis (prod simulation) via Spring Profiles
  • Vendor-agnostic LLM integration: OpenAI-compatible protocol (DashScope/Qwen by default)

Key Features

Strict Grounding Policy (Fail-Fast)

  • Hit = 0: return the fixed refusal immediately (no LLM call)
  • Hit > 0: inject "Known Info" and answer based only on the retrieved context
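
The two branches above can be sketched as a small guard. This is a hypothetical sketch: `GroundingGate` and its method names are illustrative, not the project's actual classes.

```java
import java.util.List;

// Hypothetical sketch of the fail-fast grounding gate; names are illustrative.
public class GroundingGate {
    // The exact refusal string the prompt policy mandates.
    static final String REFUSAL = "抱歉,小云暂时还没学会这个问题";

    // Hit = 0: return the fixed refusal without ever calling the LLM.
    public static boolean shouldRefuse(List<String> hits) {
        return hits == null || hits.isEmpty();
    }

    public static String refusal() {
        return REFUSAL;
    }
}
```

Keeping the gate ahead of the LLM call means a zero-hit query costs no tokens at all.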

Dual-Profile Support (Dev vs Prod Simulation)

  • dev (default): H2 in-memory, zero infrastructure required
  • prod: MySQL persistence + Redis caching (Docker Compose), closer to real-world deployment
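
A minimal sketch of how the two profiles might be wired (property keys follow Spring Boot conventions; the URLs and schema name are illustrative):

```properties
# application-dev.properties — H2 in-memory, no external services needed
spring.datasource.url=jdbc:h2:mem:csagent
spring.jpa.hibernate.ddl-auto=create-drop

# application-prod.properties — MySQL + Redis started via Docker Compose
spring.datasource.url=jdbc:mysql://localhost:3306/csagent
spring.data.redis.host=localhost
spring.data.redis.port=6379
```

The prod profile would then be selected at startup, e.g. `--spring.profiles.active=prod`.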

Redis Caching (Hot Query Optimization)

  • Cache-First Strategy: Checks Redis before triggering retrieval or LLM inference to reduce latency and token costs.
  • Smart TTL:
    • Standard Answer: Long TTL (e.g., 10 min) for high cache hit rate.
    • Refusal (Hits=0): Short TTL (e.g., 30s) to prevent "stale refusals" after KnowledgeBase updates.
  • Stability: Redis failures are logged as warnings; the system automatically falls back to DB+LLM without breaking the user request.
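
The cache-first read path, smart TTL choice, and graceful degradation can be sketched together. This is a hypothetical sketch: the `Cache` interface stands in for the real Redis client, and all names are illustrative.

```java
import java.time.Duration;
import java.util.Optional;
import java.util.function.Supplier;

// Hypothetical sketch of cache-first lookup with graceful degradation.
public class CacheFirstSketch {
    public interface Cache { Optional<String> get(String key); }

    // Smart TTL: long for grounded answers, short for refusals so a
    // KnowledgeBase update is not masked by a stale cached refusal.
    public static Duration ttlFor(boolean isRefusal) {
        return isRefusal ? Duration.ofSeconds(30) : Duration.ofMinutes(10);
    }

    // A cache failure is logged as a warning and the request falls
    // through to DB + LLM instead of failing.
    public static String answer(Cache cache, String key, Supplier<String> dbAndLlm) {
        try {
            Optional<String> hit = cache.get(key);
            if (hit.isPresent()) return hit.get();   // cache hit: skip retrieval + LLM
        } catch (RuntimeException e) {
            System.err.println("WARN cache unavailable, degrading: " + e.getMessage());
        }
        return dbAndLlm.get();                       // miss or cache failure
    }
}
```

Note the try/catch wraps only the cache read, so a Redis outage can never reach the caller.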

Minimal Retrieval Baseline (Top-K)

  • Top-K lexical retrieval (K=5) over KnowledgeBase (Spring Data JPA)
  • Optional query normalization + retry to improve recall on noisy inputs
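
A self-contained sketch of lexical Top-K scoring. The real project delegates this to a Spring Data JPA query; the token-overlap scoring and class name here are illustrative.

```java
import java.util.*;

// Minimal sketch of lexical Top-K retrieval (K=5): score each entry by
// the number of query tokens it shares, drop zero-score entries.
public class TopKRetriever {
    public static List<String> topK(String query, List<String> docs, int k) {
        Set<String> q = new HashSet<>(Arrays.asList(query.toLowerCase().split("\\s+")));
        return docs.stream()
                .map(d -> Map.entry(d, score(q, d)))
                .filter(e -> e.getValue() > 0)       // zero-hit docs are dropped
                .sorted((a, b) -> Integer.compare(b.getValue(), a.getValue()))
                .limit(k)
                .map(Map.Entry::getKey)
                .toList();
    }

    private static int score(Set<String> q, String doc) {
        int s = 0;
        for (String t : doc.toLowerCase().split("\\s+")) if (q.contains(t)) s++;
        return s;
    }
}
```

An empty result from this step is exactly what triggers the refusal gate downstream.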

Architecture

Data Flow (Fail-Fast + Cache + RAG)

  1. Input normalization (trim / simple cleanup)
  2. Redis cache lookup (hot query optimization)
  3. Top-K retrieval from KnowledgeBase (K=5)
  4. Refusal gate: if hits == 0, return refusal (no LLM)
  5. Prompt assembly: inject Known Info
  6. LLM inference (DashScope OpenAI-compatible endpoint)
  7. Write-back to Redis with TTL (Short TTL for refusals to avoid stale refusals)
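
The steps above can be condensed into one synchronous method. This is a sketch: dependencies are passed in as functions to keep the control flow visible, names are illustrative, and the Redis write-back (step 7) is omitted.

```java
import java.util.List;
import java.util.Optional;
import java.util.function.Function;

// Sketch of the synchronous end-to-end pipeline; names are illustrative.
public class RagPipeline {
    static final String REFUSAL = "抱歉,小云暂时还没学会这个问题";

    public static String handle(String raw,
                                Function<String, Optional<String>> cache,
                                Function<String, List<String>> retrieve,
                                Function<String, String> llm) {
        String q = raw.trim();                        // 1. input normalization
        Optional<String> cached = cache.apply(q);     // 2. Redis cache lookup
        if (cached.isPresent()) return cached.get();
        List<String> hits = retrieve.apply(q);        // 3. Top-K retrieval
        if (hits.isEmpty()) return REFUSAL;           // 4. refusal gate, no LLM
        String prompt = "Known Info:\n" + String.join("\n", hits)
                + "\nUser Question: " + q;            // 5. prompt assembly
        return llm.apply(prompt);                     // 6. LLM inference
        // 7. write-back with TTL omitted in this sketch
    }
}
```

Because every step is a plain function call, the pipeline is easy to drill: swap the cache for a throwing stub to rehearse degradation, or the retriever for an empty one to exercise the refusal path.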

Tech Stack

Component        | Choice               | Description
-----------------|----------------------|------------------------------------------
Language         | Java 17              | Core development language
Framework        | Spring Boot          | Web MVC and dependency injection
ORM              | Spring Data JPA      | Repository abstraction over the database
Database (dev)   | H2                   | Zero-infra rapid development
Database (prod)  | MySQL 8              | Persistence for production simulation
Cache (prod)     | Redis 7              | Hot query caching with TTL
LLM Integration  | OkHttp + Jackson     | OpenAI-compatible chat completion client
API Docs         | OpenAPI / Swagger UI | API exploration and testing
Deployment       | Docker Compose       | One-command infra startup

Prompt Policy

The system uses a rigid template to prevent the LLM from using external knowledge.

[System Role]
Persona: NetEase Cloud Music customer support agent
Constraint: Answer ONLY using the provided "Known Info".
Failure Case: If the info is insufficient, reply exactly:
"抱歉,小云暂时还没学会这个问题" ("Sorry, XiaoYun hasn't learned this question yet")
No fabrication allowed.

[User Role]
Known Info:
[1] <retrieved_answer_1>
[2] <retrieved_answer_2>
...
User Question: <question>
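
Assembling the user-role message from the template above might look like the following. This is a sketch: the class name is illustrative, and only the message text mirrors the template.

```java
import java.util.List;

// Sketch of building the user-role message from the rigid template;
// the system-role constraints would be sent as a separate message.
public class PromptBuilder {
    public static String userMessage(List<String> knownInfo, String question) {
        StringBuilder sb = new StringBuilder("Known Info:\n");
        for (int i = 0; i < knownInfo.size(); i++)
            sb.append("[").append(i + 1).append("] ")
              .append(knownInfo.get(i)).append("\n");
        sb.append("User Question: ").append(question);
        return sb.toString();
    }
}
```

Numbering the snippets keeps the injected context auditable: a log of the final prompt shows exactly which KnowledgeBase entries grounded the answer.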

AI-Assisted Development (Vibe Coding)

This project was developed with AI assistance in Cursor (model: GPT-5.2), following a "Human-in-the-Loop" workflow:

  • Scaffolding & Drafting: Rapid generation of Spring Boot boilerplate and configuration wiring.
  • Documentation & Visualization: Iterative refinement of the README and Mermaid architecture diagrams.
  • Debugging Support: Analyzing stack traces and resolving dependency conflicts.

Verification:

All AI-assisted changes were manually reviewed and adjusted. Key engineering patterns (cache degradation and dev/prod profile isolation) were validated through reproducible checks: inspecting cache hit/miss logs and running a Redis-down degradation drill.