Phase 2: RAG App Development & Heterogeneous System Architecture (Level 2)
Cycle: Weeks 3-5
Core Goal: Master Retrieval-Augmented Generation (RAG), understand Java-Python heterogeneous system communication, and establish an evaluation system
Prerequisite Capabilities
- Mastered Python basics and AI tool usage (Level 1)
- Established AI-Native development habits
- Deployed local Ollama and a vector database
Why this phase is needed
RAG is the core capability of AI applications: an Agent's intelligent decisions depend on the knowledge retrieval that RAG provides. You must first master "how to let AI acquire external knowledge" before moving on to "how to let AI decide autonomously".
Core Capability 1: Dual-Track RAG Implementation
Platform Stream: Dify Enterprise Practice (80% of Scenarios)
Strategy: "Extend, Don't Deep Dive"
Treat Dify as a "black box" or "component" and focus on how to extend it:
- Custom Tooling: Write a Python API for Dify to call, letting the Java system communicate with Dify
- Workflow Orchestration: Master Dify's Workflow nodes and understand how to pass variables between them
- Why skip the source code? Dify is built on a complex Flask/Next.js/PostgreSQL stack; unless you are fixing a bug, your time is better spent on LangGraph (complex logic) or Milvus (the data side)
Practical Tasks:
- Deploy Dify using Docker
- Connect it to local Ollama
- Build an enterprise knowledge base: upload PDFs, configure the Embedding model, test hybrid search
- Publish a web chatbot
Core Practice: Dify Custom Tool
Use your Spring Boot advantage to give the Agent working hands and feet.
Task Description
Dify cannot directly access your company's real-time data (e.g., server status, database inventory), so you need to write an API for it to call.
Implementation Stepsβ
- Backend Development (Spring Boot):
  - Write an endpoint GET /api/v1/system/status that returns CPU/memory/business data in JSON format (see the schema sketch after this list)
  - Generate the OpenAPI (Swagger) description document
- Dify Integration:
  - In Dify's "Tools" tab -> "Create Custom Tool"
  - Import your OpenAPI Schema
- Agent Orchestration:
  - Enable the tool in the Prompt
  - Test question: "How is the system load right now?" -> the Agent automatically calls the Java endpoint -> answer
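A minimal sketch of what the imported OpenAPI Schema might look like for this endpoint; the field names and response shape are illustrative assumptions, not Dify requirements:

```yaml
# Hypothetical OpenAPI description for the custom tool endpoint
openapi: 3.0.0
info:
  title: System Status API
  version: "1.0"
paths:
  /api/v1/system/status:
    get:
      operationId: getSystemStatus
      summary: Returns current CPU / memory / business metrics
      responses:
        "200":
          description: Current system status
          content:
            application/json:
              schema:
                type: object
                properties:
                  cpuUsage:
                    type: number
                  memoryUsage:
                    type: number
                  businessData:
                    type: object
```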
Code Stream: Handwritten RAG Pipeline (20% Deep Customization)
Core Components:
- Milvus: Deploy it and understand the HNSW index principle
- Embedding: Use sentence-transformers to load bge-base-zh
Advanced Chunking Strategies (see the sketch after this list):
- Parent-Child Indexing
  - Index small chunks (128 tokens) for precise retrieval
  - Return the parent chunk (512 tokens) to retain context
  - Solves the "precise retrieval but insufficient context" problem
- HyDE (Hypothetical Document Embeddings)
  - Let the LLM generate a "hypothetical answer" first
  - Use the hypothetical answer's embedding to search the documents
  - Solves the "question and document semantics mismatch" problem
Practical Code Pipeline:
Read PDF → Advanced Chunking → Vectorization → Store in Milvus → Retrieval → Assemble Prompt → LLM Answer
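A minimal end-to-end sketch of this pipeline, assuming pypdf, sentence-transformers, and pymilvus's MilvusClient against a local Milvus; the collection setup, naive character-based chunking, and prompt template are illustrative placeholders:

```python
from pypdf import PdfReader
from sentence_transformers import SentenceTransformer
from pymilvus import MilvusClient

model = SentenceTransformer("BAAI/bge-base-zh")
client = MilvusClient(uri="http://localhost:19530")
client.create_collection(collection_name="docs", dimension=768)  # bge-base-zh outputs 768 dims

def ingest(pdf_path: str):
    """Read PDF -> chunk -> vectorize -> store in Milvus."""
    text = "".join(page.extract_text() or "" for page in PdfReader(pdf_path).pages)
    chunks = [text[i:i + 512] for i in range(0, len(text), 512)]  # naive chunking
    client.insert("docs", [
        {"id": i, "vector": model.encode(chunk).tolist(), "text": chunk}
        for i, chunk in enumerate(chunks)
    ])

def answer(question: str, llm_generate, top_k: int = 5) -> str:
    """Retrieve -> assemble prompt -> LLM answer."""
    hits = client.search("docs", data=[model.encode(question).tolist()],
                         limit=top_k, output_fields=["text"])
    context = "\n".join(hit["entity"]["text"] for hit in hits[0])
    return llm_generate(f"Answer using only this context:\n{context}\n\nQuestion: {question}")
```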
Early Eval (Early Evaluation) - Crucial!
Core Concept: Treat RAG evaluation as a unit test
Golden Dataset Construction Method:
1. Select Representative Questions (10-20):
   - Cover different difficulties: simple fact queries, complex reasoning, multi-hop questions
   - Cover different topics: ensure all fields of the knowledge base are represented
   - Include edge cases: questions with no answer in the knowledge base
2. Write Standard Answers:
   - Annotate the answer source (which paragraph of which document)
   - Annotate key information points (for evaluating recall)
3. Automated Evaluation Script:
```python
# Example: RAG evaluation script (Ragas evaluates over a HuggingFace Dataset
# with question / answer / contexts columns)
from datasets import Dataset
from ragas import evaluate
from ragas.metrics import faithfulness, answer_relevancy

# Golden Dataset
golden_qa = Dataset.from_dict({
    "question": ["How to optimize RAG retrieval?"],
    "answer":   ["Use Parent-Child indexing"],
    "contexts": [["..."]],  # the retrieved passages for each question
    # ... more QA pairs
})

# Run evaluation
results = evaluate(golden_qa, metrics=[faithfulness, answer_relevancy])
print(f"Faithfulness: {results['faithfulness']:.2f}")
print(f"Answer Relevancy: {results['answer_relevancy']:.2f}")

# If a score is < 0.7, the RAG pipeline needs optimization
assert results['faithfulness'] > 0.7, "RAG faithfulness insufficient"
```
Integrate into CI/CD (Evaluation Shift-Left):
Goal: Treat RAG evaluation as a unit test and automatically verify quality on every modification.
Implementation Steps:
- Create the Evaluation Script: Save the RAG evaluation code above to tests/test_rag_quality.py
- Set Quality Thresholds: Define minimum acceptable scores (Faithfulness > 0.7, Answer Relevancy > 0.7)
- Integrate into the CI Pipeline: Add an evaluation step in GitHub Actions / GitLab CI
CI Config Example (GitHub Actions):

```yaml
# .github/workflows/rag-quality.yml
name: RAG Quality Check
on: [push, pull_request]
jobs:
  evaluate:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v2
      - name: Run RAG Evaluation
        # If a score is below the threshold, the assert fails and CI fails
        run: |
          python tests/test_rag_quality.py
```
Use Ragas or TruLens:
- Ragas: Better suited to RAG systems; provides metrics such as Faithfulness and Answer Relevancy
- TruLens: Better suited to complex Agents; provides complete trace analysis
Effect Verification: After a Prompt change, CI automatically re-runs the evaluation; if Faithfulness drops by more than 5%, the merge is automatically blocked (see the sketch below).
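A minimal sketch of that regression gate, assuming (hypothetically) that the last successful main build stores its score in a baseline.json file:

```python
import json

def check_faithfulness_regression(current: float, baseline_path: str = "baseline.json"):
    """Fail CI (blocking the merge) if faithfulness regressed by more than 5%."""
    with open(baseline_path) as f:
        baseline = json.load(f)["faithfulness"]
    drop = (baseline - current) / baseline
    assert drop <= 0.05, f"Faithfulness dropped {drop:.1%} from baseline {baseline:.2f}"
```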
Core Capability 2: Heterogeneous System Communication Protocol (Java ↔ Python)
Core Challenges
- AI inference is slow (Python), business logic is fast (Java) → need async decoupling
- Java needs deterministic JSON objects, but LLM output is unstable → need structured output validation
- Cross-language data consistency → need a unified data contract
Architecture Design: Three-Layer Solution
1. Structured Output Validation
Python Side:

```python
from openai import OpenAI
from pydantic import BaseModel
from instructor import patch

class UserInfo(BaseModel):
    user_id: str
    email: str
    profile: dict

# Patch the client so LLM output is forced to conform to the Pydantic model
client = patch(OpenAI())
response = client.chat.completions.create(
    model="gpt-4",
    response_model=UserInfo,
    messages=[{"role": "user", "content": "Extract user info from: ..."}],
)
# response is guaranteed to be a UserInfo instance
```
Java Side:

```java
// Use OpenAPI/Swagger to generate the corresponding DTO
import java.util.Map;
import lombok.Data;

@Data
public class UserInfo {
    private String userId;
    private String email;
    private Map<String, Object> profile;
}
```
2. Async Communication Mode (Sidecar Pattern)
Architecture Diagram:

```text
User Request → Java Backend (Business Logic) → Return "Processing"
                     ↓
           Message Queue (RabbitMQ)
                     ↓
 Python Service (AI Inference) → Processing Complete
                     ↓
     Callback to Java (Update Status)
```

Practical Scenario (see the sketch after this list):
- A user uploads a PDF; the Java backend immediately returns "Processing"
- The Python service is notified via MQ and performs chunking and vectorization
- On completion, Python calls back to Java to update the status
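A minimal sketch of the Python side of this pattern using pika for RabbitMQ; the queue name, message format, and callback URL are illustrative assumptions:

```python
import json
import pika
import requests

def on_pdf_uploaded(channel, method, properties, body):
    event = json.loads(body)
    # ... chunk and vectorize the document referenced by event["pdf_path"] ...
    # Call back to Java so it can update the task status
    requests.post("http://java-backend/api/v1/tasks/callback",
                  json={"task_id": event["task_id"], "status": "DONE"})

connection = pika.BlockingConnection(pika.ConnectionParameters("localhost"))
channel = connection.channel()
channel.queue_declare(queue="pdf.process", durable=True)
channel.basic_consume(queue="pdf.process", on_message_callback=on_pdf_uploaded, auto_ack=True)
channel.start_consuming()
```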
3. Data Contract Definition (OpenAPI/Swagger)
```yaml
# shared-api-spec.yaml
components:
  schemas:
    RAGRequest:
      type: object
      properties:
        query:
          type: string
        top_k:
          type: integer
          default: 5
    RAGResponse:
      type: object
      properties:
        answer:
          type: string
        sources:
          type: array
          items:
            type: string
        confidence:
          type: number
```
The Java side uses OpenAPI Generator to generate a Feign Client:

```java
import org.springframework.cloud.openfeign.FeignClient;
import org.springframework.web.bind.annotation.PostMapping;
import org.springframework.web.bind.annotation.RequestBody;

@FeignClient(name = "rag-service", url = "${rag.service.url}")
public interface RAGServiceClient {
    @PostMapping("/query")
    RAGResponse query(@RequestBody RAGRequest request);
}
```
Hybrid Search Strategy
Combine full-text search (Elasticsearch/MySQL) with vector search (Milvus):

```python
# Hybrid search example (embed_model, milvus_client, and es_client are
# assumed to be initialized elsewhere; rrf_fusion is sketched below)
def hybrid_search(query: str, top_k: int = 5):
    # 1. Vector search (semantic similarity)
    query_embedding = embed_model.encode(query)
    vector_results = milvus_client.search(query_embedding, top_k=10)
    # 2. Full-text search (keyword match)
    keyword_results = es_client.search(query, top_k=10)
    # 3. Fuse the results (Reciprocal Rank Fusion)
    return rrf_fusion(vector_results, keyword_results, top_k=top_k)
```
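The rrf_fusion step fits in a few lines; a minimal sketch assuming both result lists have already been reduced to document ids in rank order (k=60 is the constant commonly used for RRF):

```python
def rrf_fusion(vector_results: list, keyword_results: list, top_k: int = 5, k: int = 60):
    """Reciprocal Rank Fusion: each list contributes 1/(k + rank) per document,
    so documents ranked highly by both searches float to the top."""
    scores: dict = {}
    for ranked_ids in (vector_results, keyword_results):
        for rank, doc_id in enumerate(ranked_ids, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)[:top_k]
```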
Core Capability 3: Dual Database Architecture Design
Architecture: MySQL (business data) + vector DB (semantic data)
Data Sync Consistency Issue:
```java
// Java side: publish an event when creating a user so the vector DB syncs asynchronously
@Transactional
public void createUser(User user) {
    // 1. Save to MySQL
    userRepository.save(user);
    // 2. Send an event to MQ (async sync to the vector DB)
    rabbitTemplate.convertAndSend("user.created", new UserCreatedEvent(user));
}
```

```python
# Python side: listen to MQ and sync to the vector DB
# (@mq_consumer is a project-specific decorator wrapping the MQ client)
@mq_consumer("user.created")
def sync_user_to_vector_db(event: UserCreatedEvent):
    embedding = embed_model.encode(str(event.user.profile))  # serialize profile before encoding
    milvus_client.insert({
        "user_id": event.user.id,
        "embedding": embedding,
        "metadata": event.user.to_dict(),
    })
```

Note: publishing the event inside the @Transactional method risks the message firing before the commit; in production, publish after the commit or use a transactional outbox to keep MySQL and the vector DB consistent.
Phase Output Standards
Deliverables (prerequisites for entering Level 3):
RAG Application Layer:
- Deploy Dify and publish at least 1 knowledge-base app that can answer domain questions
- Dify Custom Tool
- Handwrite the complete RAG pipeline code, covering: PDF parsing → chunking → vectorization → Milvus storage → retrieval → LLM generation
- Implement at least 1 advanced chunking strategy (Parent-Child or HyDE) and verify its effect with a comparison experiment
Evaluation System Layer:
- Build a Golden Dataset (10-20 QA pairs) containing questions of different difficulties and topics
- Implement an automated evaluation script integrating Ragas or TruLens
- Quantitative metrics: on the Golden Dataset, recall improves by at least 20% over the baseline (e.g., from 0.60 to 0.72), with Faithfulness > 0.7
Heterogeneous System Architecture Layer:
- Design and implement the Java-Python async communication architecture (using RabbitMQ or Kafka)
- Implement the hybrid search strategy (vector + full-text) and verify the improvement with A/B testing
- Define a unified OpenAPI data contract; the Java side can call the Python service via a Feign Client
Capability Verification:
- Able to explain why Parent-Child indexing beats plain chunking, with supporting data
- Able to design a complete dual-database architecture (MySQL + Milvus) and solve the data-sync consistency issue
Time Checkpoint: If not finished after 3 weeks, use Dify to complete basic RAG first, then add the advanced features gradually.
Roadmap Optimization Suggestions
Practical Project: Build a "Search Switcher" in Java that automatically routes queries based on question type (a routing sketch follows this list):
- Database query for structured data (e.g., "Query user orders")
- RAG query for unstructured knowledge (e.g., "How to optimize system performance")
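A minimal sketch of the routing logic, shown in Python for consistency with the other examples (the roadmap itself calls for a Java implementation). The keyword rules are illustrative placeholders; a production switcher might use an LLM classifier instead:

```python
# Hypothetical router: structured questions go to the database,
# open-ended knowledge questions go to the RAG pipeline.
STRUCTURED_KEYWORDS = ("order", "inventory", "user id", "how many", "status of")

def route_query(question: str) -> str:
    q = question.lower()
    if any(keyword in q for keyword in STRUCTURED_KEYWORDS):
        return "database"  # e.g. "Query user orders" -> SQL / JPA repository
    return "rag"           # e.g. "How to optimize system performance" -> RAG query
```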
Previous Phase: Level 1 - AI-Native Workflow & Infrastructure
Next Phase: Level 3 - Agent Architecture & Observability