Complete Guide to ReAct-based Agent Design
📋 Scheme Overview
This document presents a complete agent construction scheme that combines a LangGPT structured prompt with the ReAct reasoning mode, for complex tasks that require multi-step reasoning, tool calling, and knowledge integration.
Core Concept: let the AI "think first, then act, then observe, then decide," like a human expert.
🎯 Applicable Scenarios
- ✅ Tasks requiring consultation of a professional knowledge base plus internet search
- ✅ Complex queries requiring multi-step planning and dynamic decision-making
- ✅ Teaching/consulting scenarios requiring a transparent reasoning process
- ✅ Applications requiring tool orchestration (retrieval, search, calculation, etc.)
Typical Application Scenarios:
- 📚 English Teaching Assistant (grammar + pronunciation dual diagnosis)
- ⚖️ Legal Advisor (statute retrieval + case search)
- 🏥 Medical Consultation (medical knowledge base + latest research)
- 🔧 Technical Support (product documentation + community solutions)
Part 1: Structured Prompt Design
Design Principles
Based on the LangGPT standard format, the prompt adopts a five-element structure:
Role → Goals → Constraints → Skills (Tools) → Workflow
Core Points:
- Forced ReAct loop: [Thought] → [Action] → [Observation] → [Response]
- Tool priority: knowledge base first → internet search as a supplement → multi-source integration
- Reasoning transparency: show the decision path to the user
- Reject hallucination: state honestly when there is no basis for an answer
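The forced loop above can be sketched in a few lines of Python. Note that `call_llm` and the tool registry are hypothetical stand-ins for a real model backend and Dify's tool layer; this is an illustrative skeleton, not Dify's actual implementation.

```python
def react_loop(question, call_llm, tools, max_iterations=10):
    """Run a Thought -> Action -> Observation loop until the model emits a
    final answer or the iteration budget runs out. `call_llm` (hypothetical)
    returns one ReAct step as a dict; `tools` maps tool names to callables."""
    transcript = f"Question: {question}\n"
    for _ in range(max_iterations):
        step = call_llm(transcript)
        transcript += f"[Thought] {step['thought']}\n"
        if step.get("final_answer") is not None:
            return step["final_answer"], transcript  # [Response]
        result = tools[step["action"]](step["action_input"])  # [Action]
        transcript += (f"[Action] {step['action']}({step['action_input']})\n"
                       f"[Observation] {result}\n")
    return "No authoritative basis found", transcript  # honest fallback
```

With a scripted stand-in LLM, one retrieval step followed by a final answer flows through the loop exactly as the template's Workflow describes.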
Complete Prompt Template
# Role
You are an AI Language Teaching Expert (Senior Linguistic Agent) proficient in "American Pronunciation Training" and "English Grammar".
You have access to "English Grammar in Use" and "American Accent Training" knowledge bases, and also possess internet search capability.
# Goals
1. **Precise Diagnosis**: Analyze grammar errors or pronunciation difficulties in user sentences
2. **Authoritative Citation**: Prioritize providing textbook-level explanations based on knowledge base
3. **Dynamic Supplement**: Use the internet search tool when the knowledge base does not cover the topic (such as the latest slang)
4. **Teaching Loop**: Do not just give the answer; also provide targeted practice suggestions
# Constraints
1. **ReAct Thinking Principle**: Must execute the [Thought] → [Action] → [Observation] reasoning loop before answering
2. **Knowledge Priority**: Retrieve from the knowledge base first; call the search tool only when there are no results or relevance is low
3. **Format Standard**: The final output must be clearly stated in points using Markdown
4. **Reject Hallucination**: If neither the knowledge base nor the internet has reliable information, honestly answer "No authoritative basis found"
5. **Reasoning Transparency**: Show your search path and decision basis to the user
# Skills (Available Tools)
1. **Knowledge_Retrieval**: Consult "English Grammar in Use" and "American Accent Training"
2. **Web_Search**: Use search engine to find latest language usage or supplementary example sentences
# Workflow (ReAct Loop)
## Phase 1: Thought - Analysis and Planning
**Before executing any operation, think first:**
- What is the user's intent? (grammar issue / pronunciation issue / vocabulary explanation)
- Which category does this knowledge point belong to? (classic grammar / pronunciation rules / neologisms & slang)
- Formulate an action plan:
  - Is it necessary to check the knowledge base? (What are the keywords?)
  - Might the knowledge base not cover this? (If so, prepare a search in advance)
  - Is it necessary to combine multiple information sources?
## Phase 2: Action - Execute Tool Call
**Execute in priority order:**
1. **Step 1**: Call `Knowledge_Retrieval` to search the core keywords
2. **Step 2**: Observe the retrieval results
   - If relevance ≥ 0.6 and the content is sufficient → use it directly
   - If relevance < 0.6 or there is no result → call `Web_Search` immediately
3. **Step 3**: Integrate multi-source information (knowledge base + search results)
## Phase 3: Observation - Result Evaluation
**Evaluate the result of each tool call:**
- Is the information sufficient? Is a second retrieval needed?
- Is information from different sources contradictory? If so, how do you choose?
- Do other tools need to be called to fill gaps?
## Phase 4: Response - Structured Output
**Output the final answer in the following format:**
### 🔍 Analysis Path
Briefly describe your reasoning and search process (e.g., "Consulted 'Grammar in Use' Unit 5 and verified the latest usage through search...")
### ✅ Core Answer
- Give the corrected sentence or pronunciation guidance
- Mark the key modification points with bold/highlighting
### 📖 Deep Analysis
- Explain the underlying grammar rules or pronunciation principles
- Cite the knowledge base source (e.g., "Unit 12: Present Perfect")
- Contrast with common incorrect usages
### 🗣️ Practice Suggestions
- Provide 1-2 targeted practice sentences
- Assign shadowing or rewriting tasks
### 🔗 Extended Resources (Optional)
- If high-quality materials are found in web search, provide links
# Initialization
I am ready. Send the sentence or question you want analyzed, and I will start the ReAct reasoning process to answer you.
Reasoning Mode Settings
In the Dify Agent configuration, choose the appropriate reasoning mode:
| Mode | Applicable Scenario | Advantage | Disadvantage |
|---|---|---|---|
| Function Calling | Models with native support | Fast, accurate, low token consumption | Depends on model capability |
| ReAct | General scenarios | Visible reasoning process, easy to debug, highly flexible | High token consumption |
Recommended Configuration:
- ✅ Prefer Function Calling (GPT-4, Claude 3.5, DeepSeek-V3)
- ✅ If the model does not support it, fall back to ReAct
- ✅ Enable "Show Thinking Process" during development for easier debugging
Advanced Parameter Tuning
# Model Parameters
Temperature: 0.3 # Reduce randomness, improve accuracy
Max Tokens: 2000 # Ensure complete analysis can be output
Top P: 0.9 # Maintain output quality
Presence Penalty: 0.1 # Reduce repetitive phrasing
Frequency Penalty: 0.0 # Do not limit frequency of technical terms
# Retrieval Parameters
Top K: 3-5 # Return most relevant 3-5 fragments
Score Threshold: 0.6 # Relevance threshold
Rerank: true # Enable reranking (if supported)
# Search Parameters
Max Results: 5 # Max 5 search results
Timeout: 10s # Search timeout
Safe Search: moderate # Moderate safe search level
Final Output Example
User Input:
What's the difference between 'I live in Shanghai' and 'I am creating a Rizz app'?
🔍 Analysis Path
I consulted "English Grammar in Use" Unit 3 to confirm the tense usage of *live*, and obtained the latest definition of *Rizz* (Oxford Word of the Year 2023) through an internet search.
✅ Core Answer

1. **"I live in Shanghai"** ✅ Correct
   - *live* is a stative verb, describing a long-term residence state
   - Do not use the continuous *I am living* (unless emphasizing temporariness)
2. **"I am creating a Rizz app"** ✅ Correct
   - *create* is an action verb and can take the continuous tense to indicate ongoing development work
📖 Deep Analysis
Grammar Rules (Source: Unit 3):
- Stative verbs: live, know, like, understand
- They describe a state rather than an action and are usually not used in the continuous tense
- Exception: *live* can take the continuous tense when emphasizing temporary residence
  - "I'm living in a hotel this week." ✅
Vocabulary Analysis (Source: Web Search):
- **Rizz** /rɪz/
  - Etymology: clipped from *charisma* (cha-RIS-ma → rizz)
  - Meaning: charm; the ability to attract a romantic partner
  - Example: "He's got serious rizz!"
🗣️ Practice Suggestions

1. Tense contrast practice:
   - I **live** in Beijing. (permanent)
   - I **'m living** in a hotel. (temporary)
2. Rizz usage practice: describe your app in one sentence: "This app helps users improve their rizz through AI-powered conversation tips."
🔗 Extended Resources
Part 2: Dify Agent Configuration Guide
1️⃣ Create an Agent Application
Operation Path:
Dify Home → Create App → Select "Agent" → Basic Orchestration Mode
Notes:
- ❌ Do not use "Chat Assistant" or "Workflow" mode (they do not support dynamic reasoning)
- ✅ Agent mode plans the tool-call sequence automatically; no manual wiring is needed
- ✅ It supports changing decisions mid-run (e.g., automatically switching to search after a retrieval failure)
2️⃣ Model Selection
| Recommended Model | Platform | Reasoning Capability | Cost | Function Calling |
|---|---|---|---|---|
| DeepSeek-V3 | SiliconFlow | ⭐⭐⭐⭐⭐ | 💰 | ✅ |
| Qwen2.5-72B | SiliconFlow | ⭐⭐⭐⭐ | 💰💰 | ✅ |
| GPT-4o | OpenAI | ⭐⭐⭐⭐⭐ | 💰💰💰 | ✅ |
| Claude 3.5 Sonnet | Anthropic | ⭐⭐⭐⭐⭐ | 💰💰💰 | ✅ |
Configuration Requirements:
- ✅ Must support Function Calling
- ✅ Context window ≥ 32K (to handle long knowledge base retrieval results)
- ✅ Strong reasoning capability (able to follow complex ReAct instructions)
3️⃣ Tool Configuration
Add these in the "Tools" section of the Agent orchestration page:
🔧 Tool 1: Knowledge Retrieval
Type: Knowledge Retrieval
Data Source: Select created "English Book" Knowledge Base
Retrieval Mode: Semantic Search
Top K: 3-5
Score Threshold: 0.6
Rerank: Enable (if supported)
Knowledge Base Preparation Suggestions:
- Split "English Grammar in Use" and "American Accent Training" by chapter before uploading
- Add metadata to each document (e.g., unit number, topic tag)
- Use a high-quality embedding model (e.g., text-embedding-3-large)
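As a rough sketch of the "split by chapter and attach metadata" step, the function below assumes unit headings of the form `Unit 12: Present Perfect`; the splitting rule and field names are illustrative, not a Dify schema.

```python
import re

def chunk_by_unit(book_title, text):
    """Split a textbook into per-unit chunks and attach retrieval metadata.
    Assumes units are introduced by lines like 'Unit 12: Present Perfect'."""
    chunks = []
    parts = re.split(r"(?m)^(Unit \d+: .+)$", text)
    # re.split keeps the captured headings:
    # parts = [preamble, heading, body, heading, body, ...]
    for heading, body in zip(parts[1::2], parts[2::2]):
        unit_no = int(re.search(r"\d+", heading).group())
        chunks.append({
            "text": body.strip(),
            "metadata": {
                "doc_id": f"grammar_unit_{unit_no}",
                "title": heading.split(": ", 1)[1],
                "source": book_title,
                "unit": unit_no,
            },
        })
    return chunks
```

Chunking at unit boundaries keeps each fragment self-contained, and the metadata enables filtered retrieval (e.g., restrict a query to a specific unit or level).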
🔧 Tool 2: Web Search
Type: Web Search
Recommended Plugin: DuckDuckGo / Tavily / SerpAPI
Max Results: 5
Timeout: 10 seconds
Language Preference: English
Safe Search: Moderate
Optional Tools:
- 🎵 Speech Synthesis (Azure TTS / ElevenLabs): read corrected sentences aloud
- 📝 Example Generation (GPT-4o mini): generate scenario-based practice sentences
- 🎨 Chart Generation (Python Code Interpreter): visualize grammar structure
4️⃣ Agent Mode Configuration
Find "Agent Mode" in the model configuration area:
# Recommended Configuration
Agent Mode: Function Calling # Preferred choice
Max Iterations: 10 # At most 10 reasoning rounds
Tool Choice: auto # Auto-select tools
Parallel Tool Calls: false # Call sequentially (keeps the logic clear)
Show Reasoning: true # Enable during development
Downgrade Scheme (If model does not support Function Calling):
Agent Mode: ReAct
Max Iterations: 8
Scratchpad: enabled # Enable scratchpad to record reasoning process
Part 3: Practical Case Analysis
Case 1: Classic Grammar + Neologism Mixed Query
Input:
Is "I'm loving this Rizz app" correct?
ReAct Reasoning Process:
💭 Thought:
- *loving* comes from the stative verb *love* → check the knowledge base rules
- "Rizz" is a buzzword → confirm its meaning on the internet
- Plan: check the grammar book first, then search the vocabulary
🔨 Action 1:
Tool: Knowledge_Retrieval
Query: "state verbs love continuous"
Result:
- Unit 4: "Love usually not used in continuous"
- Exception: Can be used in certain contexts (emphasize temporary feeling)
- Relevance: 0.92
👁️ Observation 1:
The knowledge base gave the rule and its exceptions, but we still need to judge whether "loving this app" falls under the exception
🔨 Action 2:
Tool: Web_Search
Query: "I'm loving it grammar modern usage"
Result:
- McDonald's slogan has made this usage accepted in speech
- Grammarians consider it acceptable in advertising and spoken language
👁️ Observation 2:
In modern English, "I'm loving" is widely accepted in informal contexts
Final Output:
🔍 Analysis Path
Retrieved the rules on stative verbs from "Grammar in Use" Unit 4, and searched for actual usage in modern English.
✅ Core Answer
- Spoken language / social media → ✅ "I'm loving this Rizz app" is acceptable
- Formal writing → ⚠️ prefer "I love this Rizz app"
📖 Deep Analysis
- Traditional grammar: *love* is a stative verb and in principle is not used in the continuous tense
- Modern usage: the success of McDonald's "I'm lovin' it" popularized this usage in speech
- Context judgment:
  - Social media / advertising → acceptable (emphasizes an immediate, strong feeling)
  - Academic writing / formal documents → avoid
🗣️ Practice Suggestions
Contrast the following sentences to feel the difference in tone:
- I love your app. (objective fact)
- I'm loving your app! (strong emotion at this moment)
Case 2: Multi-round Reasoning + Tool Switching
Input:
How do you pronounce "schedule"? Is it different in American English?
ReAct Reasoning Process:
💭 Thought:
- Pronunciation question → check "American Accent Training" first
- Involves an American-British difference → may need a supplementary search
🔨 Action 1:
Tool: Knowledge_Retrieval
Query: "schedule pronunciation American"
Result:
- Chapter 7: "Schedule is pronounced /ˈskedʒuːl/ in American English"
- Relevance: 0.88
👁️ Observation 1:
The knowledge base gave the American pronunciation but did not mention the British contrast
🔨 Action 2:
Tool: Web_Search
Query: "schedule pronunciation British vs American"
Result:
- British: /ˈʃedjuːl/
- American: /ˈskedʒuːl/
- History: influence of spelling reform after American independence
👁️ Observation 2:
Complete contrast information obtained; ready to integrate and output
Final Output:
🔍 Analysis Path
Consulted "American Accent Training" Chapter 7, and searched for the historical reasons behind the American-British pronunciation difference.
✅ Core Answer
American English: /ˈskedʒuːl/ (SKED-jool)
British English: /ˈʃedjuːl/ (SHED-yool)
📖 Deep Analysis

1. American pronunciation points:
   - Stress on the first syllable: SKED-jool
   - "sch" pronounced as /sk/ (as in *school*)
2. Why the difference?
   - After American independence in the late 18th century, Noah Webster promoted spelling reform
   - American English tends toward "pronounce as written" logic
🗣️ Practice Suggestions
Read the contrasting sentences aloud:
- 🇺🇸 "What's your schedule /ˈskedʒuːl/ today?"
- 🇬🇧 "What's your schedule /ˈʃedjuːl/ today?"
Part 4: Optimization Suggestions
Prompt Optimization Directions
1. Add an Error Handling Mechanism
# Error Handling
- If a tool call fails, rewrite the keywords and retry (at most 2 times)
- If neither the knowledge base nor search yields results, offer alternative suggestions on related topics
- If contradictory information is retrieved, label the viewpoints from different sources and explain the basis for your choice
# Fallback Strategy
When all tools fail:
1. Give conservative advice based on general linguistic knowledge
2. Clearly state "This is speculation based on conventional rules"
3. Suggest the user consult an authoritative dictionary or a professional
2. Add User Profile Adaptation
# User Context Adaptation
- Default user English level: Intermediate (CEFR B1-B2)
- If the user's question involves advanced grammar (e.g., the subjunctive mood), increase the depth of explanation
- If the user asks in simple sentences, avoid overly academic terms
- Adjust example-sentence difficulty based on the conversation history
# Dynamic Difficulty
- Initial interaction: use everyday example sentences
- After several dialogue rounds: adjust the level of technicality based on user feedback
3. Enhance Interactivity
# Interactive Features
- Provide follow-up options after answering:
  - 🅰️ More examples
  - 🅱️ Related grammar points
  - 🅲 Pronunciation demonstration
  - 🅳 Practice test
- For complex questions, ask: "Which part needs a deeper explanation?"
- Provide a switch between "Simplified" and "Full" output modes
4. Add Knowledge Tracing
# Citation & Sources
- Knowledge base citation format: [Source: "Grammar in Use" Unit 12, P.45]
- Web search citation format: [Source: Oxford Dictionary Online, 2024-01-15]
- If combining multiple sources, note "Synthesized from the following 3 sources: ..."
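A minimal helper that renders citations in the two formats above; the dict keys are assumptions for illustration, not a real Dify schema.

```python
def format_citation(source):
    """Render one citation string in the formats defined above.
    `source` is a plain dict; the `kind` key (hypothetical) selects the format."""
    if source["kind"] == "knowledge_base":
        return f'[Source: "{source["book"]}" Unit {source["unit"]}, P.{source["page"]}]'
    return f'[Source: {source["site"]}, {source["date"]}]'
```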
Agent Configuration Optimization
1. Knowledge Base Optimization Strategy
Layered Retrieval:
Layer 1 (Coarse Retrieval):
- Method: BM25 Keyword Match
- Top K: 20
- Purpose: Quickly locate relevant chapters
Layer 2 (Fine Retrieval):
- Method: Semantic Embedding
- Top K: 5
- Rerank: BAAI/bge-reranker-large
- Purpose: Find most relevant paragraphs
Metadata Enhancement:
{
"doc_id": "grammar_unit_12",
"title": "Present Perfect Tense",
"level": "intermediate",
"tags": ["tense", "perfect", "have+pp"],
"unit": 12,
"page": 24
}
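The layered retrieval above can be sketched as a two-stage pipeline. A simple token-overlap score stands in for BM25, and the injected `fine_scorer` stands in for the embedding/reranker stage; a real deployment would plug in actual BM25 and a reranker such as bge-reranker.

```python
def keyword_score(query, doc):
    """Coarse-stage stand-in for BM25: fraction of query tokens found in the doc."""
    q = set(query.lower().split())
    d = set(doc["text"].lower().split())
    return len(q & d) / len(q)

def layered_retrieve(query, docs, fine_scorer, coarse_k=20, fine_k=5):
    """Layer 1: cheap keyword filter down to coarse_k candidates.
    Layer 2: expensive scorer (embedding/reranker in production) picks fine_k."""
    coarse = sorted(docs, key=lambda d: keyword_score(query, d), reverse=True)[:coarse_k]
    return sorted(coarse, key=lambda d: fine_scorer(query, d), reverse=True)[:fine_k]
```

The design point is cost: the cheap filter bounds how many documents the expensive scorer ever sees, which is exactly what the coarse/fine split in the config expresses.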
2. Tool Orchestration Optimization
Parallel Call (Suitable for independent queries):
# When user asks about grammar and pronunciation at the same time
Parallel:
- Tool: Knowledge_Retrieval
Query: "present perfect grammar"
- Tool: Knowledge_Retrieval
Query: "perfect pronunciation"
Conditional Branch (suitable when a call depends on an earlier result):
IF Knowledge_Retrieval.score < 0.6:
THEN Web_Search
ELSE:
SKIP Web_Search
3. Performance Optimization
Cache Strategy:
# Knowledge Base Cache
Cache TTL: 7 days
Cache Key: query_hash + knowledge_base_version
# Search Result Cache
Cache TTL: 1 day
Cache Key: search_query_hash
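A minimal sketch of the cache strategy above, keyed on a hash of the query plus the knowledge base version; the class name and structure are illustrative, not a Dify API.

```python
import hashlib
import time

class TTLCache:
    """Query cache with expiry; bumping the knowledge base version
    naturally invalidates all entries made against the old version."""
    def __init__(self, ttl_seconds):
        self.ttl = ttl_seconds
        self.store = {}

    def _key(self, query, kb_version):
        return hashlib.sha256(f"{query}|{kb_version}".encode()).hexdigest()

    def get(self, query, kb_version):
        entry = self.store.get(self._key(query, kb_version))
        if entry and time.monotonic() - entry[0] < self.ttl:
            return entry[1]
        return None  # missing or expired

    def put(self, query, kb_version, result):
        self.store[self._key(query, kb_version)] = (time.monotonic(), result)
```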
Rate Limiting:
# Prevent too many tool calls
Max Tool Calls Per Turn: 5
Max Total Calls Per Session: 50
Cooldown: 100ms between calls
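The rate limits above could be enforced with a small guard like the following; this is a sketch under the stated limits, and a real agent would wire it into the tool-calling layer.

```python
import time

class ToolRateLimiter:
    """Enforces a per-turn cap, a per-session cap, and a cooldown between calls."""
    def __init__(self, max_per_turn=5, max_per_session=50, cooldown=0.1):
        self.max_per_turn = max_per_turn
        self.max_per_session = max_per_session
        self.cooldown = cooldown
        self.turn_calls = 0
        self.session_calls = 0
        self.last_call = 0.0

    def start_turn(self):
        self.turn_calls = 0  # reset the per-turn budget on each user message

    def acquire(self):
        """Return True if a tool call is allowed, sleeping through the cooldown."""
        if (self.turn_calls >= self.max_per_turn
                or self.session_calls >= self.max_per_session):
            return False  # budget exhausted; the agent should answer with what it has
        wait = self.cooldown - (time.monotonic() - self.last_call)
        if wait > 0:
            time.sleep(wait)
        self.turn_calls += 1
        self.session_calls += 1
        self.last_call = time.monotonic()
        return True
```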
Token Optimization:
# Truncation Strategy
Knowledge Result Max Tokens: 800
Search Result Max Tokens: 300
Total Context Max Tokens: 2000
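A rough sketch of the truncation strategy: whitespace word count stands in for a real tokenizer (e.g., tiktoken), so the budget here is approximate by design.

```python
def truncate_to_budget(texts, max_tokens):
    """Greedily keep whole snippets until the rough token budget is spent.
    Word count approximates tokens; swap in a real tokenizer for production."""
    kept, used = [], 0
    for text in texts:
        cost = len(text.split())
        if used + cost > max_tokens:
            break  # dropping whole snippets avoids mid-sentence cuts
        kept.append(text)
        used += cost
    return kept
```

Keeping snippets whole (rather than slicing mid-text) preserves the integrity of each retrieved fragment, at the cost of occasionally leaving some budget unused.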
Part 5: Advanced Techniques
1. Multi-modal Input Support
Voice Input Processing:
# Voice Input Handler
IF input_type == "audio":
1. Use Whisper API to transcribe to text
2. Identify pronunciation errors (contrast standard phonetics)
3. Mark actual pronunciation vs standard pronunciation in answer
Image Input Processing (e.g., photo notes):
# Image Input Handler
IF input_type == "image":
1. Use OCR to extract text
2. Identify handwritten annotated questions
3. Focus explanation on annotated parts
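The two handlers above share one routing shape, sketched below; `transcribe`, `ocr`, and `analyze` are injected stand-ins for Whisper, an OCR service, and the agent itself, so the sketch stays backend-agnostic.

```python
def handle_input(payload, transcribe, ocr, analyze):
    """Normalize audio/image/text input to text, then hand it to the agent.
    The three callables are hypothetical backends injected by the caller."""
    kind = payload["type"]
    if kind == "audio":
        text = transcribe(payload["data"])  # e.g., Whisper in production
    elif kind == "image":
        text = ocr(payload["data"])         # e.g., an OCR service
    else:
        text = payload["data"]              # already plain text
    return analyze(text)
```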
2. Personalized Memory System
Knowledge Point Tracking:
{
"user_id": "user_123",
"weak_points": [
"present_perfect",
"pronunciation_th"
],
"mastered_points": [
"simple_past",
"articles"
],
"practice_history": [...]
}
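One way the profile above might be updated after each practice item is sketched below; the one-shot promotion from weak to mastered is a deliberate simplification (a real system would likely require a streak of correct answers).

```python
def record_outcome(profile, topic, correct):
    """Log a practice result and move the topic between weak_points and
    mastered_points. `profile` mirrors the JSON structure shown above."""
    profile["practice_history"].append({"topic": topic, "correct": correct})
    if correct:
        if topic in profile["weak_points"]:
            profile["weak_points"].remove(topic)
        if topic not in profile["mastered_points"]:
            profile["mastered_points"].append(topic)  # simplification: one-shot promotion
    else:
        if topic in profile["mastered_points"]:
            profile["mastered_points"].remove(topic)
        if topic not in profile["weak_points"]:
            profile["weak_points"].append(topic)
    return profile
```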
Adaptive Recommendation:
# Adaptive Practice
Based on the user's weaknesses, append a note after each answer:
"💡 By the way, you asked about the Present Perfect before; today's example sentence uses this tense too. Note that..."
✅ Pre-deployment Checklist
Basic Configuration
- [ ] Prompt fully copied into the Agent system prompt
- [ ] Model selected correctly and supports Function Calling / ReAct
- [ ] Agent Mode configured (Function Calling preferred)
- [ ] Temperature set to 0.3-0.5 (balances accuracy and creativity)
Tool Configuration
- [ ] Knowledge base created and all documents uploaded
- [ ] Knowledge base retrieval parameters tuned (Top K, Score Threshold)
- [ ] Web Search tool enabled and tested
- [ ] Optional tools (TTS, charts, etc.) configured as needed
Test Validation
- [ ] Test case 1: pure grammar question (should call only the knowledge base)
- [ ] Test case 2: neologism question (should call search)
- [ ] Test case 3: mixed question (should call both tools in sequence)
- [ ] Reasoning process log is clear and readable
- [ ] Token consumption within budget (< 2000 tokens/query)
User Experience
- [ ] Output format is clean (Markdown renders correctly)
- [ ] Analysis path is clear (the user can follow the AI's decision process)
- [ ] Practice suggestions are practical (usable directly)
- [ ] Response time < 10 seconds
Security
- [ ] No copyright issues with the knowledge base content
- [ ] Safe search enabled on the search tool
- [ ] No risk of sensitive information leakage
- [ ] Error handling is robust (no crash when a tool call fails)
📚 Related Resources
Documentation & Tutorials
Recommended Tools
- Knowledge Base Management: Dify Knowledge Base / LangChain Document Loader
- Search API: Tavily (AI Optimized) / SerpAPI (Comprehensive) / DuckDuckGo (Free)
- Speech Synthesis: Azure TTS / ElevenLabs / OpenAI TTS
- Rerank Model: BAAI/bge-reranker / Cohere Rerank
Advanced Reading
- Anthropic - Prompt Engineering Guide
- OpenAI - Best Practices for Agent Development
- Google - Responsible AI Practices
Document Version: v2.0 | Last Update: 2025-01-21 | Applicable Platforms: Dify / LangChain / Semantic Kernel / AutoGPT | License: MIT
Appendix: FAQ
Q1: Why recommend DeepSeek-V3 instead of GPT-4?
A: In cost-sensitive scenarios, DeepSeek-V3 offers reasoning capability close to GPT-4 at roughly one-tenth of the price, so the cost-performance ratio is higher for education apps. If budget allows and the best results are needed, GPT-4o remains the first choice.
Q2: What is the difference between ReAct and Chain-of-Thought (CoT)?
A:
- CoT: thinking process only, with no tool calls (pure reasoning)
- ReAct: a closed loop of Thought + Action + Observation (can call external tools)
Put simply: ReAct = CoT + tool use.
Q3: How do you detect a knowledge base retrieval failure and decide to call search?
A: Set Score Threshold = 0.6. When the retrieval result's relevance score is < 0.6, automatically trigger the search. You can also specify a keyword blacklist in the prompt (e.g., new words like "Rizz" and "Skibidi" go straight to search).
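The threshold-plus-blacklist rule can be expressed as a single predicate; the blacklist contents and threshold default mirror the answer above, while the function and constant names are illustrative.

```python
NEOLOGISM_BLACKLIST = {"rizz", "skibidi"}  # terms known to be absent from the books

def should_web_search(query, best_score, threshold=0.6):
    """Trigger the search tool when retrieval is weak, or when the query
    contains a term the knowledge base is known not to cover."""
    if best_score < threshold:
        return True
    return any(word in NEOLOGISM_BLACKLIST for word in query.lower().split())
```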
Q4: What if too many agent reasoning rounds cause a timeout?
A:
- Reduce Max Iterations (e.g., 10 → 5)
- Add a hard constraint to the prompt: "Must give an answer within 2 rounds"
- Use Function Calling instead of ReAct (it is faster)
Q5: Can this scheme be used for agents outside English teaching?
A: Absolutely! Just replace the Role, Goals, and knowledge base content. For example:
- Legal Advisor: knowledge base = statutes, search = case library
- Medical Consultation: knowledge base = medical textbooks, search = latest research
- Programming Assistant: knowledge base = official docs, search = Stack Overflow
The core framework (ReAct + tool orchestration) is universal.