rag-design
Production-grade RAG (Retrieval-Augmented Generation) system design patterns from OpenClaw. Use when implementing semantic search, vector retrieval, memory systems, or knowledge bases. Covers hybrid search (vector + keyword), embedding providers, MMR re-ranking, temporal decay, and enterprise features.
/plugin install medical_evaluation_systemdetails
RAG Design Patterns
Production-grade RAG implementation patterns extracted from OpenClaw's memory system.
Core Architecture
Hybrid Search Strategy
Combine vector and keyword search for optimal retrieval:
// Parallel execution
const [vectorResults, keywordResults] = await Promise.all([
vectorSearch(query, { limit: 50 }),
keywordSearch(query, { limit: 50 })
]);
// Weighted merge
const merged = mergeResults({
vector: vectorResults,
keyword: keywordResults,
vectorWeight: 0.7, // Semantic relevance
textWeight: 0.3 // Keyword matching
});
Storage Layer
Use SQLite with extensions for production deployments:
- Vector table:
sqlite-vecextension for embeddings - FTS table: SQLite FTS5 for full-text search
- Cache table: Store computed embeddings to avoid recomputation
-- Vector storage
CREATE VIRTUAL TABLE chunks_vec USING vec0(
embedding FLOAT[768]
);
-- Full-text search
CREATE VIRTUAL TABLE chunks_fts USING fts5(
content, path, source
);
-- Embedding cache
CREATE TABLE embedding_cache (
content_hash TEXT PRIMARY KEY,
embedding BLOB,
provider TEXT,
model TEXT
);
Embedding Providers
Support multiple providers with fallback:
const providers = {
openai: 'text-embedding-3-small',
gemini: 'text-embedding-004',
voyage: 'voyage-2',
mistral: 'mistral-embed',
local: 'node-llama-cpp'
};
// Auto-fallback on failure
async function createProvider(config) {
try {
return await initPrimary(config.provider);
} catch (err) {
if (config.fallback) {
return await initFallback(config.fallback);
}
throw err;
}
}
Advanced Retrieval
MMR Re-ranking
Balance relevance with diversity using Maximal Marginal Relevance:
// Iteratively select diverse results
function applyMMR(results, lambda = 0.7) {
const selected = [];
const remaining = [...results];
while (selected.length < maxResults && remaining.length > 0) {
let bestIdx = 0;
let bestScore = -Infinity;
for (let i = 0; i < remaining.length; i++) {
const relevance = remaining[i].score;
const maxSim = maxSimilarity(remaining[i], selected);
const mmrScore = lambda * relevance - (1 - lambda) * maxSim;
if (mmrScore > bestScore) {
bestScore = mmrScore;
bestIdx = i;
}
}
selected.push(remaining[bestIdx]);
remaining.splice(bestIdx, 1);
}
return selected;
}
Temporal Decay
Boost recent results with time-based scoring:
function applyTemporalDecay(results, halfLifeDays = 30) {
const now = Date.now();
const halfLifeMs = halfLifeDays * 24 * 60 * 60 * 1000;
return results.map(r => ({
...r,
score: r.score * Math.exp(-Math.log(2) *
(now - r.timestamp) / halfLifeMs)
}));
}
Query Enhancement
Query Expansion
Extract keywords to improve recall:
function expandQuery(query) {
// Extract meaningful terms
const keywords = query
.match(/[\p{L}\p{N}_]+/gu)
?.filter(t => t.length > 2) ?? [];
// Build FTS query
return keywords
.map(k => `"${k}"`)
.join(' AND ');
}
Enterprise Features
Batch Processing
Reduce API costs with batch embedding:
async function batchEmbed(texts, provider) {
const batches = chunk(texts, provider.batchSize);
const results = [];
for (const batch of batches) {
const embeddings = await provider.embedBatch(batch);
results.push(...embeddings);
// Rate limiting
await sleep(provider.delayMs);
}
return results;
}
Incremental Indexing
Watch files and update index on changes:
const watcher = chokidar.watch('memory/**/*.md', {
ignoreInitial: true
});
watcher.on('change', debounce(async (path) => {
await reindexFile(path);
}, 1000));
Error Recovery
Handle transient failures gracefully:
async function embedWithRetry(text, maxRetries = 3) {
for (let i = 0; i < maxRetries; i++) {
try {
return await embed(text);
} catch (err) {
if (i === maxRetries - 1) throw err;
await sleep(Math.pow(2, i) * 1000);
}
}
}
Agent Integration
Expose as tools for AI agents:
const tools = [
{
name: 'memory_search',
description: 'Semantic search over indexed documents',
parameters: {
query: 'string',
maxResults: 'number?',
minScore: 'number?'
},
execute: async (params) => {
return await searchManager.search(params.query, {
maxResults: params.maxResults ?? 10,
minScore: params.minScore ?? 0.7
});
}
}
];
Performance Optimization
- Cache embeddings: Hash content and reuse computed vectors
- Parallel search: Run vector and keyword search concurrently
- Limit context: Return snippets, not full documents
- Debounce updates: Batch file changes before reindexing
Reference Files
- Architecture Details - System design and data flow
- Implementation Guide - TypeScript step-by-step setup
- Python Implementation - Python version with LangChain/LlamaIndex
- Configuration - Tuning parameters
technical
- github
- Adamuuuu/medical_evaluation_system
- stars
- 0
- license
- unspecified
- contributors
- 2
- last commit
- 2026-03-09T03:22:08Z
- file
- .claude/skills/rag-design/SKILL.md