AI Knowledge Collection Sync - Documentation
Version: 1.0
Date: March 11, 2026
Status: ✅ Implemented
Overview
Synchronizes EspoCRM CAIKnowledge entities with XAI Collections for semantic document search. Supports the full collection lifecycle, BLAKE3-based integrity verification, and robust hash-based change detection.
Features
✅ Collection Lifecycle Management
- NEW → create the collection in XAI
- ACTIVE → automatic document sync
- PAUSED → sync paused, the collection is kept
- DEACTIVATED → delete the collection from XAI
✅ Dual-Hash Change Detection
- EspoCRM hash (MD5/SHA256) for local change detection
- XAI BLAKE3 hash for remote integrity verification
- Metadata hash for description changes
✅ Robustness
- BLAKE3 verification after every upload
- Metadata-only updates via PATCH
- Orphan detection & cleanup
- Distributed locking (Redis)
- Daily full sync (02:00 at night)
✅ Error Handling
- Unsupported MIME types → status "unsupported"
- Transient errors → retry with exponential backoff
- Partial failures are tolerated
Architecture
┌─────────────────────────────────────────────────────────────────┐
│ EspoCRM CAIKnowledge │
│ ├─ activationStatus: new/active/paused/deactivated │
│ ├─ syncStatus: unclean/pending_sync/synced/failed │
│ └─ datenbankId: XAI Collection ID │
└─────────────────────────────────────────────────────────────────┘
↓ Webhook
┌─────────────────────────────────────────────────────────────────┐
│ Motia Webhook Handler │
│ → POST /vmh/webhook/aiknowledge/update │
└─────────────────────────────────────────────────────────────────┘
↓ Emit Event
┌─────────────────────────────────────────────────────────────────┐
│ Queue: aiknowledge.sync │
└─────────────────────────────────────────────────────────────────┘
↓ Lock: aiknowledge:{id}
┌─────────────────────────────────────────────────────────────────┐
│ Sync Handler │
│ ├─ Check activationStatus │
│ ├─ Manage Collection Lifecycle │
│ ├─ Sync Documents (with BLAKE3 verification) │
│ └─ Update Statuses │
└─────────────────────────────────────────────────────────────────┘
↓
┌─────────────────────────────────────────────────────────────────┐
│ XAI Collections API │
│ └─ Collections with embedded documents │
└─────────────────────────────────────────────────────────────────┘
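The Sync Handler's first decision, mapping activationStatus to a lifecycle step, can be sketched as a pure function. This is a minimal illustration, not the actual handler: the function name and the returned action strings are hypothetical, and creating a collection for an active record that has no datenbankId yet is an assumption.

```python
from typing import Optional


def plan_lifecycle_action(activation_status: str, datenbank_id: Optional[str]) -> str:
    """Return the lifecycle step for a CAIKnowledge record (hypothetical action names)."""
    if activation_status == "new":
        return "create_collection"
    if activation_status == "active":
        # Assumption: an active record without datenbankId gets a collection first
        return "sync_documents" if datenbank_id else "create_collection"
    if activation_status == "paused":
        # Sync paused, the collection stays in XAI
        return "skip"
    if activation_status == "deactivated":
        return "delete_collection"
    raise ValueError(f"unknown activationStatus: {activation_status!r}")
```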
EspoCRM Configuration
1. Entity: CAIKnowledge
Fields:
| Field | Type | Description | Values |
|---|---|---|---|
| name | varchar(255) | Name of the knowledge base | - |
| datenbankId | varchar(255) | XAI collection ID | Filled automatically |
| activationStatus | enum | Lifecycle status | new, active, paused, deactivated |
| syncStatus | enum | Sync status | unclean, pending_sync, synced, failed |
| lastSync | datetime | Last successful sync | ISO 8601 |
| syncError | text | Error message on failure | Max. 2000 characters |
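Because syncError is capped at 2000 characters, long error messages have to be shortened before they are written. A minimal sketch, assuming the message is built from the exception type and text; the helper name and the truncation marker are hypothetical.

```python
SYNC_ERROR_MAX_LEN = 2000  # limit of the syncError field (see table above)


def format_sync_error(exc: BaseException, max_len: int = SYNC_ERROR_MAX_LEN) -> str:
    """Render an exception as text that fits into CAIKnowledge.syncError."""
    message = f"{type(exc).__name__}: {exc}"
    if len(message) <= max_len:
        return message
    suffix = " [truncated]"
    # Keep the start of the message, which usually names the failing step
    return message[: max_len - len(suffix)] + suffix
```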
Enum definitions:
{
"activationStatus": {
"type": "enum",
"options": ["new", "active", "paused", "deactivated"],
"default": "new"
},
"syncStatus": {
"type": "enum",
"options": ["unclean", "pending_sync", "synced", "failed"],
"default": "unclean"
}
}
2. Junction: CAIKnowledgeCDokumente
additionalColumns:
| Field | Type | Description |
|---|---|---|
| aiDocumentId | varchar(255) | XAI file_id |
| syncstatus | enum | Per-document sync status |
| syncedHash | varchar(64) | MD5/SHA256 from EspoCRM |
| xaiBlake3Hash | varchar(128) | BLAKE3 hash from XAI |
| syncedMetadataHash | varchar(64) | Hash of the metadata |
| lastSync | datetime | Last sync of this document |
Enum definition:
{
"syncstatus": {
"type": "enum",
"options": ["new", "unclean", "synced", "failed", "unsupported"]
}
}
3. Webhooks
Webhook 1: CREATE
{
"event": "CAIKnowledge.afterSave",
"url": "https://your-motia-domain.com/vmh/webhook/aiknowledge/update",
"method": "POST",
"payload": "{\"entity_id\": \"{$id}\", \"entity_type\": \"CAIKnowledge\", \"action\": \"create\"}",
"condition": "entity.isNew()"
}
Webhook 2: UPDATE
{
"event": "CAIKnowledge.afterSave",
"url": "https://your-motia-domain.com/vmh/webhook/aiknowledge/update",
"method": "POST",
"payload": "{\"entity_id\": \"{$id}\", \"entity_type\": \"CAIKnowledge\", \"action\": \"update\"}",
"condition": "!entity.isNew()"
}
Webhook 3: DELETE (Optional)
{
"event": "CAIKnowledge.afterRemove",
"url": "https://your-motia-domain.com/vmh/webhook/aiknowledge/delete",
"method": "POST",
"payload": "{\"entity_id\": \"{$id}\", \"entity_type\": \"CAIKnowledge\", \"action\": \"delete\"}"
}
Recommendation: Use only CREATE + UPDATE. Handle deletion via activationStatus="deactivated".
4. Hooks (EspoCRM Backend)
Hook 1: Document Link → set syncStatus to "unclean"
<?php
// custom/Espo/Custom/Hooks/CAIKnowledge/AfterRelate.php
namespace Espo\Custom\Hooks\CAIKnowledge;

use Espo\ORM\Entity;

class AfterRelate extends \Espo\Core\Hooks\Base
{
    // EspoCRM calls afterRelate() when a record is linked to a CAIKnowledge entity
    public function afterRelate(Entity $entity, array $options = [], array $hookData = [])
    {
        if (($hookData['relationName'] ?? null) === 'dokumentes') {
            // Mark as unclean when documents are linked; this save fires the UPDATE webhook
            $entity->set('syncStatus', 'unclean');
            $this->getEntityManager()->saveEntity($entity);
        }
    }
}
Hook 2: Document Change → set junction entries to "unclean"
<?php
// custom/Espo/Custom/Hooks/CDokumente/AfterSave.php
namespace Espo\Custom\Hooks\CDokumente;

use Espo\ORM\Entity;

class AfterSave extends \Espo\Core\Hooks\Base
{
    public function afterSave(Entity $entity, array $options = [])
    {
        if ($entity->isAttributeChanged('description') ||
            $entity->isAttributeChanged('md5') ||
            $entity->isAttributeChanged('sha256')) {
            // Mark all junction entries of this document as unclean
            $this->updateJunctionStatuses($entity->id, 'unclean');
            // Mark all related CAIKnowledge records as unclean
            $this->markRelatedKnowledgeUnclean($entity->id);
        }
    }

    // updateJunctionStatuses() and markRelatedKnowledgeUnclean() are
    // project-specific helpers; their implementation is not shown here.
}
Environment Variables
# XAI API keys (required)
XAI_API_KEY=your_xai_api_key_here
XAI_MANAGEMENT_KEY=your_xai_management_key_here
# Redis (for locking)
REDIS_HOST=localhost
REDIS_PORT=6379
# EspoCRM
ESPOCRM_API_BASE_URL=https://crm.bitbylaw.com/api/v1
ESPOCRM_API_KEY=your_espocrm_api_key
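A minimal sketch of how the service might read these variables at startup, assuming the two XAI keys are the only hard requirements (as marked above) and falling back to the example Redis values otherwise. The SyncConfig class and load_config function are hypothetical.

```python
import os
from dataclasses import dataclass
from typing import Mapping


@dataclass(frozen=True)
class SyncConfig:
    xai_api_key: str
    xai_management_key: str
    redis_host: str
    redis_port: int
    espocrm_api_base_url: str
    espocrm_api_key: str


def load_config(env: Mapping[str, str] = os.environ) -> SyncConfig:
    """Read the variables listed above; fail fast if a required XAI key is missing."""
    missing = [k for k in ("XAI_API_KEY", "XAI_MANAGEMENT_KEY") if not env.get(k)]
    if missing:
        raise RuntimeError(f"missing required environment variables: {', '.join(missing)}")
    return SyncConfig(
        xai_api_key=env["XAI_API_KEY"],
        xai_management_key=env["XAI_MANAGEMENT_KEY"],
        # Assumption: defaults mirror the example values above
        redis_host=env.get("REDIS_HOST", "localhost"),
        redis_port=int(env.get("REDIS_PORT", "6379")),
        espocrm_api_base_url=env.get("ESPOCRM_API_BASE_URL", ""),
        espocrm_api_key=env.get("ESPOCRM_API_KEY", ""),
    )
```

Passing a plain dict instead of os.environ keeps the loader easy to test.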
Workflows
Workflow 1: Create a New Knowledge Base
1. User creates a CAIKnowledge record in EspoCRM
└─ activationStatus: "new" (default)
2. CREATE webhook fires
└─ Event: aiknowledge.sync
3. Sync handler:
└─ activationStatus="new" → create collection in XAI
└─ Update EspoCRM:
├─ datenbankId = collection_id
├─ activationStatus = "active"
└─ syncStatus = "unclean"
4. Next webhook (UPDATE):
└─ activationStatus="active" → sync documents
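Step 3 of this workflow boils down to one API call plus three field updates. The sketch below assumes a hypothetical create_collection callable that wraps the XAI API and returns the new collection ID; only the returned field values come from the workflow above.

```python
from typing import Callable, Dict, Mapping


def handle_new_knowledge(knowledge: Mapping[str, str],
                         create_collection: Callable[[str], str]) -> Dict[str, str]:
    """Create the XAI collection and return the field updates for EspoCRM."""
    collection_id = create_collection(knowledge["name"])
    return {
        "datenbankId": collection_id,
        "activationStatus": "active",
        # Documents are not synced yet; the next UPDATE webhook handles them
        "syncStatus": "unclean",
    }
```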
Workflow 2: Add Documents
1. User links documents to CAIKnowledge
└─ EspoCRM hook sets syncStatus = "unclean"
2. UPDATE webhook fires
└─ Event: aiknowledge.sync
3. Sync handler:
└─ For each junction entry:
├─ Check: MIME type supported?
├─ Check: hash changed?
├─ Download from EspoCRM
├─ Upload to XAI with metadata
├─ Verify upload (BLAKE3)
└─ Update junction: syncstatus="synced"
4. Update CAIKnowledge:
└─ syncStatus = "synced"
└─ lastSync = now()
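The per-entry checks in step 3 can be sketched as a decision function plus a BLAKE3 comparison. Assumptions: the function names and action strings are hypothetical, the set of supported MIME types is passed in, and the expected BLAKE3 hash is computed locally from the downloaded bytes (for example with the third-party blake3 package), which this sketch does not do itself.

```python
from typing import AbstractSet, Mapping, Optional


def plan_document_sync(junction: Mapping[str, Optional[str]],
                       mime_type: str,
                       current_hash: str,
                       supported_mime_types: AbstractSet[str]) -> str:
    """Decide what the sync handler does with one junction entry."""
    if mime_type not in supported_mime_types:
        return "mark_unsupported"
    # Never uploaded, or EspoCRM's MD5/SHA256 differs from the synced one
    if junction.get("aiDocumentId") is None or junction.get("syncedHash") != current_hash:
        return "upload"
    # Content unchanged: only the metadata may need an update (Workflow 3)
    return "check_metadata"


def verify_upload(expected_blake3: str, reported_blake3: Optional[str]) -> bool:
    """An upload counts as verified only if XAI reported a hash and it matches."""
    return reported_blake3 is not None and reported_blake3.lower() == expected_blake3.lower()
```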
Workflow 3: Metadata Change
1. User changes Document.description in EspoCRM
└─ EspoCRM hook sets junction syncstatus = "unclean"
└─ EspoCRM hook sets CAIKnowledge syncStatus = "unclean"
2. UPDATE webhook fires
3. Sync handler:
└─ Compute metadata hash
└─ Hash differs? → PATCH to XAI
└─ If PATCH fails → fallback: re-upload
└─ Update junction: syncedMetadataHash
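A sketch of the metadata path, assuming the metadata hash is a SHA-256 over canonical JSON of the metadata fields (which would fit the varchar(64) syncedMetadataHash column); the actual hash algorithm and field set are not stated above. patch and reupload are hypothetical callables wrapping the XAI calls.

```python
import hashlib
import json
from typing import Any, Callable, Mapping, Optional


def metadata_hash(fields: Mapping[str, Any]) -> str:
    """SHA-256 over canonical JSON; 64 hex characters fit varchar(64)."""
    canonical = json.dumps(dict(fields), sort_keys=True, ensure_ascii=False,
                           separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def sync_metadata(fields: Mapping[str, Any],
                  synced_hash: Optional[str],
                  patch: Callable[[Mapping[str, Any]], None],
                  reupload: Callable[[Mapping[str, Any]], None]) -> str:
    """Push changed metadata to XAI and return the hash to store in syncedMetadataHash."""
    new_hash = metadata_hash(fields)
    if new_hash == synced_hash:
        return new_hash  # metadata unchanged, nothing to send
    try:
        patch(fields)  # metadata-only update
    except Exception:
        reupload(fields)  # fallback: upload the document again with the new metadata
    return new_hash
```

Sorting keys makes the hash independent of field order, so only real changes trigger a PATCH.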
Workflow 4: Deactivate a Knowledge Base
1. User sets activationStatus = "deactivated"
2. UPDATE webhook fires
3. Sync handler:
└─ Delete collection from XAI
└─ Reset all junction entries:
├─ syncstatus = "new"
└─ aiDocumentId = NULL
└─ CAIKnowledge remains in EspoCRM (with datenbankId)
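Deactivation touches XAI once and every junction entry of the knowledge base. A minimal sketch, assuming a hypothetical delete_collection callable; the CAIKnowledge record itself is deliberately left unchanged, so datenbankId survives.

```python
from typing import Any, Callable, Dict, List, Mapping


def deactivate_knowledge(knowledge: Mapping[str, Any],
                         junctions: List[Mapping[str, Any]],
                         delete_collection: Callable[[str], None]) -> List[Dict[str, Any]]:
    """Delete the XAI collection and return the reset junction entries."""
    if knowledge.get("datenbankId"):
        delete_collection(knowledge["datenbankId"])
    # The CAIKnowledge record (including datenbankId) is not modified here
    return [{**j, "syncstatus": "new", "aiDocumentId": None} for j in junctions]
```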
Workflow 5: Daily Full Sync
Cron: daily at 02:00
1. Load all CAIKnowledge records with:
└─ activationStatus = "active"
└─ syncStatus IN ("unclean", "failed")
2. For each one:
└─ Emit: aiknowledge.sync event
3. The queue processes them sequentially
└─ Catches missed webhooks
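The selection in step 1 and the fan-out in step 2 can be sketched as follows; the emit callable and the event payload shape are hypothetical stand-ins for the Motia event API.

```python
from typing import Any, Callable, Dict, Iterable, List, Mapping

DAILY_SYNC_STATUSES = ("unclean", "failed")


def select_for_daily_sync(knowledge_bases: Iterable[Mapping[str, Any]]) -> List[str]:
    """IDs of active knowledge bases whose sync status is unclean or failed."""
    return [kb["id"] for kb in knowledge_bases
            if kb.get("activationStatus") == "active"
            and kb.get("syncStatus") in DAILY_SYNC_STATUSES]


def run_daily_sync(knowledge_bases: Iterable[Mapping[str, Any]],
                   emit: Callable[[Dict[str, Any]], None]) -> int:
    """Emit one aiknowledge.sync event per selected record; the queue serializes them."""
    ids = select_for_daily_sync(knowledge_bases)
    for knowledge_id in ids:
        emit({"topic": "aiknowledge.sync", "data": {"knowledge_id": knowledge_id}})
    return len(ids)
```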
Monitoring & Troubleshooting
Check Logs
# Motia service logs
sudo journalctl -u motia-iii -f | grep -i "ai knowledge"
# Sync events within the last 100 log lines
sudo journalctl -u motia-iii -n 100 | grep "AI KNOWLEDGE SYNC"
# Errors from the last 24 hours
sudo journalctl -u motia-iii --since "24 hours ago" | grep "❌"
Check EspoCRM Status
-- All active knowledge bases with their status
SELECT
id,
name,
activation_status,
sync_status,
last_sync,
sync_error
FROM c_ai_knowledge
WHERE activation_status = 'active';
-- Junction entries with sync problems
SELECT
j.id,
k.name AS knowledge_name,
d.name AS document_name,
j.syncstatus,
j.last_sync
FROM c_ai_knowledge_c_dokumente j
JOIN c_ai_knowledge k ON j.c_ai_knowledge_id = k.id
JOIN c_dokumente d ON j.c_dokumente_id = d.id
WHERE j.syncstatus IN ('failed', 'unsupported');
Common Problems
Problem: "Lock busy for aiknowledge:xyz"
Cause: A previous sync is still running or has crashed
Solution:
# Release the Redis lock manually (only if no sync is still running;
# otherwise the lock expires on its own after its 30-minute TTL)
redis-cli
> DEL sync_lock:aiknowledge:xyz
Problem: "Unsupported MIME type"
Cause: The document has a MIME type that XAI does not support
Solution:
- Convert the document (e.g. RTF → PDF)
- Or: accept it (the document keeps the status "unsupported")
Problem: "Upload verification failed"
Cause: XAI returns no BLAKE3 hash, or the hash does not match
Solution:
- Check the XAI API documentation (has the hash format changed?)
- If transient: the retry runs automatically
- If persistent: contact XAI support
Problem: "Collection not found"
Cause: The collection was deleted manually in XAI
Solution: Resolved automatically; the sync creates a new collection
API Endpoints
Webhook Endpoint
POST /vmh/webhook/aiknowledge/update
Content-Type: application/json
{
"entity_id": "kb-123",
"entity_type": "CAIKnowledge",
"action": "update"
}
Response:
{
"success": true,
"knowledge_id": "kb-123"
}
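A framework-independent sketch of what this endpoint does with the request body: validate it, answer with the response shown above, and hand an event to the aiknowledge.sync queue. The 400 error body and the event's data fields are assumptions, not the documented API.

```python
from typing import Any, Dict, Optional, Tuple

# The DELETE webhook uses its own endpoint, so this one accepts create/update only
VALID_ACTIONS = {"create", "update"}


def handle_webhook(body: Any) -> Tuple[int, Dict[str, Any], Optional[Dict[str, Any]]]:
    """Return (HTTP status, response body, event to emit or None)."""
    if not isinstance(body, dict):
        return 400, {"success": False, "error": "body must be a JSON object"}, None
    entity_id = body.get("entity_id")
    if body.get("entity_type") != "CAIKnowledge" or not entity_id:
        return 400, {"success": False, "error": "invalid entity"}, None
    if body.get("action") not in VALID_ACTIONS:
        return 400, {"success": False, "error": "invalid action"}, None
    event = {"topic": "aiknowledge.sync",
             "data": {"knowledge_id": entity_id, "action": body["action"]}}
    return 200, {"success": True, "knowledge_id": entity_id}, event
```

Answering immediately and leaving the actual sync to the queue keeps the webhook response fast.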
Performance
Typical Sync Times
| Scenario | Time | Notes |
|---|---|---|
| Create collection | < 1s | API call only |
| 1 document (1 MB) | 2-4s | Upload + verify |
| 10 documents (10 MB) | 20-40s | Sequential |
| 100 documents (100 MB) | 3-6 min | Lock TTL: 30 min |
| Metadata-only update | < 1s | PATCH only |
| Orphan cleanup | 1-3s | Per 10 documents |
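The table also gives a rough upper bound on how many documents one sync can process before the 30-minute lock expires. Assuming time grows linearly with the number of 1 MB documents (the table suggests sequential processing at 2-4 s each), the worst case is 1800 s / 4 s = 450 documents. The helper name is hypothetical.

```python
LOCK_TTL_SECONDS = 1800  # 30-minute lock TTL


def max_documents_within_ttl(seconds_per_document: float,
                             ttl_seconds: int = LOCK_TTL_SECONDS) -> int:
    """Documents that fit into one lock TTL at a fixed per-document time."""
    return int(ttl_seconds // seconds_per_document)


# Per-document times taken from the "1 document (1 MB)" row: 2-4 s
best_case = max_documents_within_ttl(2)   # 900 documents
worst_case = max_documents_within_ttl(4)  # 450 documents
```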
Lock TTLs
- AIKnowledge sync: 30 minutes (1800 seconds)
- Redis lock: same as above
- Auto-release: on timeout (TTL expired)
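A minimal sketch of such a lock using redis-py conventions: SET with NX and EX acquires the key only if it is free and lets Redis expire it after the TTL, and a small Lua script releases it only if the caller still holds it. The key format matches the sync_lock:aiknowledge:{id} keys used elsewhere in this document; the random token and the function names are assumptions. The client is passed in, so a redis.Redis instance should work.

```python
import uuid
from typing import Any, Optional

LOCK_TTL_SECONDS = 1800  # 30 minutes, see Lock TTLs above

# Delete the key only if it still holds our token (atomic check-and-delete)
RELEASE_SCRIPT = """
if redis.call('get', KEYS[1]) == ARGV[1] then
    return redis.call('del', KEYS[1])
end
return 0
"""


def lock_key(knowledge_id: str) -> str:
    return f"sync_lock:aiknowledge:{knowledge_id}"


def acquire_lock(client: Any, knowledge_id: str,
                 ttl: int = LOCK_TTL_SECONDS) -> Optional[str]:
    """Return a token if the lock was acquired, None if another sync holds it."""
    token = uuid.uuid4().hex
    # SET key token NX EX ttl: succeeds only if the key does not exist yet
    if client.set(lock_key(knowledge_id), token, nx=True, ex=ttl):
        return token
    return None


def release_lock(client: Any, knowledge_id: str, token: str) -> bool:
    """Release the lock if we still own it; an expired or taken-over lock is not deleted."""
    return bool(client.eval(RELEASE_SCRIPT, 1, lock_key(knowledge_id), token))
```

The token check prevents a slow sync whose lock already expired from deleting the lock of the sync that took over.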
Rate Limits
XAI API:
- Files Upload: ~100 requests/minute
- Management API: ~1000 requests/minute
Strategy on rate limit (429):
- Exponential backoff: 2s, 4s, 8s, 16s, 32s
- Respect the Retry-After header
- Max. 5 retries
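The retry schedule above can be sketched as a delay function. Assumptions: attempt is the 1-based retry number, a numeric Retry-After value (in seconds) takes precedence over the computed backoff, and the HTTP-date form of Retry-After is ignored for brevity. The names are hypothetical.

```python
from typing import Optional

BASE_DELAY_SECONDS = 2.0
MAX_RETRIES = 5


def retry_delay(attempt: int, retry_after: Optional[str] = None) -> Optional[float]:
    """Seconds to wait before retry number `attempt`; None means give up."""
    if attempt < 1:
        raise ValueError("attempt is 1-based")
    if attempt > MAX_RETRIES:
        return None
    if retry_after is not None:
        try:
            return max(float(retry_after), 0.0)
        except ValueError:
            pass  # HTTP-date form of Retry-After is not handled in this sketch
    # 2s, 4s, 8s, 16s, 32s
    return BASE_DELAY_SECONDS * 2 ** (attempt - 1)
```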
XAI Collections Metadata
Document Metadata Fields
Stored in XAI for every document:
{
"fields": {
"document_name": "Vertrag.pdf",
"description": "Mietvertrag Mustermann",
"created_at": "2024-01-01T00:00:00Z",
"modified_at": "2026-03-10T15:30:00Z",
"espocrm_id": "dok-123"
}
}
inject_into_chunk: true for document_name and description
→ Improves semantic search
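A sketch of how these fields might be filled from a CDokumente record. Assumptions: the EspoCRM attribute names (name, description, createdAt, modifiedAt, id) and EspoCRM's UTC datetime format "YYYY-MM-DD HH:MM:SS"; the helper names are hypothetical, and the inject_into_chunk setting is not modeled because its request format is not shown here.

```python
from typing import Any, Dict, Mapping, Optional


def to_iso8601(espo_datetime: Optional[str]) -> Optional[str]:
    # Assumption: EspoCRM datetimes look like "2026-03-10 15:30:00" and are UTC
    return espo_datetime.replace(" ", "T") + "Z" if espo_datetime else None


def build_document_metadata(doc: Mapping[str, Any]) -> Dict[str, Any]:
    """Map a CDokumente record to the XAI document metadata fields."""
    return {"fields": {
        "document_name": doc.get("name") or "",
        "description": doc.get("description") or "",
        "created_at": to_iso8601(doc.get("createdAt")),
        "modified_at": to_iso8601(doc.get("modifiedAt")),
        "espocrm_id": doc["id"],
    }}
```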
Collection Metadata
{
"metadata": {
"espocrm_entity_type": "CAIKnowledge",
"espocrm_entity_id": "kb-123",
"created_at": "2026-03-11T10:00:00Z"
}
}
Testing
Manual Test
# 1. Create a CAIKnowledge record in EspoCRM
# 2. Check logs
sudo journalctl -u motia-iii -f
# 3. Check Redis lock
redis-cli
> KEYS sync_lock:aiknowledge:*
# 4. Check XAI collection
curl -H "Authorization: Bearer $XAI_MANAGEMENT_KEY" \
https://management-api.x.ai/v1/collections
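The same check can be scripted with Python's standard library. A minimal sketch assuming the endpoint returns JSON; the function names are hypothetical, and since the response structure is not documented here, the parsed result is returned as-is.

```python
import json
import urllib.request

COLLECTIONS_URL = "https://management-api.x.ai/v1/collections"


def build_list_collections_request(management_key: str) -> urllib.request.Request:
    """GET request with the same Bearer header as the curl call above."""
    return urllib.request.Request(
        COLLECTIONS_URL,
        headers={"Authorization": f"Bearer {management_key}"},
    )


def list_collections(management_key: str) -> object:
    """Send the request and return the parsed JSON body."""
    request = build_list_collections_request(management_key)
    with urllib.request.urlopen(request, timeout=30) as response:
        return json.load(response)
```

Usage: list_collections(os.environ["XAI_MANAGEMENT_KEY"]).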
Integration Test
# tests/test_aiknowledge_sync.py
import asyncio

import pytest


# Assumption: `espocrm` (async EspoCRM client), `trigger_webhook` and `doc_id`
# (ID of an existing CDokumente record) are pytest fixtures, e.g. in conftest.py.
@pytest.mark.asyncio  # requires the pytest-asyncio plugin
async def test_full_sync_workflow(espocrm, trigger_webhook, doc_id):
    """Test complete sync workflow"""
    # 1. Create CAIKnowledge with status "new"
    knowledge = await espocrm.create_entity('CAIKnowledge', {
        'name': 'Test KB',
        'activationStatus': 'new'
    })
    # 2. Trigger webhook
    await trigger_webhook(knowledge['id'])
    # 3. Wait for sync
    await asyncio.sleep(5)
    # 4. Check collection created
    knowledge = await espocrm.get_entity('CAIKnowledge', knowledge['id'])
    assert knowledge['datenbankId'] is not None
    assert knowledge['activationStatus'] == 'active'
    # 5. Link document
    await espocrm.link_entities('CAIKnowledge', knowledge['id'], 'CDokumente', doc_id)
    # 6. Trigger webhook again
    await trigger_webhook(knowledge['id'])
    await asyncio.sleep(10)
    # 7. Check junction synced
    junction = await espocrm.get_junction_entries(
        'CAIKnowledgeCDokumente',
        'cAIKnowledgeId',
        knowledge['id']
    )
    assert junction[0]['syncstatus'] == 'synced'
    assert junction[0]['xaiBlake3Hash'] is not None
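The fixed asyncio.sleep() calls make the integration test either slow or flaky, depending on how long the sync actually takes. Polling until a condition holds is one alternative; this is a sketch, and the helper name, timeout, and interval are assumptions.

```python
import asyncio
import time
from typing import Awaitable, Callable


async def wait_until(predicate: Callable[[], Awaitable[bool]],
                     timeout: float = 30.0,
                     interval: float = 0.5) -> None:
    """Poll an async predicate until it returns True or the timeout elapses."""
    deadline = time.monotonic() + timeout
    while True:
        if await predicate():
            return
        if time.monotonic() >= deadline:
            raise TimeoutError(f"condition not met within {timeout}s")
        await asyncio.sleep(interval)
```

For example, step 3 could become await wait_until(lambda: collection_created(knowledge['id'])), where collection_created is a hypothetical coroutine function that fetches the entity and checks datenbankId.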
Maintenance
Weekly Checks
- Check failed syncs in EspoCRM
- Check Redis memory usage
- Check XAI storage usage
- Review logs for recurring patterns
Monthly Tasks
- Clean up old syncError messages
- Verify XAI Collection Integrity
- Review Performance Metrics
- Update MIME Type Support List
Support
If problems occur:
- Check logs: journalctl -u motia-iii -f
- Check EspoCRM status: SQL queries (see above)
- Check Redis locks: redis-cli KEYS 'sync_lock:*'
- XAI API status: https://status.x.ai
Contact:
- Team: BitByLaw Development
- Motia docs: /opt/motia-iii/bitbylaw/docs/INDEX.md
Version History:
- 1.0 (11.03.2026) - Initial Release
- Collection Lifecycle Management
- BLAKE3 Hash Verification
- Daily Full Sync
- Metadata Change Detection