Files
motia-iii/docs/AI_KNOWLEDGE_SYNC.md

16 KiB

AI Knowledge Collection Sync - Dokumentation

Version: 1.0
Datum: 11. März 2026
Status: Implementiert


Überblick

Synchronisiert EspoCRM CAIKnowledge Entities mit XAI Collections für semantische Dokumentensuche. Unterstützt vollständigen Collection-Lifecycle, BLAKE3-basierte Integritätsprüfung und robustes Hash-basiertes Change Detection.

Features

Collection Lifecycle Management

  • NEW → Collection erstellen in XAI
  • ACTIVE → Automatischer Sync der Dokumente
  • PAUSED → Sync pausiert, Collection bleibt
  • DEACTIVATED → Collection aus XAI löschen

Dual-Hash Change Detection

  • EspoCRM Hash (MD5/SHA256) für lokale Änderungserkennung
  • XAI BLAKE3 Hash für Remote-Integritätsverifikation
  • Metadata-Hash für Beschreibungs-Änderungen

Robustheit

  • BLAKE3 Verification nach jedem Upload
  • Metadata-Only Updates via PATCH
  • Orphan Detection & Cleanup
  • Distributed Locking (Redis)
  • Daily Full Sync (02:00 Uhr nachts)

Fehlerbehandlung

  • Unsupported MIME Types → Status "unsupported"
  • Transient Errors → Retry mit Exponential Backoff
  • Partial Failures toleriert

Architektur

┌─────────────────────────────────────────────────────────────────┐
│ EspoCRM CAIKnowledge                                            │
│   ├─ activationStatus: new/active/paused/deactivated           │
│   ├─ syncStatus: unclean/pending_sync/synced/failed            │
│   └─ datenbankId: XAI Collection ID                            │
└─────────────────────────────────────────────────────────────────┘
                    ↓ Webhook
┌─────────────────────────────────────────────────────────────────┐
│ Motia Webhook Handler                                          │
│   → POST /vmh/webhook/aiknowledge/update                       │
└─────────────────────────────────────────────────────────────────┘
                    ↓ Emit Event
┌─────────────────────────────────────────────────────────────────┐
│ Queue: aiknowledge.sync                                        │
└─────────────────────────────────────────────────────────────────┘
                    ↓ Lock: aiknowledge:{id}
┌─────────────────────────────────────────────────────────────────┐
│ Sync Handler                                                    │
│   ├─ Check activationStatus                                    │
│   ├─ Manage Collection Lifecycle                               │
│   ├─ Sync Documents (with BLAKE3 verification)                 │
│   └─ Update Statuses                                           │
└─────────────────────────────────────────────────────────────────┘
                    ↓
┌─────────────────────────────────────────────────────────────────┐
│ XAI Collections API                                            │
│   └─ Collections with embedded documents                       │
└─────────────────────────────────────────────────────────────────┘

EspoCRM Konfiguration

1. Entity: CAIKnowledge

Felder:

Feld Typ Beschreibung Werte
name varchar(255) Name der Knowledge Base -
datenbankId varchar(255) XAI Collection ID Automatisch gefüllt
activationStatus enum Lifecycle-Status new, active, paused, deactivated
syncStatus enum Sync-Status unclean, pending_sync, synced, failed
lastSync datetime Letzter erfolgreicher Sync ISO 8601
syncError text Fehlermeldung bei Failure Max 2000 Zeichen

Enum-Definitionen:

{
  "activationStatus": {
    "type": "enum",
    "options": ["new", "active", "paused", "deactivated"],
    "default": "new"
  },
  "syncStatus": {
    "type": "enum",
    "options": ["unclean", "pending_sync", "synced", "failed"],
    "default": "unclean"
  }
}

2. Junction: CAIKnowledgeCDokumente

additionalColumns:

Feld Typ Beschreibung
aiDocumentId varchar(255) XAI file_id
syncstatus enum Per-Document Sync-Status
syncedHash varchar(64) MD5/SHA256 von EspoCRM
xaiBlake3Hash varchar(128) BLAKE3 Hash von XAI
syncedMetadataHash varchar(64) Hash der Metadaten
lastSync datetime Letzter Sync dieses Dokuments

Enum-Definition:

{
  "syncstatus": {
    "type": "enum",
    "options": ["new", "unclean", "synced", "failed", "unsupported"]
  }
}

3. Webhooks

Webhook 1: CREATE

{
  "event": "CAIKnowledge.afterSave",
  "url": "https://your-motia-domain.com/vmh/webhook/aiknowledge/update",
  "method": "POST",
  "payload": "{\"entity_id\": \"{$id}\", \"entity_type\": \"CAIKnowledge\", \"action\": \"create\"}",
  "condition": "entity.isNew()"
}

Webhook 2: UPDATE

{
  "event": "CAIKnowledge.afterSave",
  "url": "https://your-motia-domain.com/vmh/webhook/aiknowledge/update",
  "method": "POST",
  "payload": "{\"entity_id\": \"{$id}\", \"entity_type\": \"CAIKnowledge\", \"action\": \"update\"}",
  "condition": "!entity.isNew()"
}

Webhook 3: DELETE (Optional)

{
  "event": "CAIKnowledge.afterRemove",
  "url": "https://your-motia-domain.com/vmh/webhook/aiknowledge/delete",
  "method": "POST",
  "payload": "{\"entity_id\": \"{$id}\", \"entity_type\": \"CAIKnowledge\", \"action\": \"delete\"}"
}

Empfehlung: Nur CREATE + UPDATE verwenden. DELETE über activationStatus="deactivated" steuern.

4. Hooks (EspoCRM Backend)

Hook 1: Document Link → syncStatus auf "unclean"

// Hooks/Custom/CAIKnowledge/AfterRelateLinkMultiple.php
namespace Espo\Custom\Hooks\CAIKnowledge;

class AfterRelateLinkMultiple extends \Espo\Core\Hooks\Base
{
    public function afterRelateLinkMultiple($entity, $options, $data)
    {
        if ($data['link'] === 'dokumentes') {
            // Mark as unclean when documents linked
            $entity->set('syncStatus', 'unclean');
            $this->getEntityManager()->saveEntity($entity);
        }
    }
}

Hook 2: Document Change → Junction auf "unclean"

// Hooks/Custom/CDokumente/AfterSave.php
namespace Espo\Custom\Hooks\CDokumente;

class AfterSave extends \Espo\Core\Hooks\Base
{
    public function afterSave($entity, $options)
    {
        if ($entity->isAttributeChanged('description') || 
            $entity->isAttributeChanged('md5') ||
            $entity->isAttributeChanged('sha256')) {
            
            // Mark all junction entries as unclean
            $this->updateJunctionStatuses($entity->id, 'unclean');
            
            // Mark all related CAIKnowledge as unclean
            $this->markRelatedKnowledgeUnclean($entity->id);
        }
    }
}

Environment Variables

# XAI API Keys (erforderlich)
XAI_API_KEY=your_xai_api_key_here
XAI_MANAGEMENT_KEY=your_xai_management_key_here

# Redis (für Locking)
REDIS_HOST=localhost
REDIS_PORT=6379

# EspoCRM
ESPOCRM_API_BASE_URL=https://crm.bitbylaw.com/api/v1
ESPOCRM_API_KEY=your_espocrm_api_key

Workflows

Workflow 1: Neue Knowledge Base erstellen

1. User erstellt CAIKnowledge in EspoCRM
   └─ activationStatus: "new" (default)

2. Webhook CREATE gefeuert
   └─ Event: aiknowledge.sync

3. Sync Handler:
   └─ activationStatus="new" → Collection erstellen in XAI
   └─ Update EspoCRM:
      ├─ datenbankId = collection_id
      ├─ activationStatus = "active"
      └─ syncStatus = "unclean"

4. Nächster Webhook (UPDATE):
   └─ activationStatus="active" → Dokumente syncen

Workflow 2: Dokumente hinzufügen

1. User verknüpft Dokumente mit CAIKnowledge
   └─ EspoCRM Hook setzt syncStatus = "unclean"

2. Webhook UPDATE gefeuert
   └─ Event: aiknowledge.sync

3. Sync Handler:
   └─ Für jedes Junction-Entry:
      ├─ Check: MIME Type supported?
      ├─ Check: Hash changed?
      ├─ Download von EspoCRM
      ├─ Upload zu XAI mit Metadata
      ├─ Verify Upload (BLAKE3)
      └─ Update Junction: syncstatus="synced"

4. Update CAIKnowledge:
   └─ syncStatus = "synced"
   └─ lastSync = now()

Workflow 3: Metadata-Änderung

1. User ändert Document.description in EspoCRM
   └─ EspoCRM Hook setzt Junction syncstatus = "unclean"
   └─ EspoCRM Hook setzt CAIKnowledge syncStatus = "unclean"

2. Webhook UPDATE gefeuert

3. Sync Handler:
   └─ Berechne Metadata-Hash
   └─ Hash unterschiedlich? → PATCH zu XAI
   └─ Falls PATCH fehlschlägt → Fallback: Re-upload
   └─ Update Junction: syncedMetadataHash

Workflow 4: Knowledge Base deaktivieren

1. User setzt activationStatus = "deactivated"

2. Webhook UPDATE gefeuert

3. Sync Handler:
   └─ Collection aus XAI löschen
   └─ Alle Junction Entries zurücksetzen:
      ├─ syncstatus = "new"
      └─ aiDocumentId = NULL
   └─ CAIKnowledge bleibt in EspoCRM (mit datenbankId)

Workflow 5: Daily Full Sync

Cron: Täglich um 02:00 Uhr

1. Lade alle CAIKnowledge mit:
   └─ activationStatus = "active"
   └─ syncStatus IN ("unclean", "failed")

2. Für jedes:
   └─ Emit: aiknowledge.sync Event

3. Queue verarbeitet alle sequenziell
   └─ Fängt verpasste Webhooks ab

Monitoring & Troubleshooting

Logs prüfen

# Motia Service Logs
sudo journalctl -u motia-iii -f | grep -i "ai knowledge"

# Letzte 100 Sync-Events
sudo journalctl -u motia-iii -n 100 | grep "AI KNOWLEDGE SYNC"

# Fehler der letzten 24 Stunden
sudo journalctl -u motia-iii --since "24 hours ago" | grep "❌"

EspoCRM Status prüfen

-- Alle Knowledge Bases mit Status
SELECT 
    id,
    name,
    activation_status,
    sync_status,
    last_sync,
    sync_error
FROM c_ai_knowledge
WHERE activation_status = 'active';

-- Junction Entries mit Sync-Problemen
SELECT 
    j.id,
    k.name AS knowledge_name,
    d.name AS document_name,
    j.syncstatus,
    j.last_sync
FROM c_ai_knowledge_c_dokumente j
JOIN c_ai_knowledge k ON j.c_ai_knowledge_id = k.id
JOIN c_dokumente d ON j.c_dokumente_id = d.id
WHERE j.syncstatus IN ('failed', 'unsupported');

Häufige Probleme

Problem: "Lock busy for aiknowledge:xyz"

Ursache: Vorheriger Sync noch aktiv oder abgestürzt

Lösung:

# Redis lock manuell freigeben
redis-cli
> DEL sync_lock:aiknowledge:xyz

Problem: "Unsupported MIME type"

Ursache: Document hat MIME Type, den XAI nicht unterstützt

Lösung:

  • Dokument konvertieren (z.B. RTF → PDF)
  • Oder: Akzeptieren (bleibt mit Status "unsupported")

Problem: "Upload verification failed"

Ursache: XAI liefert kein BLAKE3 Hash oder Hash-Mismatch

Lösung:

  1. Prüfe XAI API Dokumentation (Hash-Format geändert?)
  2. Falls temporär: Retry läuft automatisch
  3. Falls persistent: XAI Support kontaktieren

Problem: "Collection not found"

Ursache: Collection wurde manuell in XAI gelöscht

Lösung: Automatisch gelöst - Sync erstellt neue Collection


API Endpoints

Webhook Endpoint

POST /vmh/webhook/aiknowledge/update
Content-Type: application/json

{
  "entity_id": "kb-123",
  "entity_type": "CAIKnowledge",
  "action": "update"
}

Response:

{
  "success": true,
  "knowledge_id": "kb-123"
}

Performance

Typische Sync-Zeiten

Szenario Zeit Notizen
Collection erstellen < 1s Nur API Call
1 Dokument (1 MB) 2-4s Upload + Verify
10 Dokumente (10 MB) 20-40s Sequenziell
100 Dokumente (100 MB) 3-6 min Lock TTL: 30 min
Metadata-only Update < 1s Nur PATCH
Orphan Cleanup 1-3s Pro 10 Dokumente

Lock TTLs

  • AIKnowledge Sync: 30 Minuten (1800 Sekunden)
  • Redis Lock: Same as above
  • Auto-Release: Bei Timeout (TTL expired)

Rate Limits

XAI API:

  • Files Upload: ~100 requests/minute
  • Management API: ~1000 requests/minute

Strategie bei Rate Limit (429):

  • Exponential Backoff: 2s, 4s, 8s, 16s, 32s
  • Respect Retry-After Header
  • Max 5 Retries

XAI Collections Metadata

Document Metadata Fields

Werden für jedes Dokument in XAI gespeichert:

{
  "fields": {
    "document_name": "Vertrag.pdf",
    "description": "Mietvertrag Mustermann",
    "created_at": "2024-01-01T00:00:00Z",
    "modified_at": "2026-03-10T15:30:00Z",
    "espocrm_id": "dok-123"
  }
}

inject_into_chunk: true für document_name und description
→ Verbessert semantische Suche

Collection Metadata

{
  "metadata": {
    "espocrm_entity_type": "CAIKnowledge",
    "espocrm_entity_id": "kb-123",
    "created_at": "2026-03-11T10:00:00Z"
  }
}

Testing

Manueller Test

# 1. Erstelle CAIKnowledge in EspoCRM
# 2. Prüfe Logs
sudo journalctl -u motia-iii -f

# 3. Prüfe Redis Lock
redis-cli
> KEYS sync_lock:aiknowledge:*

# 4. Prüfe XAI Collection
curl -H "Authorization: Bearer $XAI_MANAGEMENT_KEY" \
  https://management-api.x.ai/v1/collections

Integration Test

# tests/test_aiknowledge_sync.py

async def test_full_sync_workflow():
    """Test complete sync workflow"""
    
    # 1. Create CAIKnowledge with status "new"
    knowledge = await espocrm.create_entity('CAIKnowledge', {
        'name': 'Test KB',
        'activationStatus': 'new'
    })
    
    # 2. Trigger webhook
    await trigger_webhook(knowledge['id'])
    
    # 3. Wait for sync
    await asyncio.sleep(5)
    
    # 4. Check collection created
    knowledge = await espocrm.get_entity('CAIKnowledge', knowledge['id'])
    assert knowledge['datenbankId'] is not None
    assert knowledge['activationStatus'] == 'active'
    
    # 5. Link document
    await espocrm.link_entities('CAIKnowledge', knowledge['id'], 'CDokumente', doc_id)
    
    # 6. Trigger webhook again
    await trigger_webhook(knowledge['id'])
    await asyncio.sleep(10)
    
    # 7. Check junction synced
    junction = await espocrm.get_junction_entries(
        'CAIKnowledgeCDokumente',
        'cAIKnowledgeId',
        knowledge['id']
    )
    assert junction[0]['syncstatus'] == 'synced'
    assert junction[0]['xaiBlake3Hash'] is not None

Maintenance

Wöchentliche Checks

  • Prüfe failed Syncs in EspoCRM
  • Prüfe Redis Memory Usage
  • Prüfe XAI Storage Usage
  • Review Logs für Patterns

Monatliche Tasks

  • Cleanup alte syncError Messages
  • Verify XAI Collection Integrity
  • Review Performance Metrics
  • Update MIME Type Support List

Support

Bei Problemen:

  1. Logs prüfen: journalctl -u motia-iii -f
  2. EspoCRM Status prüfen: SQL Queries (siehe oben)
  3. Redis Locks prüfen: redis-cli KEYS sync_lock:*
  4. XAI API Status: https://status.x.ai

Kontakt:

  • Team: BitByLaw Development
  • Motia Docs: /opt/motia-iii/bitbylaw/docs/INDEX.md

Version History:

  • 1.0 (11.03.2026) - Initial Release
    • Collection Lifecycle Management
    • BLAKE3 Hash Verification
    • Daily Full Sync
    • Metadata Change Detection