feat(document-sync): enhance DocumentSync with file status checks and hash-based change detection; add thumbnail generation and metadata update methods
docs/DOCUMENT_SYNC_XAI_STATUS.md
@@ -0,0 +1,229 @@
# Document Sync with xAI Collections - Implementation Status

## ✅ Implemented

### 1. Webhook Endpoints
- **POST** `/vmh/webhook/document/create`
- **POST** `/vmh/webhook/document/update`
- **POST** `/vmh/webhook/document/delete`

### 2. Event Handler (`document_sync_event_step.py`)
- Queue topics: `vmh.document.{create|update|delete}`
- Redis distributed locking
- Full document loading from EspoCRM

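
The Redis distributed locking mentioned above could look roughly like the sketch below. It is not taken from `document_sync_event_step.py`: the key scheme, the TTL, and the function names are assumptions for illustration. It relies only on the `set(..., nx=True, ex=...)` and `eval(...)` calls that a redis-py client provides.

```python
import uuid
from typing import Optional, Protocol


class RedisLike(Protocol):
    """Subset of the redis-py client interface this sketch relies on."""

    def set(self, name, value, nx=False, ex=None): ...

    def eval(self, script, numkeys, *keys_and_args): ...


# Lua script: delete the key only if it still holds our token, so that a
# lock that expired and was re-acquired by another worker is not removed.
RELEASE_SCRIPT = """
if redis.call('get', KEYS[1]) == ARGV[1] then
    return redis.call('del', KEYS[1])
end
return 0
"""


def lock_key(document_id: str) -> str:
    # Hypothetical key scheme; the real handler may use a different one.
    return f"lock:vmh:document:{document_id}"


def acquire_lock(client: RedisLike, document_id: str, ttl_seconds: int = 300) -> Optional[str]:
    """Return an ownership token if the lock was acquired, else None."""
    token = uuid.uuid4().hex
    # SET ... NX EX succeeds only if the key does not exist yet.
    if client.set(lock_key(document_id), token, nx=True, ex=ttl_seconds):
        return token
    return None


def release_lock(client: RedisLike, document_id: str, token: str) -> bool:
    """Release the lock only if this worker still owns it."""
    return client.eval(RELEASE_SCRIPT, 1, lock_key(document_id), token) == 1
```

The TTL keeps a crashed worker from blocking a document forever; the token check keeps a slow worker from releasing a lock it no longer holds.
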
### 3. Sync Utilities (`document_sync_utils.py`)
- **✅ File status check**: "Neu" (new) or "Geändert" (changed) → xAI sync required
- **✅ Hash-based change detection**: MD5/SHA comparison for updates
- **✅ Related entities discovery**: searches many-to-many attachments
- **✅ Collection requirements**: automatically determines which collections are needed

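
As an illustration of the status check and hash comparison listed above, a minimal sketch follows. It is not the code from `document_sync_utils.py`; the function names are hypothetical, and SHA-256 is an assumption because the document only says "SHA".

```python
import hashlib
from typing import Optional

# File statuses that always trigger an xAI sync (values as used in EspoCRM).
SYNC_STATUSES = {"Neu", "Geändert"}


def file_hashes(content: bytes) -> dict:
    """Compute the hashes stored on the document (SHA-256 is an assumption)."""
    return {
        "md5": hashlib.md5(content).hexdigest(),
        "sha": hashlib.sha256(content).hexdigest(),
    }


def needs_xai_sync(
    datei_status: Optional[str],
    current_hash: Optional[str],
    synced_hash: Optional[str],
) -> bool:
    """Decide whether a document has to be (re-)synced to xAI."""
    if datei_status in SYNC_STATUSES:
        return True
    # Fall back to the hash comparison: a known hash that differs from the
    # hash recorded at the last successful sync means the content changed.
    return bool(current_hash) and current_hash != synced_hash
```
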
## ⏳ In Progress

### 4. Thumbnail Generation (`generate_thumbnail()`)

**Requirements:**
- First page of a PDF as the preview image
- DOCX/DOC → PDF → image conversion
- Image files: resize to thumbnail size
- Fallback: generic file icons based on MIME type

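
The four requirements above amount to a dispatch on the MIME type. A possible sketch, with hypothetical strategy names and assuming the MIME type is already known (for example from the attachment record):

```python
from typing import Optional

PDF_TYPES = {"application/pdf"}
WORD_TYPES = {
    "application/msword",  # .doc
    "application/vnd.openxmlformats-officedocument.wordprocessingml.document",  # .docx
}


def thumbnail_strategy(mime_type: Optional[str]) -> str:
    """Map a MIME type to one of the four thumbnail strategies."""
    if mime_type in PDF_TYPES:
        return "pdf"      # render the first page
    if mime_type in WORD_TYPES:
        return "docx"     # convert to PDF first, then render
    if mime_type and mime_type.startswith("image/"):
        return "image"    # resize directly
    return "icon"         # fallback: generic file icon
```

`generate_thumbnail()` could then call the matching generator for each returned strategy name.
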
**Required dependencies:**
```bash
# Python packages
# Note: docx2pdf automates Microsoft Word (Windows/macOS only);
# on Linux, DOCX → PDF conversion has to go through LibreOffice.
pip install pdf2image python-docx Pillow docx2pdf

# System dependencies (Ubuntu/Debian)
apt-get install poppler-utils libreoffice
```

**Implementation steps:**

1. **PDF handling** (priority 1):
```python
from pdf2image import convert_from_path
from PIL import Image
import io


def generate_pdf_thumbnail(pdf_path: str) -> bytes:
    # Convert only the first page to an image
    images = convert_from_path(pdf_path, first_page=1, last_page=1, dpi=150)
    thumbnail = images[0]

    # Resize in place to thumbnail size (e.g. 200x280), keeping the aspect ratio
    thumbnail.thumbnail((200, 280), Image.Resampling.LANCZOS)

    # Serialize to PNG bytes
    buffer = io.BytesIO()
    thumbnail.save(buffer, format='PNG')
    return buffer.getvalue()
```

2. **DOCX handling** (priority 2):
```python
import subprocess
import tempfile
from pathlib import Path


def generate_docx_thumbnail(docx_path: str) -> bytes:
    # docx2pdf drives Microsoft Word and does not run on Linux, so this
    # version calls LibreOffice in headless mode instead.
    with tempfile.TemporaryDirectory() as tmp_dir:
        # DOCX → PDF conversion (requires LibreOffice's `soffice` on PATH)
        subprocess.run(
            ["soffice", "--headless", "--convert-to", "pdf",
             "--outdir", tmp_dir, docx_path],
            check=True,
        )
        # LibreOffice names the output after the input file's stem
        pdf_path = Path(tmp_dir) / (Path(docx_path).stem + ".pdf")

        # Generate the PDF thumbnail (generate_pdf_thumbnail from step 1);
        # the temporary directory is cleaned up when the block exits
        return generate_pdf_thumbnail(str(pdf_path))
```

3. **Image handling** (priority 3):
```python
from PIL import Image
import io


def generate_image_thumbnail(image_path: str) -> bytes:
    with Image.open(image_path) as img:
        # PNG cannot store every mode (e.g. CMYK), so normalize first
        if img.mode not in ("RGB", "RGBA", "L"):
            img = img.convert("RGB")

        # Resize in place, keeping the aspect ratio
        img.thumbnail((200, 280), Image.Resampling.LANCZOS)

        buffer = io.BytesIO()
        img.save(buffer, format='PNG')
        return buffer.getvalue()
```

4. **Thumbnail upload to EspoCRM**:
```python
# EspoCRM supports preview images via the Attachment API
async def upload_thumbnail_to_espocrm(
    document_id: str,
    thumbnail_bytes: bytes,
    espocrm_api
):
    # Create the attachment record
    attachment_data = {
        'name': 'preview.png',
        'type': 'image/png',
        'role': 'Inline Attachment',
        'parentType': 'Document',
        'parentId': document_id,
        'field': 'previewImage'  # Custom field?
    }

    # Upload via the EspoCRM Attachment API
    # POST /api/v1/Attachment with multipart/form-data
    # TODO: espocrm.py needs an upload_attachment() method
```

**Open questions:**
- Which field on the EspoCRM Document entity holds the preview? `previewImage`? `thumbnail`?
- Thumbnail size? (recommended: 200x280 or 300x400)
- Format: PNG or JPEG?

## ❌ Not Yet Implemented

### 5. xAI Service (`xai_service.py`)

**Requirements:**
- File upload to xAI (based on `test_xai_collections_api.py`)
- Add file to collections
- Remove file from collections
- File download from EspoCRM

**Existing reference code:**
- `/opt/motia-iii/bitbylaw/test_xai_collections_api.py` (630 lines, all xAI operations tested)

**Implementation plan:**

```python
import os


class XAIService:
    def __init__(self, context=None):
        self.management_key = os.getenv('XAI_MANAGEMENT_KEY')
        self.api_key = os.getenv('XAI_API_KEY')
        self.context = context

    async def upload_file(self, file_content: bytes, filename: str) -> str:
        """Upload a file to xAI → returns file_id"""
        # Multipart/form-data upload
        # POST https://api.x.ai/v1/files
        pass

    async def add_to_collection(self, collection_id: str, file_id: str):
        """Add a file to a collection"""
        # POST https://management-api.x.ai/v1/collections/{collection_id}/documents/{file_id}
        pass

    async def remove_from_collection(self, collection_id: str, file_id: str):
        """Remove a file from a collection"""
        # DELETE https://management-api.x.ai/v1/collections/{collection_id}/documents/{file_id}
        pass

    async def download_from_espocrm(self, attachment_id: str) -> bytes:
        """Download a file from an EspoCRM attachment"""
        # GET https://crm.bitbylaw.com/api/v1/Attachment/file/{attachment_id}
        pass
```

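
The endpoint comments in the plan above can be collected into small URL helpers, sketched below. The helper names are hypothetical. Bearer authentication is an assumption to be checked against `test_xai_collections_api.py`, as is the idea that the management key goes with the `management-api.x.ai` host and the API key with `api.x.ai`.

```python
from urllib.parse import quote

API_BASE = "https://api.x.ai/v1"
MANAGEMENT_BASE = "https://management-api.x.ai/v1"


def files_url() -> str:
    """Endpoint for file uploads (POST)."""
    return f"{API_BASE}/files"


def collection_document_url(collection_id: str, file_id: str) -> str:
    """Endpoint for adding (POST) or removing (DELETE) a file in a collection."""
    # Percent-encode the IDs so unexpected characters cannot alter the path.
    return (
        f"{MANAGEMENT_BASE}/collections/{quote(collection_id, safe='')}"
        f"/documents/{quote(file_id, safe='')}"
    )


def bearer_headers(key: str) -> dict:
    """Authorization header; the Bearer scheme is an assumption."""
    return {"Authorization": f"Bearer {key}"}
```
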
## 📋 Integration Checklist

### Complete Upload Flow:

1. ✅ Receive webhook → emit event
2. ✅ Event handler: acquire lock
3. ✅ Load document from EspoCRM
4. ✅ Decision: is a sync needed? (file status, hash check, collections)
5. ⏳ Download file from EspoCRM
6. ⏳ Compute hash (MD5/SHA)
7. ⏳ Generate thumbnail
8. ❌ Upload to xAI (if new or hash changed)
9. ❌ Add to collections
10. ⏳ Update EspoCRM metadata (xaiFileId, xaiCollections, xaiSyncedHash, thumbnail)
11. ✅ Release lock

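
Steps 2 to 11 of the flow can be sketched as a single function with injected callables. This is only an outline under assumptions: `SyncSteps` and every callable in it are hypothetical stand-ins for the real services, thumbnail generation (step 7) is left out, and MD5 is used for the hash comparison.

```python
import hashlib
from dataclasses import dataclass
from typing import Callable


@dataclass
class SyncSteps:
    """Hypothetical bundle of callables, one per external operation."""
    acquire_lock: Callable[[str], bool]
    release_lock: Callable[[str], None]
    load_document: Callable[[str], dict]
    download_file: Callable[[dict], bytes]
    upload_to_xai: Callable[[bytes, str], str]
    add_to_collections: Callable[[str, list], None]
    update_metadata: Callable[[str, dict], None]


def sync_document(document_id: str, steps: SyncSteps) -> str:
    if not steps.acquire_lock(document_id):            # step 2
        return "locked"
    try:
        doc = steps.load_document(document_id)         # step 3
        unchanged = doc.get("md5") == doc.get("xaiSyncedHash")
        if doc.get("dateiStatus") not in ("Neu", "Geändert") and unchanged:
            return "skipped"                           # step 4
        content = steps.download_file(doc)             # step 5
        current_hash = hashlib.md5(content).hexdigest()  # step 6
        file_id = doc.get("xaiFileId")
        if file_id is None or current_hash != doc.get("xaiSyncedHash"):
            file_id = steps.upload_to_xai(content, doc.get("name", document_id))  # step 8
        steps.add_to_collections(file_id, doc.get("xaiCollections") or [])  # step 9
        steps.update_metadata(document_id, {           # step 10
            "xaiFileId": file_id,
            "xaiSyncedHash": current_hash,
            "dateiStatus": "Gesynct",
        })
        return "synced"
    except Exception as exc:
        steps.update_metadata(document_id, {"dateiStatus": "Fehler", "xaiSyncError": str(exc)})
        return "failed"
    finally:
        steps.release_lock(document_id)                # step 11, always runs
```

Putting the release in `finally` means the lock is freed on every path, including the early "skipped" return and the failure branch.
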
### File Statuses in EspoCRM:

- **"Neu"** (new): completely new file → xAI upload + collection add
- **"Geändert"** (changed): file content changed → xAI re-upload + collection update
- **"Gesynct"** (synced): successfully synced, no changes
- **"Fehler"** (error): sync failed (with error message)

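
The four statuses map naturally onto an enum with the actions each one triggers. A minimal sketch follows; the action names are hypothetical, and "Fehler" gets no action because this document does not specify a retry policy.

```python
from enum import Enum


class DateiStatus(str, Enum):
    """File status values exactly as stored in EspoCRM."""
    NEU = "Neu"
    GEAENDERT = "Geändert"
    GESYNCT = "Gesynct"
    FEHLER = "Fehler"


# Actions per status (action names are illustrative placeholders).
ACTIONS = {
    DateiStatus.NEU: ("upload", "add_to_collections"),
    DateiStatus.GEAENDERT: ("reupload", "update_collections"),
    DateiStatus.GESYNCT: (),
    DateiStatus.FEHLER: (),  # retry policy not specified in this document
}


def actions_for(status: str) -> tuple:
    """Look up the actions for a raw status string; raises ValueError if unknown."""
    return ACTIONS[DateiStatus(status)]
```
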
### EspoCRM Custom Fields:

**Required on the Document entity:**
- `dateiStatus` (Enum): "Neu", "Geändert", "Gesynct", "Fehler"
- `md5` (String): MD5 hash of the file
- `sha` (String): SHA hash of the file
- `xaiFileId` (String): xAI file ID
- `xaiCollections` (Array): JSON array of collection IDs
- `xaiSyncedHash` (String): hash at the last successful sync
- `xaiSyncStatus` (Enum): "syncing", "synced", "failed"
- `xaiSyncError` (Text): error message when a sync fails
- `previewImage` (Attachment?): preview image

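
A sketch of how the metadata update (step 10 of the upload flow) could populate these fields. The function name and the choice to clear `xaiSyncError` on success are assumptions; `previewImage` is left out because its field type is still an open question.

```python
from typing import Iterable, Optional


def build_sync_update(
    file_id: Optional[str],
    collections: Iterable[str],
    synced_hash: Optional[str],
    error: Optional[str] = None,
) -> dict:
    """Build the field update for a Document after a sync attempt."""
    if error is not None:
        # Failure: record the error, leave the previous sync data untouched.
        return {
            "dateiStatus": "Fehler",
            "xaiSyncStatus": "failed",
            "xaiSyncError": error,
        }
    return {
        "dateiStatus": "Gesynct",
        "xaiSyncStatus": "synced",
        "xaiSyncError": None,
        "xaiFileId": file_id,
        "xaiCollections": list(collections),
        "xaiSyncedHash": synced_hash,
    }
```
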
## 🚀 Next Steps

**Priority 1: xAI service**
- Extract the code from `test_xai_collections_api.py`
- Move it into `services/xai_service.py`
- Implement the EspoCRM download function

**Priority 2: Thumbnail generator**
- Install dependencies
- Implement the PDF thumbnail
- Extend the EspoCRM client with an upload method

**Priority 3: Test the integration**
- Create a document in EspoCRM
- Set the file status to "Neu"
- Trigger the webhook
- Analyze the logs

## 📚 References

- **xAI API tests**: `/opt/motia-iii/bitbylaw/test_xai_collections_api.py`
- **EspoCRM API**: `services/espocrm.py`
- **Beteiligte Sync** (reference implementation): `steps/vmh/beteiligte_sync_event_step.py`