E2E encrypted chat (X3DH + Double Ratchet, Signal Protocol). Server: asyncio TCP + TLS, MySQL. Clients: PyQt6 GUI + CLI. Secrets (.env, TLS keys, Cloudflare token), runtime data and mobile clients (separate repos) are gitignored. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
# Server scaling: capacity growth plan

## Target hardware

- **CPU:** Intel Xeon E5-2630v4 (10 cores / 20 threads, 2.2 GHz)
- **RAM:** 256 GB REG ECC
- **Disk:** 500 GB SSD (boot/OS/DB) + 4 TB HDD (files)
- **Network:** 1 Gbit

Estimated capacity after optimization: **10,000–20,000 users**, **2,000–5,000 messages/s**

---

## Step 1: Immediate changes (done in code)

### 1a. Thread pool (`server.py`)

```env
THREAD_POOL_SIZE=40
```

Sets `ThreadPoolExecutor(max_workers=40)` as the default executor used by `asyncio.to_thread()`.
With 20 hardware threads and a DB latency of ~2–5 ms, 40 workers (2x the hardware threads) is a good fit, because the workers spend most of their time waiting on I/O.

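As a minimal sketch of how such a setting can be wired up (not the project's actual `server.py`): `asyncio.to_thread()` submits work to the event loop's default executor, so replacing that executor changes the worker count for every call. The function names `blocking_query` and `main` and the thread-name prefix `db` are assumptions made for illustration.

```python
import asyncio
import os
import threading
from concurrent.futures import ThreadPoolExecutor


def blocking_query() -> str:
    """Stand-in for a blocking DB call (hypothetical); returns the worker thread name."""
    return threading.current_thread().name


async def main() -> str:
    # THREAD_POOL_SIZE comes from the environment; 40 is the value documented above.
    pool_size = int(os.getenv("THREAD_POOL_SIZE", "40"))
    executor = ThreadPoolExecutor(max_workers=pool_size, thread_name_prefix="db")
    # to_thread() uses the loop's default executor, so install ours before the first call.
    asyncio.get_running_loop().set_default_executor(executor)
    return await asyncio.to_thread(blocking_query)


if __name__ == "__main__":
    # Prints the name of the pool thread that ran the call.
    print(asyncio.run(main()))
```

`asyncio.run()` shuts the default executor down when the loop closes, so the sketch needs no explicit cleanup.
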
### 1b. DB pool (`.env`)

```env
DB_POOL_SIZE=30
```

30 simultaneous MySQL connections. With 40 thread workers and ~2 ms queries, a pool of 30 connections is sufficient.

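The sizing argument can be checked with Little's law (average requests in flight = arrival rate x time per request). The sketch below is back-of-the-envelope arithmetic, not a measurement; it assumes one DB query per message, which the plan does not state, and the function names are hypothetical.

```python
def connections_in_use(queries_per_s: float, latency_ms: float) -> float:
    """Little's law: average number of connections busy at the same time."""
    return queries_per_s * latency_ms / 1000


def max_queries_per_s(pool_size: int, latency_ms: float) -> float:
    """Upper bound on throughput when every pooled connection is always busy."""
    return pool_size * 1000 / latency_ms


# Top of the estimated load (5,000 messages/s) at the slow end of the latency range (5 ms).
print(connections_in_use(5000, 5))   # 25.0 connections busy on average
# Ceiling for a pool of 30 connections with 2 ms queries.
print(max_queries_per_s(30, 2))      # 15000.0 queries/s
```

Under these assumptions the peak load keeps about 25 connections busy, which fits in a pool of 30 with some headroom.
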
### 1c. Missing DB indexes (`schema.sql`)

Added 5 new indexes for the most frequent queries:

| Index | Table | Query it speeds up |
|-------|-------|--------------------|
| `idx_cm_user (user_id)` | `conversation_members` | `list_user_conversations` (**critical**: without it, a full table scan) |
| `idx_inv_user (user_id)` | `group_invitations` | `get_pending_invitations` |
| `idx_messages_deleted (conversation_id, deleted_at)` | `messages` | `get_deleted_messages_since` |
| `idx_messages_pinned (conversation_id, pinned_at)` | `messages` | `get_pinned_messages` |
| `idx_reads_user (user_id)` | `message_reads` | `get_unread_counts` |

**SQL migration for an existing database:**

```sql
ALTER TABLE conversation_members ADD INDEX idx_cm_user (user_id);
ALTER TABLE group_invitations ADD INDEX idx_inv_user (user_id);
ALTER TABLE messages ADD INDEX idx_messages_deleted (conversation_id, deleted_at);
ALTER TABLE messages ADD INDEX idx_messages_pinned (conversation_id, pinned_at);
ALTER TABLE message_reads ADD INDEX idx_reads_user (user_id);
```

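Running `ALTER TABLE ... ADD INDEX` a second time fails with a duplicate key name error, so a migration that may be re-run can first check `information_schema.statistics`. The following is a sketch under assumptions: `cursor` is any DB-API cursor that uses the `%s` parameter style (as common MySQL drivers do), and the helper names `index_exists` and `apply_missing_indexes` are hypothetical.

```python
# (table, index name, indexed columns) taken from the migration above
INDEXES = [
    ("conversation_members", "idx_cm_user", "user_id"),
    ("group_invitations", "idx_inv_user", "user_id"),
    ("messages", "idx_messages_deleted", "conversation_id, deleted_at"),
    ("messages", "idx_messages_pinned", "conversation_id, pinned_at"),
    ("message_reads", "idx_reads_user", "user_id"),
]


def index_exists(cursor, table: str, index: str) -> bool:
    """Return True if the index is already present in the current database."""
    cursor.execute(
        "SELECT COUNT(*) FROM information_schema.statistics "
        "WHERE table_schema = DATABASE() AND table_name = %s AND index_name = %s",
        (table, index),
    )
    return cursor.fetchone()[0] > 0


def apply_missing_indexes(cursor) -> list[str]:
    """Create only the indexes that are missing; return the names that were created."""
    created = []
    for table, index, columns in INDEXES:
        if not index_exists(cursor, table, index):
            # Identifiers come from the constant list above, never from user input.
            cursor.execute(f"ALTER TABLE {table} ADD INDEX {index} ({columns})")
            created.append(index)
    return created
```

Called with a cursor from the server's connection pool, the helper returns an empty list once all five indexes exist.
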
### 1d. Upload directory on the HDD

```env
UPLOAD_DIR=/mnt/hdd/encrypted_chat/uploads
```

Encrypted files and avatars go on the 4 TB HDD; the SSD stays reserved for the OS and MySQL data.

```bash
mkdir -p /mnt/hdd/encrypted_chat/uploads
chmod 700 /mnt/hdd/encrypted_chat/uploads
```

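The same setup can also be done at server start-up, so a missing directory does not surface only on the first upload. This is a sketch, not the project's code; the helper name `ensure_upload_dir` is hypothetical. The explicit `chmod` is there because the mode passed to `mkdir` is filtered by the process umask.

```python
from pathlib import Path


def ensure_upload_dir(path: str) -> Path:
    """Create the upload directory if needed and restrict it to the owner (mode 700)."""
    upload_dir = Path(path)
    upload_dir.mkdir(parents=True, exist_ok=True)
    # mkdir's mode argument is subject to umask, so set the permissions explicitly.
    upload_dir.chmod(0o700)
    return upload_dir
```
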

---

## Step 2: MySQL tuning for 256 GB RAM

### `/etc/mysql/mysql.conf.d/tuning.cnf` (or the Docker equivalent)

```ini
[mysqld]
# === Buffer pool: main cache for data + indexes ===
# 96 GB = 37.5% of RAM (MySQL + app on the same machine)
innodb_buffer_pool_size = 96G
innodb_buffer_pool_instances = 16

# === Redo log: larger = less I/O, faster writes (variable requires MySQL 8.0.30+) ===
innodb_redo_log_capacity = 4G

# === Flush strategy ===
# 2 = write the redo log to the OS cache at each commit, fsync to disk about once per second
# Up to ~1 s of transactions can be lost on an OS crash or power failure, in exchange for much faster writes (estimated ~10x)
innodb_flush_log_at_trx_commit = 2
# O_DIRECT = bypass the OS page cache (InnoDB has its own)
innodb_flush_method = O_DIRECT

# === I/O capacity (SSD) ===
innodb_io_capacity = 2000
innodb_io_capacity_max = 4000

# === Connections ===
max_connections = 200

# === Sort/join buffers ===
sort_buffer_size = 4M
join_buffer_size = 4M
read_buffer_size = 2M
read_rnd_buffer_size = 2M

# === Temporary tables ===
tmp_table_size = 256M
max_heap_table_size = 256M

# === Query cache (removed in MySQL 8.0; relevant for 5.7 only) ===
# query_cache_type = 0

# === Thread cache ===
thread_cache_size = 64

# === Binary logging (for future replicas) ===
# server-id = 1
# log_bin = /var/log/mysql/mysql-bin
# binlog_expire_logs_seconds = 604800
# max_binlog_size = 256M
```

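A rough memory budget for these values can be worked out as follows. It is a simplified estimate, not a guarantee: it assumes every connection allocates each of the four session buffers exactly once, while MySQL allocates them on demand and can allocate some of them more than once per query, and in-memory temporary tables (up to 256M each) are not counted. The function name `estimated_peak_bytes` is hypothetical.

```python
GIB = 1024 ** 3
MIB = 1024 ** 2

BUFFER_POOL = 96 * GIB
# sort, join, read and read_rnd buffer sizes from tuning.cnf
SESSION_BUFFERS = [4 * MIB, 4 * MIB, 2 * MIB, 2 * MIB]
MAX_CONNECTIONS = 200


def estimated_peak_bytes(buffer_pool: int, session_buffers: list[int], max_connections: int) -> int:
    """Buffer pool plus one set of session buffers for every allowed connection."""
    return buffer_pool + sum(session_buffers) * max_connections


peak = estimated_peak_bytes(BUFFER_POOL, SESSION_BUFFERS, MAX_CONNECTIONS)
print(f"{peak / GIB:.2f} GiB")  # 98.34 GiB, well under a 128 GiB memory limit
```
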
**If MySQL runs in Docker:**

```yaml
# docker-compose.yml
services:
  mysql:
    image: mysql:8.0
    volumes:
      - /var/lib/mysql:/var/lib/mysql  # data on the SSD
      - ./tuning.cnf:/etc/mysql/conf.d/tuning.cnf
    deploy:
      resources:
        limits:
          memory: 128G  # cap it so memory is left for the app
    environment:
      MYSQL_DATABASE: encrypted_chat
```

### After applying, restart MySQL and verify:

```sql
SHOW VARIABLES LIKE 'innodb_buffer_pool_size';
SHOW VARIABLES LIKE 'innodb_flush_log_at_trx_commit';
SHOW ENGINE INNODB STATUS\G
```

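The check can be automated by comparing the rows returned by `SHOW VARIABLES` with the expected values. The sketch assumes the rows arrive as `(name, value)` pairs, which is the shape of the statement's result set; the helper name `find_mismatches` and the `EXPECTED` mapping are hypothetical. Note that MySQL reports `innodb_buffer_pool_size` in bytes, so `96G` appears as `103079215104`.

```python
EXPECTED = {
    "innodb_buffer_pool_size": str(96 * 1024 ** 3),  # 96G expressed in bytes
    "innodb_flush_log_at_trx_commit": "2",
    "innodb_flush_method": "O_DIRECT",
}


def find_mismatches(rows, expected=EXPECTED) -> dict:
    """Map each variable that differs to a (wanted, actual) pair; empty dict means OK."""
    # Normalise values to strings, since drivers may return ints or strings.
    actual = {name: str(value) for name, value in rows}
    return {
        name: (wanted, actual.get(name))
        for name, wanted in expected.items()
        if actual.get(name) != wanted
    }
```
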

---

## Step 3: Recommended `.env` for production

```env
# Server
SERVER_HOST=0.0.0.0
SERVER_PORT=9999

# MySQL
MYSQL_HOST=127.0.0.1
MYSQL_PORT=3306
MYSQL_USER=sifrator
MYSQL_PASSWORD=<strong-password>
MYSQL_DATABASE=encrypted_chat
DB_POOL_SIZE=30

# Performance
THREAD_POOL_SIZE=40

# Storage
UPLOAD_DIR=/mnt/hdd/encrypted_chat/uploads

# TLS (enable in production)
TLS_ENABLED=true
TLS_CERT_FILE=/etc/letsencrypt/live/chat.example.com/fullchain.pem
TLS_KEY_FILE=/etc/letsencrypt/live/chat.example.com/privkey.pem

# Logging
LOG_LEVEL=INFO
```

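One way to consume these values is to parse them once at start-up and fail fast on inconsistent settings. This is a sketch, not the server's actual loader: it assumes the `.env` entries have already been exported into the process environment, and the `Settings` class, `load_settings` function, and the fallback defaults are all assumptions made for illustration.

```python
import os
from dataclasses import dataclass
from typing import Mapping


@dataclass(frozen=True)
class Settings:
    server_host: str
    server_port: int
    db_pool_size: int
    thread_pool_size: int
    upload_dir: str
    tls_enabled: bool
    tls_cert_file: str
    tls_key_file: str


def load_settings(env: Mapping[str, str] = os.environ) -> Settings:
    """Read settings from the environment; the defaults here are illustrative only."""
    settings = Settings(
        server_host=env.get("SERVER_HOST", "0.0.0.0"),
        server_port=int(env.get("SERVER_PORT", "9999")),
        db_pool_size=int(env.get("DB_POOL_SIZE", "30")),
        thread_pool_size=int(env.get("THREAD_POOL_SIZE", "40")),
        upload_dir=env.get("UPLOAD_DIR", "uploads"),
        tls_enabled=env.get("TLS_ENABLED", "false").strip().lower() == "true",
        tls_cert_file=env.get("TLS_CERT_FILE", ""),
        tls_key_file=env.get("TLS_KEY_FILE", ""),
    )
    # Refuse to start with TLS switched on but no certificate or key configured.
    if settings.tls_enabled and not (settings.tls_cert_file and settings.tls_key_file):
        raise ValueError("TLS_ENABLED=true requires TLS_CERT_FILE and TLS_KEY_FILE")
    return settings
```
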

---

## Step 4: Monitoring (recommended)

### Simple metrics without external tools

Add periodic logging to the server:

```python
# In _periodic_cleanup() (every 10 min):
async with _clients_lock:
    total_connections = sum(len(v) for v in connected_clients.values())
    unique_users = len(connected_clients)
# Log after releasing the lock so it is held as briefly as possible.
logger.info("[STATS] users=%d connections=%d", unique_users, total_connections)
```

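For experimenting outside the server, the same idea can be written as a self-contained coroutine. This is a sketch under assumptions: `connected_clients` is taken to be a dict mapping a user ID to a collection of that user's connections (which is what the snippet above implies), and the names `collect_stats` and `stats_loop` are hypothetical.

```python
import asyncio
import logging

logger = logging.getLogger("chat.stats")


async def collect_stats(connected_clients: dict, lock: asyncio.Lock) -> tuple[int, int]:
    """Count users and connections under the lock, then log outside of it."""
    async with lock:
        unique_users = len(connected_clients)
        total_connections = sum(len(conns) for conns in connected_clients.values())
    logger.info("[STATS] users=%d connections=%d", unique_users, total_connections)
    return unique_users, total_connections


async def stats_loop(connected_clients: dict, lock: asyncio.Lock, interval_s: float = 600.0) -> None:
    """Log stats every interval_s seconds (600 s = the 10 min cleanup period)."""
    while True:
        await collect_stats(connected_clients, lock)
        await asyncio.sleep(interval_s)
```

Such a loop would typically be started with `asyncio.create_task(stats_loop(...))` at start-up and cancelled on shutdown.
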
### With external tools (optional)

- **htop**: CPU / RAM usage of the process
- **mysqladmin status**: queries/s, slow queries, connections
- **Prometheus + Grafana**: long-term trends (add only when needed)

---

## Future scaling

### Phase A: Separating MySQL (15K+ users)

Move MySQL to a separate machine (or a managed DB). The app server and Redis share one machine, the DB runs on another.

```
[Server: App + Redis] ──TCP──▶ [Server: MySQL]
         │
         └──▶ [HDD/S3: files]
```

### Phase B: Horizontal scaling (50K+ users)

Multiple app servers behind a load balancer, plus Redis Pub/Sub for cross-server notifications.

```
           ┌─── App server 1 ───┐
Client ──▶ │ connected_clients  │──┐
           └────────────────────┘  │
                                   ├──▶ Redis Pub/Sub ──▶ MySQL
           ┌─── App server 2 ───┐  │
Client ──▶ │ connected_clients  │──┘
           └────────────────────┘
                     ▲
     Load Balancer (HAProxy / nginx stream)
         (sticky sessions by user_id)
```

Main change: `_notify_users()` publishes to Redis instead of writing to the local `connected_clients` when the user is not connected to this server.

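The routing decision described above can be sketched as follows. It is an illustration under assumptions, not the project's `_notify_users()`: the function name `notify_users`, the callbacks `send_local` and `publish`, and the channel naming `notify:<user_id>` are hypothetical. `publish` is shaped like a Redis client's `publish(channel, message)` call, so a real client method could be passed in its place.

```python
import json
from typing import Callable, Iterable


def notify_users(
    user_ids: Iterable[str],
    payload: dict,
    local_clients: dict,
    send_local: Callable[[object, dict], None],
    publish: Callable[[str, str], object],
) -> tuple[list, list]:
    """Deliver to local connections when possible, otherwise publish for other servers."""
    delivered, forwarded = [], []
    for user_id in user_ids:
        connections = local_clients.get(user_id)
        if connections:
            # The user is connected to this server: write to each of their connections.
            for conn in connections:
                send_local(conn, payload)
            delivered.append(user_id)
        else:
            # Not connected here: let whichever server holds the connection deliver it.
            publish(f"notify:{user_id}", json.dumps(payload))
            forwarded.append(user_id)
    return delivered, forwarded
```

Each app server would additionally subscribe to the channels of its own connected users and pass received messages to `send_local`.
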
### Phase C: DB scaling (100K+ users)

- Read replicas for SELECT queries
- Partitioning the `messages` table by month
- Sharding by `conversation_id`

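To make the last two items concrete, the sketch below shows one possible way to pick a shard for a conversation and to name a monthly partition. Both helpers (`shard_for`, `partition_name`) and the `pYYYYMM` naming are assumptions, not part of the plan. A stable hash is used instead of Python's built-in `hash()`, which is randomized per process for strings. Plain modulo sharding also remaps most conversations when the shard count changes, so a production design would need a migration strategy.

```python
import hashlib
from datetime import date


def shard_for(conversation_id, shard_count: int) -> int:
    """Map a conversation to a shard index in [0, shard_count) using a stable hash."""
    digest = hashlib.sha256(str(conversation_id).encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % shard_count


def partition_name(day: date) -> str:
    """Name of the monthly partition a message from this day would belong to."""
    return f"p{day:%Y%m}"


print(partition_name(date(2025, 3, 9)))  # p202503
```
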

---

## Overview: what is done

| Step | Status | Description |
|------|--------|-------------|
| asyncio.to_thread() for DB | **Done** | 131 DB calls offloaded to the thread pool |
| ThreadPoolExecutor(40) | **Done** | Configurable via `THREAD_POOL_SIZE` |
| DB indexes (5 new) | **Done** | Schema + SQL migration ready |
| UPLOAD_DIR on HDD | **Configuration** | Set in `.env` |
| MySQL tuning | **Configuration** | Apply `tuning.cnf` |
| TLS certificate | **TODO** | Let's Encrypt or own CA |
| Monitoring | **Optional** | Periodic stats logging |