Why Deploy AI Applications with Docker?
Docker containerization is a best practice for deploying AI applications. It solves the classic "works on my machine" problem, guarantees environment consistency, simplifies deployment, and supports elastic scaling. This article walks you through deploying a complete AI application with Docker, from scratch.
Environment Setup
# Install Docker
curl -fsSL https://get.docker.com | sh

# Install the NVIDIA Container Toolkit (GPU support)
# (current official repo; the legacy nvidia-docker apt repo is deprecated)
curl -fsSL https://nvidia.github.io/libnvidia-container/gpgkey | \
  sudo gpg --dearmor -o /usr/share/keyrings/nvidia-container-toolkit-keyring.gpg
curl -s -L https://nvidia.github.io/libnvidia-container/stable/deb/nvidia-container-toolkit.list | \
  sed 's#deb https://#deb [signed-by=/usr/share/keyrings/nvidia-container-toolkit-keyring.gpg] https://#g' | \
  sudo tee /etc/apt/sources.list.d/nvidia-container-toolkit.list
sudo apt-get update
sudo apt-get install -y nvidia-container-toolkit

# Register the NVIDIA runtime with Docker and restart the daemon
sudo nvidia-ctk runtime configure --runtime=docker
sudo systemctl restart docker

# Verify GPU support
docker run --rm --gpus all nvidia/cuda:12.1.0-base-ubuntu22.04 nvidia-smi
Project 1: A Local LLM Service (Ollama + Open WebUI)
Docker Compose Configuration
# docker-compose.yml
version: '3.8'

services:
  ollama:
    image: ollama/ollama:latest
    ports:
      - "11434:11434"
    volumes:
      - ollama_data:/root/.ollama
    deploy:
      resources:
        reservations:
          devices:
            - capabilities: [gpu]
    restart: unless-stopped

  open-webui:
    image: ghcr.io/open-webui/open-webui:main
    ports:
      - "3000:8080"
    environment:
      - OLLAMA_BASE_URL=http://ollama:11434
    volumes:
      - webui_data:/app/backend/data
    depends_on:
      - ollama
    restart: unless-stopped

volumes:
  ollama_data:
  webui_data:
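A note on the deploy block: the minimal capabilities: [gpu] entry requests GPU access through the NVIDIA runtime. If you want to pin the driver or an explicit GPU count, the Compose device-reservation spec also accepts these optional fields (an illustrative variant, not required for the setup above):

    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: 1
              capabilities: [gpu]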
Start the Services
# Start everything
docker compose up -d

# Pull models (runs inside the ollama service container)
docker compose exec ollama ollama pull llama3.2:3b
docker compose exec ollama ollama pull qwen2.5:7b

# Access the WebUI
# Open your browser at: http://localhost:3000
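Before opening the WebUI, you can hit the Ollama HTTP API directly to confirm the service is up. A minimal smoke test, using the model pulled above:

# Quick health check: list installed models
curl http://localhost:11434/api/tags

# One-off generation via the REST API
curl http://localhost:11434/api/generate -d '{
  "model": "llama3.2:3b",
  "prompt": "Say hello in one sentence.",
  "stream": false
}'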
Project 2: A RAG Knowledge-Base System
Architecture
┌─────────────┐      ┌─────────────┐
│   FastAPI   │─────▶│   Qdrant    │
│ (app layer) │      │ (vector DB) │
└──────┬──────┘      └─────────────┘
       │             ┌─────────────┐
       └────────────▶│   Ollama    │
                     │    (LLM)    │
                     └─────────────┘
Docker Compose Configuration
version: '3.8'

services:
  app:
    build: ./app
    ports:
      - "8000:8000"
    environment:
      - QDRANT_URL=http://qdrant:6333
      - OLLAMA_URL=http://ollama:11434
    depends_on:
      - qdrant
      - ollama

  qdrant:
    image: qdrant/qdrant:latest
    ports:
      - "6333:6333"
    volumes:
      - qdrant_data:/qdrant/storage

  ollama:
    image: ollama/ollama:latest
    ports:
      - "11434:11434"
    volumes:
      - ollama_data:/root/.ollama
    deploy:
      resources:
        reservations:
          devices:
            - capabilities: [gpu]

volumes:
  qdrant_data:
  ollama_data:
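For reference, the compose file above assumes a project layout like the following (the names are illustrative, matching the build: ./app context):

rag/
├── docker-compose.yml
└── app/
    ├── Dockerfile
    ├── requirements.txt
    └── main.py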
Application Code (FastAPI + RAG)
# app/main.py
import os

from fastapi import FastAPI
from qdrant_client import QdrantClient
from qdrant_client.models import Distance, PointStruct, VectorParams
from ollama import Client

app = FastAPI()

# Read service URLs from the environment set in docker-compose.yml
qdrant = QdrantClient(url=os.getenv("QDRANT_URL", "http://qdrant:6333"))
ollama = Client(host=os.getenv("OLLAMA_URL", "http://ollama:11434"))

EMBED_MODEL = "nomic-embed-text"  # produces 768-dimensional embeddings

# Create the collection on startup if it does not exist yet
if not qdrant.collection_exists("docs"):
    qdrant.create_collection(
        collection_name="docs",
        vectors_config=VectorParams(size=768, distance=Distance.COSINE),
    )

def split_text(text: str, chunk_size: int = 500) -> list[str]:
    """Naive fixed-size chunking; swap in a smarter splitter as needed."""
    return [text[i:i + chunk_size] for i in range(0, len(text), chunk_size)]

@app.post("/upload")
async def upload_document(text: str):
    """Upload a document and index its embeddings."""
    # 1. Split the text into chunks
    chunks = split_text(text)
    # 2. Generate an embedding per chunk
    embeddings = []
    for chunk in chunks:
        resp = ollama.embeddings(model=EMBED_MODEL, prompt=chunk)
        embeddings.append(resp["embedding"])
    # 3. Store the vectors in Qdrant
    qdrant.upsert(
        collection_name="docs",
        points=[
            PointStruct(id=i, vector=emb, payload={"text": txt})
            for i, (emb, txt) in enumerate(zip(embeddings, chunks))
        ],
    )
    return {"status": "ok", "chunks": len(chunks)}

@app.post("/query")
async def query(question: str):
    """RAG question answering."""
    # 1. Retrieve the most similar chunks
    q_emb = ollama.embeddings(model=EMBED_MODEL, prompt=question)["embedding"]
    results = qdrant.query_points(collection_name="docs", query=q_emb, limit=5)
    # 2. Build the context from the retrieved payloads
    context = "\n".join(p.payload["text"] for p in results.points)
    # 3. Generate an answer grounded in the context
    resp = ollama.chat(
        model="qwen2.5:7b",
        messages=[{
            "role": "user",
            "content": f"Answer based on the following context:\n{context}\n\nQuestion: {question}",
        }],
    )
    return {"answer": resp["message"]["content"]}
Dockerfile
FROM python:3.10-slim
WORKDIR /app
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt
COPY . .
CMD ["uvicorn", "main:app", "--host", "0.0.0.0", "--port", "8000"]
Project 3: An AI Image Generation Service
# docker-compose.yml
version: '3.8'

services:
  stable-diffusion:
    build: ./sd-service
    ports:
      - "7860:7860"
    volumes:
      - models:/models
    deploy:
      resources:
        reservations:
          devices:
            - capabilities: [gpu]
    environment:
      - WEBUI_ARGS=--api --nowebui

volumes:
  models:
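Assuming the ./sd-service image wraps the AUTOMATIC1111 Stable Diffusion WebUI (which the --api --nowebui flags suggest), generation goes through its REST API. A sketch against its standard /sdapi/v1/txt2img endpoint, decoding the first base64 image from the response:

curl -X POST http://localhost:7860/sdapi/v1/txt2img \
  -H "Content-Type: application/json" \
  -d '{
    "prompt": "a lighthouse at sunset, oil painting",
    "steps": 20,
    "width": 512,
    "height": 512
  }' | python3 -c "import sys, json, base64; open('out.png','wb').write(base64.b64decode(json.load(sys.stdin)['images'][0]))"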
Deployment Best Practices
- Use a .dockerignore: exclude unnecessary files to keep the build context and image small
- Multi-stage builds: shrink the final image size
- Health checks: configure a healthcheck so failures are detected automatically
- Log management: configure log rotation so container logs cannot fill the disk
- Resource limits: cap CPU and memory usage (the last three items are sketched in the snippet below)
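A minimal sketch of what the healthcheck, log-rotation, and resource-limit bullets look like in a compose file, using the project-2 app service as the example. The probe hits FastAPI's built-in /docs page via the Python standard library, since the python:3.10-slim base image ships no curl:

services:
  app:
    build: ./app
    healthcheck:
      test: ["CMD", "python", "-c", "import urllib.request; urllib.request.urlopen('http://localhost:8000/docs')"]
      interval: 30s
      timeout: 5s
      retries: 3
    logging:
      driver: json-file
      options:
        max-size: "10m"
        max-file: "3"
    deploy:
      resources:
        limits:
          cpus: "2.0"
          memory: 4G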
Monitoring and Operations
# Check container status
docker compose ps

# Tail application logs
docker compose logs -f app

# Update a model (new-model is a placeholder for the model name)
docker compose exec ollama ollama pull new-model

# Back up the Ollama volume into the current directory
# (note: compose prefixes volume names with the project name, e.g. myproject_ollama_data)
docker run --rm -v ollama_data:/data -v "$(pwd)":/backup alpine tar czf /backup/ollama-backup.tar.gz -C /data .
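To restore that archive into a (possibly fresh) volume, reverse the operation under the same assumptions, i.e. an alpine helper image and the archive sitting in the current directory:

docker run --rm -v ollama_data:/data -v "$(pwd)":/backup alpine \
  sh -c "cd /data && tar xzf /backup/ollama-backup.tar.gz"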
By containerizing AI applications with Docker, you get environment consistency, fast deployment, and elastic scaling out of the box, which is why it has become the standard approach for running AI applications in production.