Observability Stack Architecture
This document outlines the architecture of a modern observability stack using:
- Prometheus for metrics
- Loki for logs
- Tempo for traces
- Grafana as the unified dashboard
This stack provides full visibility into your application's health, behavior, and performance.
Overview
Observability answers:
- What is happening? (Metrics)
- Why is it happening? (Logs)
- Where did it happen? (Traces)
Core Components
Component | Purpose | Visualization |
---|---|---|
Prometheus | Scrapes metrics from applications | Grafana |
Loki | Stores logs and indexes their labels | Grafana |
Tempo | Collects distributed traces | Grafana |
Grafana | Visualizes all of the above | Web UI |
Architecture Diagram
                              +---------------------------+
                              |    Grafana Dashboards     |
                              | (Metrics / Logs / Traces) |
                              +-------------+-------------+
                                            |
           +--------------------------------+---------------------------------+
           |                                |                                 |
    +-------------+                +------------------+             +-------------------+
    | Prometheus  | <---scrapes--- |   FastAPI App    | --traces--> |       Tempo       |
    |  (metrics)  |                |  (via /metrics)  |             | (tracing backend) |
    +-------------+                +------------------+             +-------------------+
                                            |
                                     logs via stdout
                                            |
                                 +---------------------+
                                 |      Promtail       |
                                 | (or Grafana Agent)  |
                                 +---------------------+
                                            |
                                       +---------+
                                       |  Loki   |
                                       +---------+
Data Flow Summary
- Metrics (Prometheus):
  - Your FastAPI app exposes metrics via `/metrics` (e.g., with `prometheus_client`)
  - Prometheus scrapes them periodically
  - Grafana queries and visualizes them
- Logs (Loki):
  - Application logs (e.g., `loguru`, `structlog`, `uvicorn`) go to `stdout`
  - Promtail or Grafana Agent tails logs and sends them to Loki
  - Logs are labeled (e.g., by service, pod, env) and searchable in Grafana
- Traces (Tempo):
  - FastAPI app is instrumented using OpenTelemetry
  - Requests generate traces (including spans for DB, HTTP, etc.)
  - Traces are exported via OTLP to Tempo
  - Tempo stores and indexes traces for viewing in Grafana
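
The three flows above can be sketched in plain Python to show how a single request produces all three signals. This is a standard-library stand-in only: a real service would more likely use `prometheus_client` for the counter and the OpenTelemetry SDK for spans. The names `handle_request`, `render_metrics`, `REQUEST_COUNT`, `FINISHED_SPANS`, and the metric `http_requests_total` are illustrative assumptions, not part of the stack itself.

```python
import json
import logging
import sys
import time
import uuid
from collections import Counter

# Hypothetical in-process stand-ins for the real clients.
REQUEST_COUNT = Counter()  # metrics: what a Prometheus client library would track
FINISHED_SPANS = []        # traces: what an OTLP exporter would ship to Tempo

# Logs go to stdout, where Promtail or Grafana Agent can tail them.
logger = logging.getLogger("orders")
logger.addHandler(logging.StreamHandler(sys.stdout))
logger.setLevel(logging.INFO)


def handle_request(path):
    """Handle one request and emit a metric, a log line, and a span."""
    trace_id = uuid.uuid4().hex  # 32 hex characters, the length of an OpenTelemetry trace ID
    start = time.perf_counter()

    # Metric: count the request under its path label.
    REQUEST_COUNT[path] += 1

    # Log: one JSON object per line, carrying the trace ID for later correlation.
    logger.info(json.dumps({"msg": "request handled", "path": path, "trace_id": trace_id}))

    # Trace: record a finished span with the same trace ID.
    FINISHED_SPANS.append(
        {"trace_id": trace_id, "name": f"GET {path}", "duration_s": time.perf_counter() - start}
    )
    return trace_id


def render_metrics():
    """Render the counters in the Prometheus text exposition format."""
    lines = [
        "# HELP http_requests_total Total HTTP requests.",
        "# TYPE http_requests_total counter",
    ]
    for path, count in sorted(REQUEST_COUNT.items()):
        lines.append(f'http_requests_total{{path="{path}"}} {count}')
    return "\n".join(lines) + "\n"
```

Because the log line and the span carry the same `trace_id`, a log entry found in Loki can be matched to its trace in Tempo, and `render_metrics()` shows the kind of text Prometheus would read from `/metrics`.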
Component Integration
Tool | Input Source | Output / Integration |
---|---|---|
Prometheus | /metrics endpoint | Grafana dashboards, Alerts |
Loki | Logs from stdout/stderr | Grafana Explore |
Tempo | OpenTelemetry SDK | Grafana trace viewer |
Grafana | Prometheus, Loki, Tempo | Unified view |
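
As a rough illustration of the last table row, the sketch below builds one Grafana data source definition per backend. The hostnames are hypothetical (for example, Docker Compose service names), the ports are each project's usual defaults, and the field names follow Grafana's data source format as commonly documented; verify them against your Grafana version before relying on them.

```python
import json

# Hypothetical service hostnames; ports are the usual defaults for each backend.
BACKENDS = {
    "prometheus": "http://prometheus:9090",
    "loki": "http://loki:3100",
    "tempo": "http://tempo:3200",
}


def datasource_payloads(backends):
    """Build one Grafana data source definition per backend."""
    return [
        {"name": kind.capitalize(), "type": kind, "url": url, "access": "proxy"}
        for kind, url in backends.items()
    ]


if __name__ == "__main__":
    # Print the definitions; sending them to Grafana is left out of this sketch.
    print(json.dumps(datasource_payloads(BACKENDS), indent=2))
```

Each dictionary could be written to a provisioning file or posted to Grafana's data source API; this sketch stops at building the payloads.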
Benefits of This Stack
- Single pane of glass: All telemetry in one UI
- Correlated insights: Link logs ↔ traces ↔ metrics
- Open source and cloud-native
- Minimal vendor lock-in (fully OSS or self-hostable)
Example Use Case
Issue: Latency spike on `/api/orders`
With this stack you can:
- Use Prometheus to see when latency increased
- Click a spike in Grafana → view Tempo trace
- Trace shows DB query took 1.2s
- Jump to Loki logs for the same `trace_id`
- See error log: "Index on orders.created_at missing"
You identified what, where, and why across tools in seconds.
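
The investigation above comes down to two queries, sketched here as Python helpers that only build the query strings. The metric name `http_request_duration_seconds_bucket` and the labels `path` and `service` are assumptions: the real names depend on how the app is instrumented and how Promtail labels its log streams.

```python
def latency_promql(route, quantile=0.95, window="5m"):
    """PromQL for a latency quantile on one route (hypothetical metric and label names)."""
    return (
        f"histogram_quantile({quantile}, "
        f'sum by (le) (rate(http_request_duration_seconds_bucket{{path="{route}"}}[{window}])))'
    )


def trace_logs_logql(service, trace_id):
    """LogQL selecting one service's log lines that contain a given trace ID."""
    return f'{{service="{service}"}} |= "{trace_id}"'


if __name__ == "__main__":
    print(latency_promql("/api/orders"))
    print(trace_logs_logql("orders", "<trace_id from Tempo>"))
```

The first string would go into a Prometheus panel to locate the spike; the second would go into Grafana Explore with the Loki data source once a trace ID is known.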