Observability: Does My App Work?
Core Principle
Balance: Not too much, not too little. Every metric and log must answer a real operational question.
Monitoring
The Right Amount
Use Icinga (as already deployed in your infrastructure) and focus on:
- Where can we take action if something breaks?
- Where is the signal-to-noise ratio actually worth it?
Question every metric: if its threshold fires, can whoever is on call start acting on it within 30 seconds? If not, it is noise.
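One way to apply this test mechanically is to keep a first response step next to every alert and flag the alerts that have none. The sketch below is only an illustration: `AlertRule`, its fields, and the sample alerts are hypothetical and are not an Icinga data structure or API.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class AlertRule:
    # Hypothetical shape for an alert definition; not an Icinga object.
    name: str
    threshold: str
    runbook_action: Optional[str] = None  # first step an operator can take


def find_noise(rules: list[AlertRule]) -> list[str]:
    """Return the names of alerts that have no documented action attached."""
    return [r.name for r in rules if not (r.runbook_action or "").strip()]


# Made-up sample alerts for illustration.
rules = [
    AlertRule("disk_usage_high", "> 90%", "Rotate logs, then extend the volume"),
    AlertRule("cpu_steal_time", "> 5%"),
    AlertRule("queue_depth", "> 10000", "   "),  # whitespace counts as missing
]

noisy = find_noise(rules)
print(noisy)
```

Anything this returns is a candidate for deletion, or for a runbook entry if the alert turns out to matter.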
Metrics Stack
Prometheus — time-series database. Scrapes metrics from application endpoints.
Grafana — visualization layer. When setting up dashboards:
- Study the community dashboards published on grafana.com (grafana.com/grafana/dashboards): they are built by operators who know what they are monitoring
- Don’t replicate the whole system in one dashboard
- Separate: health overview, live debugging, historical trends
Percona Monitoring and Management — specialized for relational databases
- Covers PostgreSQL, MySQL, MariaDB, etc.
- Ships with pre-built dashboards (don’t reinvent the wheel)
- Gives you both standard metrics and expert heuristics
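To make the Prometheus scrape model concrete, here is a minimal standard-library sketch of an application endpoint that Prometheus could scrape. The metric name, labels, values, and port are made up for illustration. In a real service the official `prometheus_client` library is likely the better choice, since this sketch does not escape label values or cover metric types other than counters.

```python
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical in-process counters; a real service would update these as it works.
COUNTERS = {
    ("http_requests_total", "Total HTTP requests handled."): {
        (("method", "GET"), ("status", "200")): 42,
        (("method", "POST"), ("status", "500")): 3,
    },
}


def render_metrics(counters) -> str:
    """Render counters in the Prometheus text exposition format."""
    lines = []
    for (name, help_text), samples in counters.items():
        lines.append(f"# HELP {name} {help_text}")
        lines.append(f"# TYPE {name} counter")
        for labels, value in samples.items():
            # Label values are not escaped here; keep them free of quotes.
            label_str = ",".join(f'{k}="{v}"' for k, v in labels)
            lines.append(f"{name}{{{label_str}}} {value}")
    return "\n".join(lines) + "\n"


class MetricsHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        if self.path != "/metrics":
            self.send_error(404)
            return
        body = render_metrics(COUNTERS).encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "text/plain; version=0.0.4")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def serve(port: int = 8000) -> None:
    """Block and serve /metrics on localhost; call from your own entry point."""
    HTTPServer(("127.0.0.1", port), MetricsHandler).serve_forever()


if __name__ == "__main__":
    # Print the payload instead of starting the server, so the sketch is easy to try.
    print(render_metrics(COUNTERS), end="")
```

Binding to `127.0.0.1` is deliberate: the endpoint stays off public interfaces unless you choose to expose it.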
Logging & Tracing
Logging Pitfalls
Gitleaks: a secret scanner built for Git repositories. Run it over your log files as well to catch accidentally logged credentials.
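The idea behind scanning logs for credentials can be sketched in a few lines. This is a simplified stand-in, not how Gitleaks works internally: the three patterns are illustrative assumptions, whereas a dedicated scanner ships a far larger, maintained rule set. The sample log lines and the key are fabricated.

```python
import re

# Illustrative patterns only; not taken from any scanner's rule set.
PATTERNS = {
    "aws_access_key_id": re.compile(r"\bAKIA[0-9A-Z]{16}\b"),
    "password_assignment": re.compile(r"(?i)\b(password|passwd|pwd)\s*[=:]\s*\S+"),
    "bearer_token": re.compile(r"(?i)\bbearer\s+[a-z0-9._\-]{20,}"),
}


def scan_lines(lines):
    """Yield (line_number, rule_name) for every line that matches a pattern."""
    for number, line in enumerate(lines, start=1):
        for rule, pattern in PATTERNS.items():
            if pattern.search(line):
                yield number, rule


# Fabricated log lines; the key below is not a real credential.
sample_log = [
    "2024-01-01 12:00:00 INFO user login ok",
    "2024-01-01 12:00:01 DEBUG connecting with password=hunter2",
    "2024-01-01 12:00:02 DEBUG using key AKIAABCDEFGHIJKLMNOP",
]

findings = list(scan_lines(sample_log))
print(findings)
```

The function reports only the line number and rule name, so the scan output does not itself repeat the secret.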
Problem: Sheer log volume makes it hard to trace a single request through the system.
Solution: Tracing beats logging
Add distributed traces so you can follow one request across service boundaries:
- Traefik supports trace propagation natively (injects trace IDs into headers)
- Caveat: your JMS integration may not cooperate — evaluate before committing
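Following one request across service boundaries comes down to every hop forwarding the same trace ID while minting its own span ID. The sketch below assumes the W3C Trace Context `traceparent` header is the propagation format in use; whether your Traefik version and tracing backend are configured for that format needs checking. The `child_headers` helper is hypothetical, not part of any library.

```python
import re
import secrets

# traceparent layout: version-traceid-parentid-flags, all lowercase hex.
TRACEPARENT_RE = re.compile(
    r"^([0-9a-f]{2})-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})$"
)


def new_traceparent(sampled: bool = True) -> str:
    """Start a new trace with a random trace ID and span ID."""
    trace_id = secrets.token_hex(16)  # 32 hex characters
    span_id = secrets.token_hex(8)    # 16 hex characters
    flags = "01" if sampled else "00"
    return f"00-{trace_id}-{span_id}-{flags}"


def child_headers(incoming: dict) -> dict:
    """Continue the incoming trace if present and valid, else start a new one.

    Assumes header names in `incoming` are already lowercased.
    """
    match = TRACEPARENT_RE.match(incoming.get("traceparent", ""))
    if match is None:
        return {"traceparent": new_traceparent()}
    version, trace_id, _parent_span, flags = match.groups()
    # Same trace ID, fresh span ID for this hop.
    return {"traceparent": f"{version}-{trace_id}-{secrets.token_hex(8)}-{flags}"}


incoming = {"traceparent": "00-4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-01"}
outgoing = child_headers(incoming)
print(outgoing["traceparent"])
```

A JMS message has no HTTP headers, so the same value would have to travel as a message property and be read back by the consumer. That is the part of the integration worth evaluating first.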
Client-side Observability — Tread Carefully
Sentry + OpenTracing for client logs, but watch out (OpenTracing has since been archived in favor of OpenTelemetry, so evaluate that for new setups):
- Privacy risk: Sending client data to external services requires user consent (GDPR, CCPA)
- Security risk: If your metrics endpoint is publicly exposed, attackers can enumerate system details
Prefer internal collectors. If you must send to SaaS, audit what data is actually transmitted.
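Auditing what is transmitted is easier when every outgoing event passes through one scrubbing function. The sketch below is a minimal example under stated assumptions: the key list, the email pattern, and the event layout are all hypothetical. The `(event, hint)` signature is modeled on the before-send style callbacks that error-reporting SDKs such as Sentry's offer, and the exact wiring should be checked against your SDK.

```python
import re

EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
# Hypothetical deny-list; extend it to match what your events actually contain.
SENSITIVE_KEYS = {"password", "token", "authorization", "cookie", "email", "ip_address"}


def scrub(value):
    """Recursively redact sensitive keys and email addresses in a JSON-like value."""
    if isinstance(value, dict):
        return {
            k: "[redacted]" if str(k).lower() in SENSITIVE_KEYS else scrub(v)
            for k, v in value.items()
        }
    if isinstance(value, list):
        return [scrub(v) for v in value]
    if isinstance(value, str):
        return EMAIL_RE.sub("[redacted-email]", value)
    return value


def before_send(event, hint=None):
    """Hook-shaped wrapper: returns a scrubbed copy and leaves the input untouched."""
    return scrub(event)


# Fabricated event for illustration.
event = {
    "message": "Checkout failed for jane@example.com",
    "user": {"id": "42", "email": "jane@example.com", "ip_address": "203.0.113.7"},
    "request": {"headers": {"Authorization": "Bearer abc", "Accept": "text/html"}},
}

clean = before_send(event)
print(clean)
```

A deny-list like this only catches what you thought of in advance. For stricter setups an allow-list of fields that may leave the network is the safer design.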
Tools & References
| Tool | Purpose | Link / Notes |
|---|---|---|
| Icinga | Alerting & health checks | Already deployed |
| Prometheus | Metrics storage & query | https://prometheus.io |
| Grafana | Dashboard visualization | https://grafana.com |
| Percona Monitoring and Management (PMM) | Database monitoring | For relational DBs; ships with preset dashboards |
| Traefik | Trace propagation | Built-in tracing support |
| Coroot | Open-source APM | https://github.com/coroot/coroot (lots of flexibility) |
Practical Checklist
- Are metrics tied to runbook actions? (If no → delete metric)
- Can your team read the dashboards? (Run a blind test: hand one to someone who did not build it and ask what it shows)
- Do traces flow across your service boundaries? (Test with a real request)
- Did you audit what data leaves your network? (Privacy + security)