Observability: Does My App Work?


Core Principle

Balance: Not too much, not too little. Every metric and log must answer a real operational question.


Monitoring

The Right Amount

Use Icinga (as already deployed in your infrastructure) and focus on:

  • Where can we take action if something breaks?
  • Where is the signal-to-noise ratio actually worth it?

Question every metric: If this threshold fires, can we do something about it in 30 seconds? If not, it’s noise.
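A check that fails the 30-second test is noise, so the check output itself should point at the action. Icinga runs check plugins that follow the Monitoring Plugins conventions: exit code 0/1/2/3 for OK/WARNING/CRITICAL/UNKNOWN, a one-line status message, and optional performance data after "|". Below is a minimal Python sketch. It assumes a hypothetical HTTP health endpoint and made-up latency thresholds.

```python
#!/usr/bin/env python3
"""Icinga/Nagios-style check plugin: HTTP health endpoint plus latency.

Hypothetical example: the health URL, thresholds and runbook hint are
placeholders, not values from any real deployment.
"""
import sys
import time
import urllib.error
import urllib.request

# Exit codes defined by the Monitoring Plugins guidelines, which Icinga follows
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3
WARN_S, CRIT_S = 0.5, 2.0  # hypothetical latency thresholds in seconds


def evaluate(latency_s, status):
    """Map a probe result to (exit_code, one-line message)."""
    if status != 200:
        # Name the first action so the on-call person can act quickly
        return CRITICAL, f"CRITICAL - health endpoint returned HTTP {status}, check app logs and restart"
    if latency_s >= CRIT_S:
        return CRITICAL, f"CRITICAL - latency {latency_s:.2f}s >= {CRIT_S}s"
    if latency_s >= WARN_S:
        return WARNING, f"WARNING - latency {latency_s:.2f}s >= {WARN_S}s"
    return OK, f"OK - latency {latency_s:.2f}s"


def probe(url, timeout=5.0):
    """Return (latency_seconds, http_status) for one GET request."""
    start = time.monotonic()
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            status = resp.status
    except urllib.error.HTTPError as exc:
        status = exc.code  # urllib raises 4xx/5xx responses as HTTPError
    return time.monotonic() - start, status


def main(url):
    try:
        latency, status = probe(url)
    except OSError as exc:  # connection refused, DNS failure, timeout
        print(f"CRITICAL - {url} unreachable: {exc}")
        return CRITICAL
    code, message = evaluate(latency, status)
    # Perfdata after "|" (label=value;warn;crit) lets Icinga graph latency
    print(f"{message} | latency={latency:.3f}s;{WARN_S};{CRIT_S}")
    return code


# Icinga passes the URL as the first argument, e.g. check_app_health.py http://app:8080/healthz
if __name__ == "__main__" and len(sys.argv) > 1:
    sys.exit(main(sys.argv[1]))
```

Putting the first remediation step into the CRITICAL message means the alert answers "what do I do?" without the responder opening another tool.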


Metrics Stack

Prometheus — time-series database. Scrapes metrics from application endpoints.
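Scraping works over plain HTTP. The application serves its current values as text on an endpoint (conventionally /metrics), and Prometheus pulls them at an interval configured under scrape_configs. The following stdlib-only sketch shows that text format. It assumes a hypothetical counter app_requests_total and port 8000. In a real service, the official prometheus_client library would normally handle this.

```python
"""Minimal /metrics endpoint in the Prometheus text exposition format.

Stdlib-only sketch; metric name, labels and port are hypothetical.
"""
import threading
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer

_lock = threading.Lock()
_requests_total = {}  # {(method, status): count}


def record_request(method, status):
    """Increment the request counter for one (method, status) pair."""
    with _lock:
        key = (method, str(status))
        _requests_total[key] = _requests_total.get(key, 0) + 1


def render_metrics():
    """Render the counter as exposition text: HELP/TYPE lines, then samples."""
    lines = [
        "# HELP app_requests_total Total HTTP requests handled.",
        "# TYPE app_requests_total counter",
    ]
    with _lock:
        for (method, status), count in sorted(_requests_total.items()):
            # Label values here are plain tokens; arbitrary strings would need
            # escaping of backslash, double quote and newline
            lines.append(f'app_requests_total{{method="{method}",status="{status}"}} {count}')
    return "\n".join(lines) + "\n"


class MetricsHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        if self.path != "/metrics":
            self.send_error(404)
            return
        body = render_metrics().encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "text/plain; version=0.0.4; charset=utf-8")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def serve(host="127.0.0.1", port=8000):
    # Bind to an internal interface so the endpoint is not publicly reachable
    ThreadingHTTPServer((host, port), MetricsHandler).serve_forever()
```

Each scrape returns the current totals. Prometheus stores them as a time series and derives rates at query time, for example with rate().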

Grafana — visualization layer. When setting up dashboards:

  • Browse the community dashboards on grafana.com: they are built by operators who know what they’re monitoring
  • Don’t replicate the whole system in one dashboard
  • Separate: health overview, live debugging, historical trends
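One way to keep the three views apart is to give each its own dashboard, with a time range and refresh rate suited to its job. The sketch below emits Grafana dashboard JSON (title, time, refresh and panels are fields of Grafana's dashboard model). The metric name, queries and intervals are hypothetical, and datasource settings are left out.

```python
"""Sketch: three separate Grafana dashboards instead of one that shows everything.

Queries, titles and intervals are hypothetical; datasource fields are omitted.
"""
import json


def dashboard(title, time_from, panels, refresh=None):
    model = {
        "title": title,
        "tags": ["app"],
        "time": {"from": time_from, "to": "now"},
        "panels": [
            {
                "type": "timeseries",
                "title": panel_title,
                "gridPos": {"x": 0, "y": i * 8, "w": 24, "h": 8},
                "targets": [{"refId": "A", "expr": expr}],
            }
            for i, (panel_title, expr) in enumerate(panels)
        ],
    }
    if refresh:
        model["refresh"] = refresh  # only set auto-refresh where live data matters
    return model


DASHBOARDS = [
    # Health overview: few panels, short window, frequent refresh
    dashboard("App / Health", "now-1h", [
        ("Error rate", 'sum(rate(app_requests_total{status=~"5.."}[5m]))'),
    ], refresh="30s"),
    # Live debugging: detailed breakdowns, opened during incidents
    dashboard("App / Debug", "now-15m", [
        ("Requests by status", "sum by (status) (rate(app_requests_total[1m]))"),
    ], refresh="10s"),
    # Historical trends: long window, no auto-refresh
    dashboard("App / Trends", "now-30d", [
        ("Daily requests", "sum(increase(app_requests_total[1d]))"),
    ]),
]

if __name__ == "__main__":
    print(json.dumps(DASHBOARDS, indent=2))
```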

Percona Monitoring and Management — specialized for relational databases

  • Covers PostgreSQL, MySQL, MariaDB, etc.
  • Ships with pre-built dashboards (don’t reinvent the wheel)
  • Gives you both standard metrics and expert heuristics

Logging & Tracing

Logging Pitfalls

Gitleaks: Although built for Git repositories, it can also scan plain files. Run it over your logs to catch accidentally logged credentials.
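Gitleaks is the right tool here. The sketch below only illustrates the underlying idea (regex rules matched line by line) for a quick ad-hoc check. It is not Gitleaks, and its three rules are a hypothetical, far-from-complete subset.

```python
"""Stand-in illustration of regex-based secret scanning over log lines.

Not Gitleaks: the rules below are a small hypothetical subset.
"""
import re
import sys

RULES = {
    "aws-access-key-id": re.compile(r"\bAKIA[0-9A-Z]{16}\b"),
    "password-assignment": re.compile(r"(?i)\b(?:password|passwd|pwd)\s*[=:]\s*\S+"),
    "bearer-token": re.compile(r"(?i)\bbearer\s+[A-Za-z0-9\-._~+/]{20,}=*"),
}


def scan_lines(lines):
    """Yield (line_number, rule_name) for every rule that matches a line."""
    for lineno, line in enumerate(lines, start=1):
        for name, pattern in RULES.items():
            if pattern.search(line):
                yield lineno, name


if __name__ == "__main__" and len(sys.argv) > 1:
    with open(sys.argv[1], encoding="utf-8", errors="replace") as fh:
        for lineno, name in scan_lines(fh):
            # Report location and rule only; never echo the secret itself
            print(f"{sys.argv[1]}:{lineno}: possible {name}")
```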

Problem: Pure log volume makes it hard to trace a request through the system.

Solution: For following a request, tracing beats logging

Add distributed traces so you can follow one request across service boundaries:

  • Traefik supports trace propagation natively (injects trace IDs into headers)
  • Caveat: trace context may not survive the hop through your JMS integration; verify that trace IDs are carried in message properties before committing
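OpenTelemetry's default propagator is W3C Trace Context, which carries the trace in a traceparent header (version-traceid-spanid-flags). Over HTTP, the proxy and the instrumentation handle this. Over a message broker, the producer has to copy the header into a message property, and the consumer has to read it back. The sketch below assumes W3C Trace Context (only version 00). It uses a plain dict as a hypothetical stand-in for JMS message properties.

```python
"""Sketch: carrying a W3C traceparent across a service or broker hop."""
import re
import secrets

TRACEPARENT = re.compile(r"^00-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})$")


def parse(header):
    """Return (trace_id, parent_id, flags), or None if the header is invalid."""
    m = TRACEPARENT.match(header or "")
    if not m:
        return None
    trace_id, parent_id, flags = m.groups()
    # All-zero IDs are invalid per the spec
    if trace_id == "0" * 32 or parent_id == "0" * 16:
        return None
    return trace_id, parent_id, flags


def child_header(incoming):
    """Keep the trace ID and mint a new span ID for the outgoing call."""
    parsed = parse(incoming)
    if parsed is None:
        # No valid context: start a new trace, marked as sampled ("01")
        trace_id, flags = secrets.token_hex(16), "01"
    else:
        trace_id, _, flags = parsed
    return f"00-{trace_id}-{secrets.token_hex(8)}-{flags}"


def inject(properties, header):
    """Producer side: copy trace context into the message properties."""
    properties["traceparent"] = header


def extract(properties):
    """Consumer side: read it back (None if the hop dropped it)."""
    return parse(properties.get("traceparent"))
```

If extract() returns None on the consumer side for messages the producer did inject, the broker integration is dropping properties. That is exactly the failure the caveat above warns about.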

Client-side Observability — Tread Carefully

Sentry + OpenTracing (since superseded by OpenTelemetry) for client logs, but watch out:

  • Privacy risk: Client telemetry often contains personal data; sending it to external services may require user consent (GDPR, CCPA)
  • Security risk: If your metrics endpoint is publicly exposed, attackers can enumerate system details
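For the exposure risk, the simplest fix is to keep metrics endpoints off the public network with a firewall or proxy rule. If the application has to enforce this itself, an allowlist check could look like the sketch below. It assumes that the private RFC 1918 ranges plus loopback count as internal in your setup.

```python
"""Sketch: accept /metrics requests only from internal networks.

The CIDR list is an assumption; adjust it to your own network layout.
"""
import ipaddress

INTERNAL_NETWORKS = [
    ipaddress.ip_network(cidr)
    for cidr in ("10.0.0.0/8", "172.16.0.0/12", "192.168.0.0/16", "127.0.0.0/8", "::1/128")
]


def is_internal(client_ip):
    """True if client_ip parses and lies inside one of the internal networks."""
    try:
        addr = ipaddress.ip_address(client_ip)
    except ValueError:
        return False
    # Membership across IP versions (v4 address, v6 network) is simply False
    return any(addr in net for net in INTERNAL_NETWORKS)

# In an http.server handler the peer is self.client_address[0]; behind a
# reverse proxy that is the proxy's address, so enforce the rule there instead.
```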

Prefer internal collectors. If you must send to SaaS, audit what data is actually transmitted.
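Part of that audit can be enforced in code by scrubbing events before they are sent. The sketch below has the same signature as the Sentry Python SDK's before_send(event, hint) hook, and the browser SDK has an equivalent beforeSend option. The event keys handled here are assumptions, so compare them against a real captured event.

```python
"""Sketch: strip personal data from an event before it leaves the network.

Event keys ("user", "request", "cookies", "headers") are assumed, not verified
against every SDK version; inspect a real event before relying on them.
"""
SENSITIVE_HEADERS = {"authorization", "cookie", "x-api-key"}


def scrub_event(event, hint):
    """Drop or mask personal data; returning None instead would discard the event."""
    # Remove the user context (IDs, e-mail, IP address) entirely
    event.pop("user", None)
    request = event.get("request")
    if isinstance(request, dict):
        request.pop("cookies", None)
        headers = request.get("headers")
        if isinstance(headers, dict):  # this sketch only handles dict-shaped headers
            request["headers"] = {
                name: ("[Filtered]" if name.lower() in SENSITIVE_HEADERS else value)
                for name, value in headers.items()
            }
    return event

# Wiring (requires the sentry-sdk package):
#   import sentry_sdk
#   sentry_sdk.init(dsn="...", send_default_pii=False, before_send=scrub_event)
```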


Tools & References

Tool          Purpose                     Link / Notes
Icinga        Alerting & health checks    Already deployed
Prometheus    Metrics storage & query     https://prometheus.io
Grafana       Dashboard visualization     https://grafana.com
Percona PMM   Database monitoring         Relational DBs, preset dashboards
Traefik       Trace propagation           Built-in tracing support
Coroot        Open-source APM             https://github.com/coroot/coroot (lots of flexibility)

Practical Checklist

  • Are metrics tied to runbook actions? (If no → delete metric)
  • Can your team read the dashboards? (Run a blind test)
  • Do traces flow across your service boundaries? (Test with a real request)
  • Did you audit what data leaves your network? (Privacy + security)
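The first item can be partly automated for alerts defined as Prometheus alerting rules. The sketch below reads the JSON from Prometheus' /api/v1/rules endpoint and lists alerts that have no runbook_url annotation. That annotation name is a common convention, not something Prometheus requires.

```python
"""Sketch: flag alerting rules that are not tied to a runbook.

Assumes the /api/v1/rules response shape and a team convention that every
alert carries a "runbook_url" annotation.
"""
import json
import sys
import urllib.request


def rules_without_runbook(api_response):
    """Return "group/rule" names of alerting rules lacking runbook_url."""
    missing = []
    for group in api_response.get("data", {}).get("groups", []):
        for rule in group.get("rules", []):
            if rule.get("type") != "alerting":
                continue  # recording rules do not page anyone
            if not rule.get("annotations", {}).get("runbook_url"):
                missing.append(f'{group.get("name")}/{rule.get("name")}')
    return missing


def fetch_rules(base_url):
    """GET <base_url>/api/v1/rules and decode the JSON body."""
    with urllib.request.urlopen(f"{base_url}/api/v1/rules", timeout=10) as resp:
        return json.load(resp)


if __name__ == "__main__" and len(sys.argv) > 1:
    for name in rules_without_runbook(fetch_rules(sys.argv[1])):
        print(f"no runbook: {name}")
```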