How to Set Up Monitoring with Grafana and Prometheus

The observability setup that alerts you before your users open a support ticket.

Prometheus and Grafana are the de facto standard for open-source observability. This guide covers the installation, metrics collection, dashboard design, and alerting rules that give you production visibility - not just a dashboard full of graphs nobody reads.

No fluff. Production-grade answers from engineers who build this every day.

The Four Golden Signals (And Why They're All You Need to Start)

Google's SRE book defines four golden signals: Latency, Traffic, Errors, and Saturation. If you measure nothing else, measure these four for every service. Latency: how long requests take (p50, p95, p99 - not the average, which hides slow outliers). Traffic: requests per second. Errors: error rate (HTTP 5xx percentage). Saturation: CPU and memory utilization, queue depths. Alert on the signals that affect users, not on every metric that can be measured.
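
To make the definitions concrete, here is a minimal sketch that derives latency percentiles, traffic, and error rate from one window of request records. The sample data, the 10-second window, and the function name golden_signals are illustrative assumptions; in a real setup Prometheus computes these from histograms and counters. Saturation is left out because it comes from resource metrics, not from request records.

```python
from statistics import quantiles

# Hypothetical sample: (latency in ms, HTTP status) for requests in one window.
WINDOW_SECONDS = 10
SAMPLES = [(12, 200)] * 90 + [(40, 200)] * 5 + [(900, 500)] * 5


def golden_signals(samples, window_seconds):
    """Summarise latency, traffic, and errors for one window of requests."""
    latencies = sorted(ms for ms, _ in samples)
    # quantiles(n=100) returns the 99 cut points p1..p99.
    cuts = quantiles(latencies, n=100, method="inclusive")
    errors = sum(1 for _, status in samples if status >= 500)
    return {
        "p50_ms": cuts[49],
        "p95_ms": cuts[94],
        "p99_ms": cuts[98],
        "mean_ms": sum(latencies) / len(latencies),
        "rps": len(samples) / window_seconds,
        "error_rate": errors / len(samples),
    }


if __name__ == "__main__":
    for name, value in golden_signals(SAMPLES, WINDOW_SECONDS).items():
        print(f"{name}: {value}")
```

In this sample the median stays at 12 ms while the mean is pulled up by a handful of slow failing requests, which is why the percentiles are reported separately instead of relying on the average.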

At Valletta Software, we focus on:

kube-prometheus-stack: one Helm chart installs Prometheus, Grafana, Alertmanager, and exporters - start here

Service instrumentation: expose a /metrics endpoint with prom-client (Node.js) or prometheus-client (Python)

Four golden signals: latency (p50/p95/p99), traffic (RPS), errors (5xx rate), saturation (CPU/memory)

Recording rules: pre-compute expensive queries - faster dashboards, lower Prometheus load

Alertmanager: route alerts to Slack, PagerDuty, or email - separate channels by severity

Grafana dashboards: one service overview dashboard per service - avoid the mega-dashboard nobody reads

Labels: consistent labels across metrics (service, env, version) - enables cross-service correlation
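
As a rough illustration of the instrumentation and labelling points above, the sketch below shows what a scrape of /metrics returns for one request counter that carries the same service, env, and version labels on every series. It uses only the standard library so the payload is visible; in a real service prometheus-client (or prom-client) would maintain the counters and serve this text for you. The label values, the metric name http_requests_total, and the helper names are assumptions made for the example.

```python
from collections import Counter

# Hypothetical label values; the point is that every metric carries the same keys.
COMMON_LABELS = {"service": "checkout", "env": "prod", "version": "1.4.2"}

requests_total = Counter()  # request counts keyed by HTTP status code


def record_request(status: int) -> None:
    """Count one handled request under its status code."""
    requests_total[str(status)] += 1


def _format_labels(labels: dict) -> str:
    """Render labels as key="value" pairs, escaping as the text format requires."""
    def esc(value: str) -> str:
        return value.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")
    return ",".join(f'{key}="{esc(value)}"' for key, value in labels.items())


def render_metrics() -> str:
    """Return the counter in the Prometheus text exposition format."""
    lines = [
        "# HELP http_requests_total Total HTTP requests handled.",
        "# TYPE http_requests_total counter",
    ]
    for status, count in sorted(requests_total.items()):
        labels = {**COMMON_LABELS, "status": status}
        lines.append(f"http_requests_total{{{_format_labels(labels)}}} {count}")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    for status in (200, 200, 500):
        record_request(status)
    print(render_metrics(), end="")
```

Because every series shares the same label keys, a query can filter or group by service, env, or version in the same way across services, which is what makes cross-service correlation practical.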

The Alerting Strategy That Doesn't Cause Alert Fatigue

Alerts that fire constantly get ignored. These rules prevent it.

Alert on symptoms, not causes: alert on error rate > 1%, not on CPU > 70%
Severity levels: critical (wake someone up), warning (next business day), info (FYI)
Silence during deployment: suppress non-critical alerts during the deployment window
Runbook links: every alert links to a runbook - on-call engineer knows what to do
Error budget alerts: alert when error budget burn rate is high - SLO-based alerting
Dead man's switch: an always-firing alert whose absence signals that Prometheus has stopped scraping or alerting - catches monitoring failures
On-call rotation: alerts go to the right person - not a shared inbox nobody owns
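
The error budget rule in the list above can be sketched as a small calculation. Burn rate is the observed error rate divided by the error budget (1 minus the SLO target), so a burn rate of 1 spends the budget exactly over the SLO period. The 99.9% target, the 1-hour and 6-hour windows, and the 14.4 and 6 thresholds are assumptions chosen for illustration, not values from this guide; in production the same logic would live in Prometheus alerting rules.

```python
from typing import Optional


def burn_rate(error_rate: float, slo_target: float) -> float:
    """How many times faster than sustainable the error budget is being spent."""
    budget = 1.0 - slo_target
    return error_rate / budget


def classify(error_rate_1h: float, error_rate_6h: float,
             slo_target: float = 0.999) -> Optional[str]:
    """Map burn rates over two windows to a severity, or None if no alert."""
    # Thresholds below are illustrative assumptions; tune them to your SLO period.
    if burn_rate(error_rate_1h, slo_target) >= 14.4:
        return "critical"  # fast burn: wake someone up
    if burn_rate(error_rate_6h, slo_target) >= 6:
        return "warning"   # slow burn: handle next business day
    return None


if __name__ == "__main__":
    print(classify(error_rate_1h=0.02, error_rate_6h=0.02))
    print(classify(error_rate_1h=0.001, error_rate_6h=0.008))
    print(classify(error_rate_1h=0.0005, error_rate_6h=0.0005))
```

Tying severity to how quickly the budget is burning keeps short blips from paging anyone while still escalating a sustained problem.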

How to Set Up Monitoring with Grafana and Prometheus - With Engineers Who Alert on What Matters

Our DevOps engineers deploy kube-prometheus-stack, instrument services with golden signal metrics, configure symptom-based alerting, and link every alert to a runbook.

Our engineers are trained in today's most powerful tools - Copilot, Claude, Cursor, and AI-assisted tooling - and use them daily to move faster without cutting corners.

Choose from a solo dev, mini team, or full squad. All powered by AI and ready to build from day one.

Let's keep it simple.

Need This Done? Don't Build It Alone.

Our engineers have done this before - on real products, under real deadlines.
