A monitoring solution for Docker hosts and containers with [Prometheus](https://prometheus.io/), [Grafana](http://grafana.org/), [cAdvisor](https://github.com/google/cadvisor), [NodeExporter](https://github.com/prometheus/node_exporter) and alerting with [AlertManager](https://github.com/prometheus/alertmanager).
_**If you're looking for the Docker Swarm version please go to [stefanprodan/swarmprom](https://github.com/stefanprodan/swarmprom)**_
Install
-------
Clone this repository on your Docker host, cd into the dockprom directory and run compose up:

    ADMIN_USER=admin ADMIN_PASSWORD=admin ADMIN_PASSWORD_HASH=JDJhJDE0JE91S1FrN0Z0VEsyWmhrQVpON1VzdHVLSDkyWHdsN0xNbEZYdnNIZm1pb2d1blg4Y09mL0ZP docker-compose up -d
**Caddy v2 does not accept plaintext passwords; the password MUST be provided as a hash. The hash above corresponds to the `ADMIN_PASSWORD` 'admin'. To generate a hash for a different password, see [Updating Caddy to v2](#updating-caddy-to-v2).**
Prerequisites:
* Docker Engine >= 1.13
* Docker Compose >= 1.11
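The prerequisites above can be checked before running compose. The following is a minimal sketch, not part of this repository: `version_ge` and `check_min` are hypothetical helper names, and the comparison assumes a `sort` that supports `-V` (version sort), as GNU coreutils does.

```shell
#!/bin/sh
# version_ge A B: succeeds when version A is greater than or equal to version B.
# Assumes `sort -V` (version sort) is available.
version_ge() {
    [ "$(printf '%s\n%s\n' "$2" "$1" | sort -V | head -n 1)" = "$2" ]
}

# check_min NAME FOUND REQUIRED: report whether FOUND satisfies REQUIRED.
check_min() {
    if [ -n "$2" ] && version_ge "$2" "$3"; then
        echo "$1 $2: ok"
    else
        echo "$1 >= $3 required (found: ${2:-not detected})"
    fi
}

# Ask the installed tools for their versions; errors are silenced so that a
# missing tool is simply reported as "not detected".
check_min "Docker Engine" "$(docker version --format '{{.Server.Version}}' 2>/dev/null)" 1.13
check_min "Docker Compose" "$(docker-compose version --short 2>/dev/null)" 1.11
```

The script only reports; it does not stop you from running compose on an older installation.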
Updating Caddy to v2
--------------------
Run `docker run --rm caddy caddy hash-password --plaintext 'ADMIN_PASSWORD'` to generate a hash for your new password. Make sure you replace `ADMIN_PASSWORD` with your new plaintext password and `ADMIN_PASSWORD_HASH` with the resulting hash wherever they are referenced in [docker-compose.yml](https://github.com/stefanprodan/dockprom/blob/master/docker-compose.yml) for the caddy container.
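As a sanity check on the value you pass as `ADMIN_PASSWORD_HASH`, it can help to look at what the sample hash from the Install section contains. Decoding it with base64 yields a string starting with `$2a$14$`, which suggests the sample is a base64-encoded bcrypt hash. The sketch below assumes a `base64` tool that accepts `-d`; whether your Caddy image prints hashes in the same encoding is something to verify rather than assume.

```shell
#!/bin/sh
# Sample value copied from the Install section of this README.
ADMIN_PASSWORD_HASH='JDJhJDE0JE91S1FrN0Z0VEsyWmhrQVpON1VzdHVLSDkyWHdsN0xNbEZYdnNIZm1pb2d1blg4Y09mL0ZP'

# Decode the value; `base64 -d` is the GNU coreutils spelling.
decoded="$(printf '%s' "$ADMIN_PASSWORD_HASH" | base64 -d)"

# bcrypt hashes start with a version marker such as $2a$, $2b$ or $2y$.
case "$decoded" in
    '$2a$'*|'$2b$'*|'$2y$'*) echo "looks like a bcrypt hash" ;;
    *) echo "unexpected format" ;;
esac
```

For the sample value this prints `looks like a bcrypt hash`.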
* Caddy (reverse proxy and basic auth provider for Prometheus and AlertManager)
Setup Grafana
-------------
Navigate to `http://<host-ip>:3000` and log in with user _**admin**_ and password _**admin**_. You can change the credentials in the compose file or by supplying the `ADMIN_USER` and `ADMIN_PASSWORD` environment variables on compose up. Alternatively, a config file can be referenced directly in the grafana service definition, like this:

    grafana:
      image: grafana/grafana:7.2.0
      env_file:
        - config
The config file should have this content:

    GF_SECURITY_ADMIN_USER=admin
    GF_SECURITY_ADMIN_PASSWORD=changeme
    GF_USERS_ALLOW_SIGN_UP=false
If you want to change the password, you have to remove the following entry, otherwise the change will not take effect:

    - grafana_data:/var/lib/grafana
Grafana is preconfigured with dashboards and Prometheus as the default data source:
The Docker Host Dashboard shows key metrics for monitoring the resource usage of your server:
* Server uptime, CPU idle percent, number of CPU cores, available memory, swap and storage
* System load average graph, running and blocked by IO processes graph, interrupts graph
* CPU usage graph by mode (guest, idle, iowait, irq, nice, softirq, steal, system, user)
* Memory usage graph by distribution (used, free, buffers, cached)
* IO usage graph (read Bps, write Bps and IO time)
* Network usage graph by device (inbound Bps, outbound Bps)
* Swap usage and activity graphs
For the storage graphs, and the Free Storage graph in particular, you have to specify the fstype in the Grafana graph request. You can find it in `grafana/provisioning/dashboards/docker_host.json`, at line 480:
Three alert groups have been set up within the [alert.rules](https://github.com/stefanprodan/dockprom/blob/master/prometheus/alert.rules) configuration file:
You can modify the alert rules and reload them by making an HTTP POST call to Prometheus:

    curl -X POST http://admin:admin@<host-ip>:9090/-/reload
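When the reload is part of a script, it is useful to know whether it succeeded. The sketch below wraps the same call and checks the HTTP status; `reload_url` and `reload_prometheus` are hypothetical helper names, and it assumes credentials without characters that would need URL encoding.

```shell
#!/bin/sh
# reload_url USER PASSWORD HOST: build the Prometheus reload endpoint URL.
reload_url() {
    printf 'http://%s:%s@%s:9090/-/reload\n' "$1" "$2" "$3"
}

# reload_prometheus USER PASSWORD HOST: POST to the reload endpoint and
# succeed only when the server answers with HTTP 200.
reload_prometheus() {
    status="$(curl -s -o /dev/null -w '%{http_code}' -X POST "$(reload_url "$1" "$2" "$3")")"
    if [ "$status" = "200" ]; then
        echo "rules reloaded"
    else
        echo "reload failed (HTTP $status)" >&2
        return 1
    fi
}

# Example call (192.0.2.10 is a placeholder address):
# reload_prometheus admin admin 192.0.2.10
```

A non-200 status typically means wrong credentials or a rules file that Prometheus refused to load, so the function returns a non-zero exit code in that case.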
_**Monitoring services alerts**_
Trigger an alert if any of the monitoring targets (node-exporter and cAdvisor) are down for more than 30 seconds:

    - alert: monitor_service_down
      expr: up == 0
      for: 30s
      labels:
        severity: critical
      annotations:
        summary: "Monitor service non-operational"
        description: "Service {{ $labels.instance }} is down."
_**Docker Host alerts**_
Trigger an alert if the Docker host CPU is under high load for more than 30 seconds:

    - alert: high_cpu_load
      expr: node_load1 > 1.5
      for: 30s
      labels:
        severity: warning
      annotations:
        summary: "Server under high load"
        description: "Docker host is under high load, the avg load 1m is at {{ $value}}. Reported by instance {{ $labels.instance }} of job {{ $labels.job }}."
Modify the load threshold based on your CPU cores.
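One way to pick a threshold is to scale it with the number of cores. The sketch below is only an illustration: `load_threshold` is a hypothetical helper, and the default factor of 0.75 per core is an assumption, not a value taken from this repository.

```shell
#!/bin/sh
# load_threshold CORES [FACTOR]: print CORES * FACTOR with two decimals.
# FACTOR defaults to 0.75, an illustrative choice and not a value from this repo.
load_threshold() {
    awk -v cores="$1" -v factor="${2:-0.75}" 'BEGIN { printf "%.2f\n", cores * factor }'
}

# Detect the number of online CPU cores; fall back to 1 if getconf is unavailable.
cores="$(getconf _NPROCESSORS_ONLN 2>/dev/null || echo 1)"

# Print an expression you could paste into the high_cpu_load rule.
echo "expr: node_load1 > $(load_threshold "$cores")"
```

With the default factor, a two-core host yields 1.50, which matches the threshold used in the rule above.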
Trigger an alert if the Docker host memory is almost full:
description: "Jenkins memory consumption is at {{ humanize $value}}."
Setup alerting
--------------
The AlertManager service is responsible for handling alerts sent by the Prometheus server. AlertManager can send notifications via email, Pushover, Slack, HipChat or any other system that exposes a webhook interface. A complete list of integrations can be found [here](https://prometheus.io/docs/alerting/configuration).
You can view and silence notifications by accessing `http://<host-ip>:9093`.
The notification receivers can be configured in the [alertmanager/config.yml](https://github.com/stefanprodan/dockprom/blob/master/alertmanager/config.yml) file.
To receive alerts via Slack you need to make a custom integration by choosing _**incoming web hooks**_ in your Slack team app page. You can find more details on setting up Slack integration [here](http://www.robustperception.io/using-slack-with-the-alertmanager/).
Copy the Slack Webhook URL into the _**api\_url**_ field and specify a Slack _**channel**_.
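Before wiring the webhook into AlertManager, you may want to confirm that it accepts messages at all. The following sketch posts a plain text message to the webhook with curl; `SLACK_WEBHOOK_URL` and `slack_payload` are hypothetical names introduced for this example, and the payload builder does no JSON escaping.

```shell
#!/bin/sh
# slack_payload TEXT: wrap TEXT in the minimal JSON body an incoming webhook accepts.
# Naive on purpose: TEXT must not contain double quotes or backslashes.
slack_payload() {
    printf '{"text":"%s"}\n' "$1"
}

# SLACK_WEBHOOK_URL is a hypothetical variable holding your webhook URL.
if [ -n "${SLACK_WEBHOOK_URL:-}" ]; then
    curl -s -X POST -H 'Content-type: application/json' \
        --data "$(slack_payload 'Test message from dockprom setup')" \
        "$SLACK_WEBHOOK_URL"
else
    echo "SLACK_WEBHOOK_URL is not set; skipping test message"
fi
```

If the message shows up in the channel, the same URL should work in the _**api\_url**_ field.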
Please replace the `user:password` part with your user and password set in the initial configuration (default: `admin:admin`).
Updating Grafana to v5.2.2
--------------------------
[In Grafana versions >= 5.1 the id of the grafana user has been changed](http://docs.grafana.org/installation/docker/#migration-from-a-previous-version-of-the-docker-container-to-5-1-or-later). Unfortunately this means that files created prior to 5.1 won’t have the correct permissions for later versions.
| Version | User | User ID |
| --- | --- | --- |
| < 5.1 | grafana | 104 |
| \>= 5.1 | grafana | 472 |
There are two possible solutions to this problem.
1. Change ownership from 104 to 472
2. Start the upgraded container as user 104
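Both options depend on knowing which user ID a given Grafana version runs as. The table above can be encoded as a small lookup; this is a sketch with a hypothetical helper name, `grafana_uid`, and it assumes a `sort` that supports `-V`.

```shell
#!/bin/sh
# grafana_uid VERSION: print the UID of the grafana user for that image version,
# following the table above (104 before 5.1, 472 from 5.1 onwards).
grafana_uid() {
    oldest="$(printf '%s\n%s\n' "$1" 5.1 | sort -V | head -n 1)"
    if [ "$oldest" = "5.1" ]; then
        echo 472
    else
        echo 104
    fi
}

echo "grafana 5.2.2 runs as UID $(grafana_uid 5.2.2)"
```

This prints `grafana 5.2.2 runs as UID 472`, the ID that existing data files need to be owned by under option 1.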
Specifying a user in docker-compose.yml
---------------------------------------
To change ownership of the files run your grafana container as root and modify the permissions.
First run `docker-compose down`, then modify your docker-compose.yml to include the `user: root` option: