---
Alias: ["Prometheus"]
Tag: ["Computer", "Server", "Monitoring"]
Date: 2022-03-19
DocType: "Personal"
Hierarchy: "NonRoot"
TimeStamp:
location: [47.3639129,8.55627491017841]
CollapseMetaTable: Yes
---
Parent:: [[Selfhosting]], [[Configuring Caddy|caddy]], [[Server Tools]]
---
 
^Top
```button
name Save
type command
action Save current file
id Save
```
^button-ConfiguringPrometheusNSave
 
# Configuring Prometheus
 
```ad-abstract
title: Summary
collapse: open
This note runs through the installation and use of Prometheus as a monitoring tool.
Prometheus works better with JSON logs than with the Common Log Format, which is what Caddy outputs.
```
 
```toc
style: number
```
 
---
 
### Introduction
[[#^Top|TOP]]
 
[Prometheus](https://prometheus.io/docs/introduction/overview/) is a free and open-source monitoring and alerting tool, written in Go, that was first used to monitor metrics at SoundCloud back in 2012. Since then it has grown considerably and has been adopted by many organizations to monitor their infrastructure metrics.
Prometheus records real-time events in a time-series database and provides flexible queries and real-time alerting, which help with quick diagnosis and troubleshooting of errors.
Prometheus comprises the following major components:
- The main Prometheus server for scraping and storing time-series data.
- Special-purpose exporters for services such as Graphite, HAProxy, StatsD, and many more
- An alert manager for handling alerts
- A push-gateway for supporting transient jobs
- Client libraries for instrumenting application code
 
---
 
### Installing Prometheus
[[#^Top|TOP]]
 
#### Installing the main modules
First, create the configuration and data directories for Prometheus.
To create the configuration directory, run the command:
```ad-command
~~~bash
sudo mkdir -p /etc/prometheus
~~~
```
 
For the data directory, execute:
```ad-command
~~~bash
sudo mkdir -p /var/lib/prometheus
~~~
```
 
Once the directories are created, grab the compressed installation file:
```ad-command
~~~bash
wget https://github.com/prometheus/prometheus/releases/download/v2.31.0/prometheus-2.31.0.linux-amd64.tar.gz
~~~
```
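Before extracting, it can be worth checking the archive against the checksum list published with the release. The sketch below assumes a `sha256sums.txt` file was downloaded from the same release page into the current directory; the helper name `verify_archive` is made up for this note.

```bash
# verify_archive ARCHIVE SUMS
# Succeeds only if ARCHIVE's SHA-256 matches its entry in the SUMS list.
verify_archive() {
  archive="$1"
  sums="$2"
  name="$(basename "$archive")"
  # Pick the hash recorded for this file (list format: "<hash>  <file>").
  expected="$(awk -v f="$name" '$2 == f { print $1 }' "$sums")"
  # Compute the hash of the file that was actually downloaded.
  actual="$(sha256sum "$archive" | awk '{ print $1 }')"
  [ -n "$expected" ] && [ "$expected" = "$actual" ]
}

# Usage (assumes sha256sums.txt sits next to the archive):
#   verify_archive prometheus-2.31.0.linux-amd64.tar.gz sha256sums.txt && echo "checksum OK"
```

If the check fails, download the archive again rather than installing it.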
 
Once downloaded, extract the tarball.
```ad-command
~~~bash
tar -xvf prometheus-2.31.0.linux-amd64.tar.gz
~~~
```
 
Then navigate to the Prometheus folder.
```ad-command
~~~bash
cd prometheus-2.31.0.linux-amd64
~~~
```
 
Once in the directory, [move](https://linoxide.com/mv-command-in-linux/) the `prometheus` and `promtool` binaries to the `/usr/local/bin/` folder.
```ad-command
~~~bash
sudo mv prometheus promtool /usr/local/bin/
~~~
```
 
Additionally, move the console files in the `consoles` directory and the library files in the `console_libraries` directory to the `/etc/prometheus/` directory.
```ad-command
~~~bash
sudo mv consoles/ console_libraries/ /etc/prometheus/
~~~
```
 
Also, move the `prometheus.yml` template configuration file to the **`/etc/prometheus/`** directory.
```ad-command
~~~bash
sudo mv prometheus.yml /etc/prometheus/prometheus.yml
~~~
```
 
At this point, Prometheus has been successfully installed. To check the version of Prometheus installed, run the command:
```ad-command
~~~bash
prometheus --version
~~~
```
 
Example output (your version, revision, and build details will reflect the release you installed):
```ad-code
~~~bash
prometheus, version 2.31.3 (branch: HEAD, revision: f29caccc42557f6a8ec30ea9b3c8c089391bd5df)
build user: root@5cff4265f0e3
build date: 20211005-16:10:52
go version: go1.17.1
platform: linux/amd64
~~~
```
 
```ad-command
~~~bash
promtool --version
~~~
```
 
Example output (again, the details depend on the release you installed):
```ad-code
~~~bash
promtool, version 2.31.3 (branch: HEAD, revision: f29caccc42557f6a8ec30ea9b3c8c089391bd5df)
build user: root@5cff4265f0e3
build date: 20211005-16:10:52
go version: go1.17.1
platform: linux/amd64
~~~
```
If your output resembles what I have, then you are on the right track. In the next step, we will create a system group and user.
 
#### Permissions & User Management
[[#^Top|TOP]]
It's essential that we create a Prometheus group and user before proceeding to the next step which involves creating a system file for Prometheus.
To create a `prometheus` [group](https://linoxide.com/groupadd-command/), execute the command:
```ad-command
~~~bash
sudo groupadd --system prometheus
~~~
```
 
Then create the `prometheus` user and assign it to the newly created `prometheus` group.
```ad-command
~~~bash
sudo useradd -s /sbin/nologin --system -g prometheus prometheus
~~~
```
 
Next, configure the directory ownership and permissions as follows.
```ad-command
~~~bash
sudo chown -R prometheus:prometheus /etc/prometheus/ /var/lib/prometheus/
sudo chmod -R 775 /etc/prometheus/ /var/lib/prometheus/
~~~
```
The only part remaining is to make Prometheus a systemd service so that we can easily manage its running status.
 
#### Configuring the service
[[#^Top|TOP]]
Using your favorite text editor, create a systemd service file:
```ad-command
~~~bash
sudo nano /etc/systemd/system/prometheus.service
~~~
```
 
Paste the following lines of code.
```ad-code
~~~bash
[Unit]
Description=Prometheus
Wants=network-online.target
After=network-online.target
[Service]
User=prometheus
Group=prometheus
Restart=always
Type=simple
ExecStart=/usr/local/bin/prometheus \
--config.file=/etc/prometheus/prometheus.yml \
--storage.tsdb.path=/var/lib/prometheus/ \
--web.console.templates=/etc/prometheus/consoles \
--web.console.libraries=/etc/prometheus/console_libraries \
--web.listen-address=0.0.0.0:9090
[Install]
WantedBy=multi-user.target
~~~
```
Save the changes and exit the editor.
Then start the Prometheus service.
```ad-command
~~~bash
sudo systemctl start prometheus
~~~
```
 
Enable the Prometheus service so that it starts at boot:
```ad-command
~~~bash
sudo systemctl enable prometheus
~~~
```
 
Then confirm the status of the Prometheus service.
```ad-command
~~~bash
sudo systemctl status prometheus
~~~
```
![Check status of Prometheus services](https://linoxide.com/wp-content/uploads/2021/11/2021-10-1003-Check-status-of-Prometheus-services.png)
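
Besides `systemctl status`, Prometheus exposes a readiness endpoint at `/-/ready` that can be polled from a script. A minimal sketch, assuming the listen address from the service file above (`localhost:9090`); the function name `wait_until_ready` and the retry count are arbitrary choices:

```bash
# wait_until_ready [URL] [TRIES]
# Polls the readiness endpoint once per second until it answers with success.
wait_until_ready() {
  url="${1:-http://localhost:9090/-/ready}"   # assumption: default listen address
  tries="${2:-10}"
  i=0
  while [ "$i" -lt "$tries" ]; do
    # -f makes curl fail on HTTP errors, so a 503 counts as "not ready yet".
    if curl -fsS -o /dev/null "$url"; then
      echo "ready: $url"
      return 0
    fi
    i=$((i + 1))
    sleep 1
  done
  echo "not ready after $tries tries: $url" >&2
  return 1
}

# Usage:
#   wait_until_ready
```

This is handy right after a restart, when the service is active but may still be loading its data.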
 
#### Configuration of user access
[[#^Top|TOP]]
Finally, to access Prometheus, configure your reverse proxy ([[Configuring Caddy|caddy]]) to point to the service.
The service listens internally on port 9090 and is reachable at the address below:
```ad-address
https://prometheus.mfxm.fr
```
 
![prometheus dashboard](https://linoxide.com/wp-content/uploads/2021/11/2021-10-1003-Prometheus-dashboard-1024x440.png)
 
---
 
### Configuring alerts
[[#^Top|TOP]]
 
#### Install Alertmanager
Download the latest version of Alert Manager (v0.23.0 at the time of this writing) with the following command:
```ad-command
~~~bash
wget https://github.com/prometheus/alertmanager/releases/download/v0.23.0/alertmanager-0.23.0.linux-amd64.tar.gz
~~~
```
 
The download may take a while to complete.
Once it finishes, you should find a new archive file **alertmanager-0.23.0.linux-amd64.tar.gz** in your current working directory.
Extract the **alertmanager-0.23.0.linux-amd64.tar.gz** archive with the following command:
```ad-command
~~~bash
tar xzf alertmanager-0.23.0.linux-amd64.tar.gz
~~~
```
 
You should find a new directory **alertmanager-0.23.0.linux-amd64/**.
Now, move the **alertmanager-0.23.0.linux-amd64** directory to **/opt/** directory and rename it to **alertmanager** as follows:
```ad-command
~~~bash
sudo mv -v alertmanager-0.23.0.linux-amd64 /opt/alertmanager
~~~
```
 
Change the user and group of all the files and directories of the `/opt/alertmanager/` directory to root as follows:
```ad-command
~~~bash
sudo chown -Rfv root:root /opt/alertmanager
~~~
```
 
In the **/opt/alertmanager** directory, you should find the **alertmanager** binary and the Alert Manager configuration file **alertmanager.yml**. You will use both later.
 
#### Creating a Data Directory
[[#^Top|TOP]]
Alert Manager needs a directory where it can store its data. As you will be running Alert Manager as the **prometheus** system user, the **prometheus** system user must have access (read, write, and execute permissions) to that data directory.
You can create the **data/** directory in the **/opt/alertmanager/** directory as follows:
```ad-command
~~~bash
sudo mkdir -v /opt/alertmanager/data
~~~
```
 
Change the owner and group of the **/opt/alertmanager/data/** directory to **prometheus** with the following command:
```ad-command
~~~bash
sudo chown -Rfv prometheus:prometheus /opt/alertmanager/data
~~~
```
 
The owner and group of the **/opt/alertmanager/data/** directory should be changed to **prometheus**.
 
#### Starting Alert Manager on Boot
[[#^Top|TOP]]
Now, you have to create a systemd service file for Alert Manager so that you can easily manage (start, stop, restart, and add to startup) the alertmanager service with systemd.
To create a systemd service file **alertmanager.service**, run the following command:
```ad-command
~~~bash
sudo nano /etc/systemd/system/alertmanager.service
~~~
```
 
Type in the following lines in the **alertmanager.service** file.
```ad-code
~~~bash
[Unit]
Description=Alertmanager for prometheus
[Service]
Restart=always
User=prometheus
ExecStart=/opt/alertmanager/alertmanager --config.file=/opt/alertmanager/alertmanager.yml --storage.path=/opt/alertmanager/data
ExecReload=/bin/kill -HUP $MAINPID
TimeoutStopSec=20s
SendSIGKILL=no
[Install]
WantedBy=multi-user.target
~~~
```
 
For the systemd changes to take effect, run the following command:
```ad-command
~~~bash
sudo systemctl daemon-reload
~~~
```
 
Now, start the **alertmanager** service with the following command:
```ad-command
~~~bash
sudo systemctl start alertmanager.service
~~~
```
 
Add the **alertmanager** service to the system startup so that it automatically starts on boot with the following command:
```ad-command
~~~bash
sudo systemctl enable alertmanager.service
~~~
```
 
Check that the **alertmanager** service is **active/running** and **enabled** (it will start automatically on boot):
```ad-command
~~~bash
sudo systemctl status alertmanager.service
~~~
```
 
#### Configuring Prometheus
[[#^Top|TOP]]
Now, you have to configure Prometheus to use Alert Manager. You can also monitor Alert Manager with Prometheus. I will show you how to do both in this section.
First, find the IP address of the computer where you have installed Alert Manager with the following command:
```ad-command
~~~bash
hostname -I
~~~
```
 
Now, open the Prometheus configuration file **/etc/prometheus/prometheus.yml** with the **nano** text editor as follows:
```ad-command
~~~bash
sudo nano /etc/prometheus/prometheus.yml
~~~
```
 
Type in the following lines in the **scrape_configs** section to add Alert Manager for monitoring with Prometheus.
```ad-code
~~~yaml
- job_name: 'alertmanager'
  static_configs:
  - targets: ['localhost:9093']
~~~
```
 
Also, type in the IP address and port number of Alert Manager in the **alerting > alertmanagers** section.
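The **alerting > alertmanagers** entry is not shown above, so here is a minimal sketch of what it could look like. It assumes Alert Manager runs on the same host on port 9093 (the same target used in the scrape job); replace `localhost` with the IP address you found earlier if it runs elsewhere. The snippet only writes the fragment to a scratch file (the path is an arbitrary choice) so it can be reviewed before being merged into `/etc/prometheus/prometheus.yml` by hand.

```bash
# Scratch copy of the fragment; this does not touch the live configuration.
fragment="${TMPDIR:-/tmp}/prometheus-alerting-fragment.yml"

cat > "$fragment" <<'EOF'
alerting:
  alertmanagers:
  - static_configs:
    - targets:
      - localhost:9093   # assumption: Alert Manager on the same host
EOF

# Print it for review before copying it into prometheus.yml.
cat "$fragment"
```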
For the changes to take effect, restart the **prometheus** service as follows:
```ad-command
~~~bash
sudo systemctl restart prometheus
~~~
```
 
Visit the URL [http://192.168.20.161:9090/targets](http://192.168.20.161:9090/targets) from your favorite web browser, and you should see that **alertmanager** is in the **UP** state. So, Prometheus can access Alert Manager just fine.
 
#### Creating a Prometheus Alert Rule
[[#^Top|TOP]]
In Prometheus, you can use the **up** expression to find the state of the targets added to Prometheus.
Targets that are in the **UP** state (running and accessible to Prometheus) have the value **1**, and targets that are in the **DOWN** state (not running or inaccessible to Prometheus) have the value **0**.
Suppose you stop one of the targets, say **node_exporter**:
```ad-command
~~~bash
sudo systemctl stop node_exporter.service
~~~
```
 
The **up** value of that target should now be **0**.
So, you can use the expression **up == 0** to list only the targets that are not running or are inaccessible to Prometheus.
This expression can be used to create a Prometheus alert that notifies Alert Manager when one or more targets are not running or are inaccessible to Prometheus.
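The same expression can also be run from the command line through Prometheus' HTTP API (`/api/v1/query`). A minimal sketch, assuming Prometheus is reachable on `localhost:9090`; the helper name `query_prometheus` and the `PROMETHEUS_URL` variable are made up for this note:

```bash
# query_prometheus EXPR
# Runs an instant query through the HTTP API and prints the JSON reply.
query_prometheus() {
  # Assumption: default listen address; override with PROMETHEUS_URL if needed.
  base="${PROMETHEUS_URL:-http://localhost:9090}"
  # -G sends the URL-encoded expression as a query-string parameter.
  curl -fsS -G "$base/api/v1/query" --data-urlencode "query=$1"
}

# Usage: list the targets that are currently down.
#   query_prometheus 'up == 0'
```

An empty `result` list in the JSON reply means that no target is currently down.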
To create a Prometheus alert, create a new file **rules.yml** in the **/etc/prometheus/** directory as follows:
```ad-command
~~~bash
sudo nano /etc/prometheus/rules.yml
~~~
```
 
Now, type in the following lines in the **rules.yml** file.
```ad-code
~~~yaml
groups:
- name: test
  rules:
  - alert: InstanceDown
    expr: up == 0
    for: 1m
~~~
```
 
Here, the alert **InstanceDown** will be fired when targets are not running or inaccessible to Prometheus (that is **up == 0**) for a minute (**1m**).
Now, open the Prometheus configuration file **/etc/prometheus/prometheus.yml** with the **nano** text editor as follows:
```ad-command
~~~bash
sudo nano /etc/prometheus/prometheus.yml
~~~
```
 
Add the **rules.yml** file in the **rule_files** section of the prometheus.yml configuration file.
Another important option of the **prometheus.yml** file is **evaluation_interval**, which controls how often Prometheus evaluates its rules. The template configuration sets it to 15s (**15** seconds; the built-in default is 1 minute), so the alert rules in the **rules.yml** file are checked every 15 seconds.
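As a rough illustration of the two settings discussed here, the relevant part of `prometheus.yml` could look like the fragment below. It is a sketch rather than a copy of a real configuration: the absolute rule-file path matches the file created above, and the scratch-file location is an arbitrary choice so the live configuration is not touched.

```bash
# Scratch copy of the fragment; merge it into /etc/prometheus/prometheus.yml by hand.
fragment="${TMPDIR:-/tmp}/prometheus-rules-fragment.yml"

cat > "$fragment" <<'EOF'
global:
  evaluation_interval: 15s   # how often the alert rules are evaluated
rule_files:
  - /etc/prometheus/rules.yml
EOF

# Print it for review.
cat "$fragment"
```

The rule file itself can be checked with `promtool check rules /etc/prometheus/rules.yml` before restarting.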
For the changes to take effect, restart the **prometheus** service as follows:
```ad-command
~~~bash
sudo systemctl restart prometheus
~~~
```
 
Now, navigate to the URL [http://localhost:9010/rules](http://localhost:9010/rules) from your favorite web browser, and you should see the rule **InstanceDown** that you've just added.
As you stopped **node_exporter** earlier, the alert is active, and it is waiting to be sent to the Alert Manager.
After a minute has passed, the alert **InstanceDown** should be in the **FIRING** state, which means that the alert has been sent to the Alert Manager.
 
---
 
### Configuring monitoring modules
[[#^Top|TOP]]
 
#### Node-Exporter
To begin, download the latest version of Node Exporter here: [Node-Exporter](https://prometheus.io/download/#node_exporter)
```ad-command
~~~bash
wget https://github.com/prometheus/node_exporter/releases/download/v1.3.1/node_exporter-1.3.1.linux-amd64.tar.gz
~~~
```
 
##### Unpacking
```ad-command
~~~bash
tar -xf node_exporter-1.3.1.linux-amd64.tar.gz
~~~
```
Then move the binary to a directory where the system can manage it:
```ad-command
~~~bash
sudo mv node_exporter-1.3.1.linux-amd64/node_exporter /usr/local/bin/
~~~
```
 
##### Installation & Service Setup
In reality, Node Exporter is not really installed; we just create a system service that runs the command.
For that, create a `node_exporter` user that will run the service.
```ad-command
~~~bash
sudo useradd -rs /bin/false node_exporter
~~~
```
 
Next, create the service itself.
```ad-command
~~~bash
sudo nano /etc/systemd/system/node_exporter.service
~~~
```
 
The file must contain the following:
```ad-code
~~~bash
[Unit]
Description=Node Exporter
After=network.target
[Service]
User=node_exporter
Group=node_exporter
Type=simple
ExecStart=/usr/local/bin/node_exporter
[Install]
WantedBy=multi-user.target
~~~
```
 
Now reload the systemd daemon:
```ad-command
~~~bash
sudo systemctl daemon-reload
~~~
```
 
Then start node_exporter:
```ad-command
~~~bash
sudo systemctl start node_exporter
~~~
```
 
Check that node_exporter is running:
```ad-command
~~~bash
sudo systemctl status node_exporter
~~~
```
 
If all is well, enable the service so that it starts at boot:
```ad-command
~~~bash
sudo systemctl enable node_exporter
~~~
```
 
To check that metrics are being served:
```ad-command
~~~bash
sudo curl http://localhost:9100/metrics
~~~
```
 
##### Adding the host to Prometheus
To add the host, edit the Prometheus configuration file:
```ad-command
~~~bash
sudo nano /etc/prometheus/prometheus.yml
~~~
```
 
Add a target with the desired IP address below the existing target.
```ad-code
~~~yaml
- job_name: 'node_exporter'
  scrape_interval: 5s
  static_configs:
  - targets: ['localhost:9100']
~~~
```
 
##### Restarting Prometheus
For the changes to take effect, restart the prometheus service:
```ad-command
~~~bash
sudo systemctl restart prometheus
~~~
```
 
##### Verification
To check that everything works, visit your Prometheus interface ([http://prometheus-ip:9090/targets](http://prometheus-ip:9090/targets)) or Grafana and see whether your host shows up.
 
---
 
### Configuring rules and alerts
[[#^Top|TOP]]
 
#### Introduction
Alert rules live in rule files that are referenced from `/etc/prometheus/prometheus.yml` and kept in the same folder. The generic process is:
1. Reference the rule file in the `rule_files` section of Prometheus' config file:
`rules.yml`
2. Create the rule file
```ad-command
~~~bash
sudo nano /etc/prometheus/rules.yml
~~~
```
 
3. Add the defined rule
See external resource for examples.
4. Relaunch Prometheus
```ad-command
~~~bash
sudo systemctl restart prometheus
~~~
```
 
Once this is done, Prometheus may fail to restart, which points to a problem in the configuration file. Check whitespace (YAML indentation) and other formatting issues before trying to restart the daemon again.
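
Since a malformed configuration is the usual reason a restart fails, one option is to validate it with `promtool` (installed earlier alongside `prometheus`) before restarting. A minimal sketch; the function name `restart_if_valid` is made up for this note:

```bash
# restart_if_valid [CONFIG]
# Restarts Prometheus only when promtool accepts the configuration file.
restart_if_valid() {
  config="${1:-/etc/prometheus/prometheus.yml}"
  # "promtool check config" also checks the rule files the config references.
  if promtool check config "$config"; then
    sudo systemctl restart prometheus
  else
    echo "config check failed; not restarting" >&2
    return 1
  fi
}

# Usage:
#   restart_if_valid
```

When the check fails, `promtool` prints the offending file and line, which is usually quicker than reading the service logs.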
 
#### External resource
[Awesome Prometheus alerts | Collection of alerting rules](https://awesome-prometheus-alerts.grep.to/rules.html)
 
---
 
### Using Prometheus to monitor Caddy
[[#^Top|TOP]]
 
#### Global parameters
| | |
| --------------------- | -------------------------- |
| **Caddy metrics API** | https://tools.mfxm.fr:7784 |
| **Prometheus web listening port** | 9010 |
 
#### Adding a monitoring job
[[#^Top|TOP]]
Monitoring jobs are called scrape jobs and are defined in the `/etc/prometheus/prometheus.yml` file under the `scrape_configs:` YAML key. Below is an example job definition.
```ad-code
~~~yaml
scrape_configs:
- job_name: caddy
  scheme: https
  static_configs:
  - targets:
    - tools.mfxm.fr:7784
~~~
```
 
---
 
### Using Telegram for notifications
[[#^Top|TOP]]
 
#### Installing the Telegram Bridge
In order to set up the [[Configuring Telegram bots|Telegram bot]], first clone its GitHub repository:
```ad-command
~~~bash
git clone https://github.com/inCaller/prometheus_bot
~~~
```
 
Move to the created folder:
```ad-command
~~~bash
cd ~/prometheus_bot
~~~
```
 
Compile the program with Go:
```ad-command
~~~bash
export GOPATH="your go path"
make clean
make
~~~
```
 
Update the config file:
```ad-path
/home/melchiorbv/prometheus_bot/config.yaml
```
 
```ad-code
~~~yaml
telegram_token: "token goes here"
# ONLY IF YOU ARE USING THE DATA FORMATTING FUNCTION. Note for developers: important, or tests fail
time_outdata: "02/01/2006 15:04:05"
template_path: "/home/melchiorbv/prometheus_bot/template.tmpl" # ONLY IF YOU ARE USING A TEMPLATE
time_zone: "Europe/Amsterdam" # ONLY IF YOU ARE USING A TEMPLATE
split_msg_byte: 4000
send_only: true # use bot only to send messages.
~~~
```
 
Then, update the template file:
```ad-path
/home/melchiorbv/prometheus_bot/template.tmpl
```
 
```ad-code
~~~yaml
Type: {{.CommonAnnotations.description}}
Summary: {{.CommonAnnotations.summary}}
Alertname: {{ .CommonLabels.alertname }}
Instance: {{ .CommonLabels.instance }}
Severity: {{ .CommonLabels.severity }}
Status: {{ .Status }}
~~~
```
 
Run the daemon with:
```ad-command
~~~bash
./prometheus_bot
~~~
```
First part done.
 
#### Linking the bot to Alertmanager
[[#^Top|TOP]]
Edit the Alertmanager config file at `/opt/alertmanager/alertmanager.yml` and add the following under `receivers:`:
```ad-code
~~~yaml
- name: 'admins'
  webhook_configs:
  - send_resolved: True
    url: http://127.0.0.1:9087/alert/chat_id
~~~
```
Replace `chat_id` with the value you got from your bot, ***with everything inside the quotes***. Some chat IDs start with a `-`; in that case, you must also include the `-` in the URL. To use multiple chats, just add more receivers.
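To make the multi-chat case concrete, here is a minimal sketch of an `alertmanager.yml` with two receivers. Both chat IDs and the second receiver name (`family`) are hypothetical placeholders, and the `route` block is an assumption about how the receivers are wired up. The snippet writes to a scratch file (arbitrary path) rather than to the live configuration.

```bash
# Scratch copy; review it, then merge into /opt/alertmanager/alertmanager.yml by hand.
fragment="${TMPDIR:-/tmp}/alertmanager-receivers.yml"

cat > "$fragment" <<'EOF'
route:
  receiver: 'admins'   # default receiver for every alert
receivers:
- name: 'admins'
  webhook_configs:
  - send_resolved: true
    url: http://127.0.0.1:9087/alert/-1001111111111   # hypothetical chat_id
- name: 'family'       # hypothetical second receiver
  webhook_configs:
  - send_resolved: true
    url: http://127.0.0.1:9087/alert/222222222        # hypothetical chat_id
EOF

# Print it for review.
cat "$fragment"
```

A receiver only gets alerts if a route points to it, so the second one needs its own entry under `route` before it receives anything. The result can be checked with `amtool check-config`, which ships in the Alert Manager archive.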
Restart Alertmanager:
```ad-command
~~~bash
sudo systemctl restart alertmanager.service
~~~
```
 
 
[[#^Top|TOP]]