
Alias: ["Prometheus"]
Tag: ["Computer", "Server", "Monitoring"]
Date: 2022-03-19
DocType: "Personal"
Hierarchy: "NonRoot"
location: [47.3639129,8.55627491017841]
CollapseMetaTable: Yes
Parent:: [[Selfhosting]], [[Configuring Caddy|caddy]], [[Server Tools]]
# Configuring Prometheus
This note runs through the installation and use of Prometheus as a monitoring tool.
Prometheus interacts better with JSON logs than with the Common Log Format, which is Caddy's output.
### Introduction
[Prometheus](https://prometheus.io/docs/introduction/overview/) is a free and open-source monitoring and alerting tool that was initially used for monitoring metrics at SoundCloud back in 2012. It is written in the Go programming language. Since then, it has grown rapidly and has been adopted by many organizations to monitor their infrastructure metrics.
Prometheus records real-time metrics in a time-series database. It provides flexible queries and real-time alerting, which help with quick diagnosis and troubleshooting of errors.
Prometheus comprises the following major components:
- The main Prometheus server for scraping and storing time-series data.
- Special-purpose exporters for services such as Graphite, HAProxy, StatsD, and many more
- An Alertmanager for handling alerts
- A Pushgateway for supporting short-lived (transient) jobs
- Client libraries for instrumenting application code
### Installing Prometheus
#### Installing the main modules
First, create the configuration and data directories for Prometheus.
To create the configuration directory, run the command:
sudo mkdir -p /etc/prometheus
For the data directory, execute:
sudo mkdir -p /var/lib/prometheus
Once the directories are created, download the compressed release archive:
wget https://github.com/prometheus/prometheus/releases/download/v2.31.0/prometheus-2.31.0.linux-amd64.tar.gz
Once downloaded, extract the tarball.
tar -xvf prometheus-2.31.0.linux-amd64.tar.gz
Then navigate to the extracted Prometheus folder.
cd prometheus-2.31.0.linux-amd64
Once in the directory, [move](https://linoxide.com/mv-command-in-linux/) the `prometheus` and `promtool` binaries to the `/usr/local/bin/` folder.
sudo mv prometheus promtool /usr/local/bin/
Additionally, move the console files in the `consoles` directory and the library files in the `console_libraries` directory to the `/etc/prometheus/` directory.
sudo mv consoles/ console_libraries/ /etc/prometheus/
Also, move the `prometheus.yml` template configuration file to the **`/etc/prometheus/`** directory.
sudo mv prometheus.yml /etc/prometheus/prometheus.yml
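The version number appears separately in the download, extract and `cd` steps above, so it is easy for them to drift apart. The sketch below derives every name from a single version string. It assumes a Linux amd64 host and the GitHub release naming used in the `wget` line. `PROM_VERSION` and the two helper names are illustrative; they are not part of Prometheus.

```bash
# Derive the Prometheus archive and folder names from one version string.
# PROM_VERSION, prometheus_dirname and prometheus_tarball_url are hypothetical
# names chosen for this sketch.
PROM_VERSION="${PROM_VERSION:-2.31.0}"

# Name of the extracted folder, e.g. prometheus-2.31.0.linux-amd64
prometheus_dirname() {
  printf 'prometheus-%s.linux-amd64\n' "$1"
}

# Download URL following the GitHub release layout of the wget step
prometheus_tarball_url() {
  local dir
  dir="$(prometheus_dirname "$1")"
  printf 'https://github.com/prometheus/prometheus/releases/download/v%s/%s.tar.gz\n' "$1" "$dir"
}
```

You would then run three commands in order. First, download with `wget "$(prometheus_tarball_url "$PROM_VERSION")"`. Next, extract with `tar -xvf "$(prometheus_dirname "$PROM_VERSION").tar.gz"`. Finally, change into the folder with `cd "$(prometheus_dirname "$PROM_VERSION")"`.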
At this point, Prometheus has been successfully installed. To check the version of Prometheus installed, run the command:
prometheus --version
prometheus, version 2.31.3 (branch: HEAD, revision: f29caccc42557f6a8ec30ea9b3c8c089391bd5df)
build user: root@5cff4265f0e3
build date: 20211005-16:10:52
go version: go1.17.1
platform: linux/amd64
promtool --version
promtool, version 2.31.3 (branch: HEAD, revision: f29caccc42557f6a8ec30ea9b3c8c089391bd5df)
build user: root@5cff4265f0e3
build date: 20211005-16:10:52
go version: go1.17.1
platform: linux/amd64
If your output resembles what I have, then you are on the right track. In the next step, we will create a system group and user.
#### Permissions & User Management
It's essential to create a Prometheus group and user before the next step, which involves creating a systemd service file for Prometheus.
To create a `prometheus` [group](https://linoxide.com/groupadd-command/), execute the command:
sudo groupadd --system prometheus
Thereafter, create the `prometheus` user and assign it to the just-created `prometheus` group.
sudo useradd -s /sbin/nologin --system -g prometheus prometheus
Next, configure the directory ownership and permissions as follows.
sudo chown -R prometheus:prometheus /etc/prometheus/ /var/lib/prometheus/
sudo chmod -R 775 /etc/prometheus/ /var/lib/prometheus/
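To confirm that the recursive `chown` covered everything, you can list any entry that is still owned by someone else. The helper name below is hypothetical. It relies on the standard `find ! -user` test available on Linux.

```bash
# Print every file or directory under $1 that is NOT owned by user $2.
# Empty output means the ownership change above reached every entry.
# not_owned_by is a hypothetical helper name for this sketch.
not_owned_by() {
  local dir="$1" user="$2"
  find "$dir" ! -user "$user" -print
}
```

For example, `not_owned_by /etc/prometheus prometheus` and `not_owned_by /var/lib/prometheus prometheus` should both print nothing.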
The only part remaining is to make Prometheus a systemd service so that we can easily manage its running status.
#### Configuring the service
Using your favorite text editor, create a systemd service file:
sudo nano /etc/systemd/system/prometheus.service
Paste the following lines of code.
[Service]
User=prometheus
Group=prometheus
ExecStart=/usr/local/bin/prometheus \
--config.file=/etc/prometheus/prometheus.yml \
--storage.tsdb.path=/var/lib/prometheus/ \
--web.console.templates=/etc/prometheus/consoles \
--web.console.libraries=/etc/prometheus/console_libraries
[Install]
WantedBy=multi-user.target
Save the changes and exit the editor.
Then reload systemd so that it picks up the new unit file, and start the Prometheus service.
sudo systemctl daemon-reload
sudo systemctl start prometheus
Enable the Prometheus service to run at startup by invoking the command:
sudo systemctl enable prometheus
Then confirm the status of the Prometheus service.
sudo systemctl status prometheus
![Check status of Prometheus services](https://linoxide.com/wp-content/uploads/2021/11/2021-10-1003-Check-status-of-Prometheus-services.png)
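Right after `systemctl start`, Prometheus may need a few seconds before it answers HTTP requests. Prometheus exposes `/-/ready` and `/-/healthy` endpoints, and you can poll them before moving on. The sketch below assumes `curl` is installed. The function name and the default of 30 attempts are arbitrary choices.

```bash
# Poll an HTTP endpoint until it returns a 2xx status, or give up.
# $1: URL to poll, $2: number of attempts (default 30, one second apart).
# wait_ready is a hypothetical helper name for this sketch.
wait_ready() {
  local url="$1" tries="${2:-30}" i
  for ((i = 1; i <= tries; i++)); do
    # -f: treat HTTP errors as failures, -sS: silent but still show errors
    if curl -fsS --max-time 2 -o /dev/null "$url"; then
      return 0
    fi
    sleep 1
  done
  echo "no successful response from $url after $tries attempts" >&2
  return 1
}
```

For example, run `wait_ready http://localhost:9090/-/ready && echo "Prometheus is ready"`. Alertmanager exposes the same `/-/ready` endpoint on its own port.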
#### Configuration of user access
Finally, to access Prometheus, configure your reverse proxy ([[Configuring Caddy|caddy]]) to forward requests to the service.
Prometheus listens on internal port 9090 by default. The dashboard looks like this:
![prometheus dashboard](https://linoxide.com/wp-content/uploads/2021/11/2021-10-1003-Prometheus-dashboard-1024x440.png)
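With Caddy, forwarding a hostname to the local Prometheus port takes a single `reverse_proxy` directive. The sketch below prints a minimal Caddyfile site block. The hostname `prometheus.example.com` is a placeholder and the function name is hypothetical. The upstream defaults to `localhost:9090`; pass your own port if you changed it.

```bash
# Print a minimal Caddyfile site block proxying a hostname to Prometheus.
# $1: public hostname (placeholder), $2: upstream address (default localhost:9090).
caddy_prometheus_site() {
  local host="$1" upstream="${2:-localhost:9090}"
  cat <<EOF
${host} {
    reverse_proxy ${upstream}
}
EOF
}
```

You could append the output of `caddy_prometheus_site prometheus.example.com` to your Caddyfile, then reload Caddy.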
### Configuring alerts
#### Install Alertmanager
Download the latest version of Alert Manager (v0.23.0 at the time of this writing) with the following command:
wget https://github.com/prometheus/alertmanager/releases/download/v0.23.0/alertmanager-0.23.0.linux-amd64.tar.gz
Once Alert Manager is downloaded, you should find a new archive file **alertmanager-0.23.0.linux-amd64.tar.gz** in your current working directory.
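Before extracting, you can check the archive against the checksum list that Prometheus projects publish next to their release tarballs. This is usually a `sha256sums.txt` file on the same release page; check the exact name there. The sketch below uses GNU coreutils' `sha256sum`, where `--ignore-missing` skips entries for archives you did not download. The helper name is hypothetical.

```bash
# Verify downloaded release archives in directory $1 against $1/sha256sums.txt.
# Only files that are actually present are checked (GNU --ignore-missing).
# verify_release is a hypothetical helper name for this sketch.
verify_release() {
  local dir="$1"
  ( cd "$dir" && sha256sum --check --ignore-missing sha256sums.txt )
}
```

After downloading `https://github.com/prometheus/alertmanager/releases/download/v0.23.0/sha256sums.txt` into the same directory, run `verify_release .`. It should report `alertmanager-0.23.0.linux-amd64.tar.gz: OK`.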
Extract the **alertmanager-0.23.0.linux-amd64.tar.gz** archive with the following command:
tar xzf alertmanager-0.23.0.linux-amd64.tar.gz
You should now find a new directory **alertmanager-0.23.0.linux-amd64/**.
Now, move the **alertmanager-0.23.0.linux-amd64** directory to **/opt/** directory and rename it to **alertmanager** as follows:
sudo mv -v alertmanager-0.23.0.linux-amd64 /opt/alertmanager
Change the user and group of all the files and directories of the `/opt/alertmanager/` directory to root as follows:
sudo chown -Rfv root:root /opt/alertmanager
In the **/opt/alertmanager** directory, you should find the **alertmanager** binary and the Alert Manager configuration file **alertmanager.yml**. Both are used later.
#### Creating a Data Directory
Alert Manager needs a directory where it can store its data. As you will be running Alert Manager as the **prometheus** system user, the **prometheus** system user must have access (read, write, and execute permissions) to that data directory.
You can create the **data/** directory in the **/opt/alertmanager/** directory as follows:
sudo mkdir -v /opt/alertmanager/data
Change the owner and group of the **/opt/alertmanager/data/** directory to **prometheus** with the following command:
sudo chown -Rfv prometheus:prometheus /opt/alertmanager/data
The owner and group of the **/opt/alertmanager/data/** directory should be changed to **prometheus**.
#### Starting Alert Manager on Boot
Now, you have to create a systemd service file for Alert Manager so that you can easily manage (start, stop, restart, and add to startup) the alertmanager service with systemd.
To create a systemd service file **alertmanager.service**, run the following command:
sudo nano /etc/systemd/system/alertmanager.service
Type in the following lines in the **alertmanager.service** file.
Description=Alertmanager for prometheus
ExecStart=/opt/alertmanager/alertmanager --config.file=/opt/alertmanager/alertmanager.yml --storage.path=/opt/alertmanager/data            
ExecReload=/bin/kill -HUP $MAINPID
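The lines above are only the core of the unit. systemd also expects `[Unit]`, `[Service]` and `[Install]` sections, and `systemctl enable` needs a `WantedBy=` target. The complete unit below is a sketch built around those lines. The added `User`, `Group`, `Restart` and network-target lines are assumptions that match the `prometheus` user created earlier.

```bash
# Print a complete alertmanager.service unit built around the fragment above.
# The [Unit]/[Install] sections and the User/Group/Restart lines are assumptions.
# The quoted heredoc keeps $MAINPID literal so systemd expands it, not the shell.
alertmanager_unit() {
  cat <<'EOF'
[Unit]
Description=Alertmanager for prometheus
Wants=network-online.target
After=network-online.target

[Service]
User=prometheus
Group=prometheus
ExecStart=/opt/alertmanager/alertmanager --config.file=/opt/alertmanager/alertmanager.yml --storage.path=/opt/alertmanager/data
ExecReload=/bin/kill -HUP $MAINPID
Restart=on-failure

[Install]
WantedBy=multi-user.target
EOF
}
```

Write it with `alertmanager_unit | sudo tee /etc/systemd/system/alertmanager.service >/dev/null`. You can then check it with `systemd-analyze verify /etc/systemd/system/alertmanager.service`.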
For the systemd changes to take effect, run the following command:
sudo systemctl daemon-reload
Now, start the **alertmanager** service with the following command:
sudo systemctl start alertmanager.service
Add the **alertmanager** service to the system startup so that it automatically starts on boot with the following command:
sudo systemctl enable alertmanager.service
Check the status of the **alertmanager** service:
sudo systemctl status alertmanager.service
The service should be **active (running)**. It should also be **enabled**, meaning it will start automatically on boot.
#### Configuring Prometheus
Now, you have to configure Prometheus to use Alert Manager. You can also monitor Alert Manager with Prometheus. I will show you how to do both in this section.
First, find the IP address of the computer where you have installed Alert Manager with the following command:
hostname -I
Now, open the Prometheus configuration file **/etc/prometheus/prometheus.yml** with the **nano** text editor as follows:
sudo nano /etc/prometheus/prometheus.yml
Type in the following lines in the **scrape_configs** section to add Alert Manager for monitoring with Prometheus.
  - job_name: 'alertmanager'
    static_configs:
      - targets: ['localhost:9093']
Also, type in the IP address and port number of Alert Manager in the **alerting > alertmanagers** section.
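In `prometheus.yml`, Alertmanager addresses go under `alerting.alertmanagers[].static_configs[].targets`. The sketch below prints that section for one instance. The function name is hypothetical. Pass the address found with `hostname -I` plus port 9093, or keep the `localhost:9093` default.

```bash
# Print the alerting section of prometheus.yml pointing at one Alertmanager.
# $1: host:port of Alertmanager (default: the local instance on 9093).
alerting_section() {
  local target="${1:-localhost:9093}"
  cat <<EOF
alerting:
  alertmanagers:
    - static_configs:
        - targets: ['${target}']
EOF
}
```

Replace the existing `alerting:` block with this output rather than adding a second one. Run `promtool check config /etc/prometheus/prometheus.yml` before restarting, so that a YAML mistake does not stop the service.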
For the changes to take effect, restart the **prometheus** service as follows:
sudo systemctl restart prometheus
Open the **Targets** page of the Prometheus web interface (`/targets`) in your favorite web browser. You should see that **alertmanager** is in the **UP** state, which means Prometheus can reach Alert Manager.
#### Creating a Prometheus Alert Rule
In Prometheus, you can use the **up** expression to find the state of the targets added to Prometheus.
Targets that are **UP** (running and accessible to Prometheus) have the value **1**. Targets that are **DOWN** (not running or inaccessible to Prometheus) have the value **0**.
For example, stop the **node_exporter** target:
sudo systemctl stop node_exporter.service
The **up** value of that target should now be **0**.
So, you can use the **up == 0** expression to list only the targets that are not running or inaccessible to Prometheus.
This expression can be used to create a Prometheus Alert and send alerts to Alert Manager when one or more targets are not running or inaccessible to Prometheus.
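The same expression can be run from a terminal through the Prometheus HTTP API endpoint `/api/v1/query`. The sketch below assumes `curl` is installed. The helper name is hypothetical, and the base URL is whatever address your Prometheus listens on.

```bash
# Run an instant PromQL query through the Prometheus HTTP API and print
# the raw JSON response. $1: base URL of Prometheus, $2: PromQL expression.
# prom_query is a hypothetical helper name for this sketch.
prom_query() {
  local base="$1" expr="$2"
  # -G sends --data-urlencode values as an encoded query string on a GET
  curl -fsS --max-time 5 -G "${base%/}/api/v1/query" --data-urlencode "query=${expr}"
}
```

For example, run `prom_query http://localhost:9090 'up == 0'`. An empty `result` array in the returned JSON means every target is up.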
To create a Prometheus alert, create a new file **rules.yml** in the **/etc/prometheus/** directory as follows:
sudo nano /etc/prometheus/rules.yml
Now, type in the following lines in the **rules.yml** file.
groups:
  - name: test
    rules:
      - alert: InstanceDown
        expr: up == 0
        for: 1m
Here, the alert **InstanceDown** will be fired when targets are not running or inaccessible to Prometheus (that is **up == 0**) for a minute (**1m**).
Now, open the Prometheus configuration file **/etc/prometheus/prometheus.yml** with the **nano** text editor as follows:
sudo nano /etc/prometheus/prometheus.yml
Add the **rules.yml** file in the **rule_files** section of the prometheus.yml configuration file.
Another important option of the **prometheus.yml** file is **evaluation_interval**. Prometheus checks whether any rules match once every **evaluation_interval**. The sample configuration sets it to **15s** (15 seconds); when it is unset, the built-in default is 1 minute. With the sample value, the alert rules in the **rules.yml** file are checked every 15 seconds.
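The two settings described above live under the `global:` and `rule_files:` keys. The sketch below prints them with the absolute path of the rules file. The function name is hypothetical. Merge these keys into the existing ones in `prometheus.yml` instead of duplicating them.

```bash
# Print the global/rule_files part of prometheus.yml that loads the rules
# file and evaluates rules every 15 seconds.
# $1: path of the rules file (default /etc/prometheus/rules.yml).
rules_section() {
  local rules_file="${1:-/etc/prometheus/rules.yml}"
  cat <<EOF
global:
  evaluation_interval: 15s

rule_files:
  - "${rules_file}"
EOF
}
```

`promtool check rules /etc/prometheus/rules.yml` validates the rules file on its own. `promtool check config /etc/prometheus/prometheus.yml` also checks the rule files that the configuration references.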
For the changes to take effect, restart the **prometheus** service as follows:
sudo systemctl restart prometheus
Now, navigate to the URL [http://localhost:9010/rules](http://localhost:9010/rules) in your favorite web browser, and you should see the rule **InstanceDown** that you've just added.
Since you stopped **node_exporter** earlier, the alert is active (in the **PENDING** state) and waiting to be sent to the Alert Manager.
After a minute has passed, the alert **InstanceDown** should be in the **FIRING** state. It means that the alert is sent to the Alert Manager.
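To confirm from the Alert Manager side that the alert arrived, you can read Alertmanager's v2 API endpoint `/api/v2/alerts`. The parsing helper below is a rough sketch with hypothetical names. It extracts `alertname` labels with `grep` and assumes the compact JSON the API returns. If `jq` is installed, it is the more robust choice.

```bash
# Fetch the alerts currently held by Alertmanager as JSON.
# $1: base URL (default http://localhost:9093).
alertmanager_alerts() {
  local base="${1:-http://localhost:9093}"
  curl -fsS --max-time 5 "${base%/}/api/v2/alerts"
}

# Read that JSON on stdin and print each distinct alertname once.
# Assumes compact JSON ("alertname":"..." with no spaces around the colon).
alert_names() {
  grep -o '"alertname":"[^"]*"' | cut -d'"' -f4 | sort -u
}
```

With the target still stopped, `alertmanager_alerts | alert_names` should list `InstanceDown` once the alert is firing.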
### Configuring monitoring modules
#### Node-Exporter
To begin, download the latest version of Node Exporter here: [Node-Exporter](https://prometheus.io/download/#node_exporter)
wget https://github.com/prometheus/node_exporter/releases/download/v1.3.1/node_exporter-1.3.1.linux-amd64.tar.gz
##### Unpacking
tar -xf node_exporter-1.3.1.linux-amd64.tar.gz
Then move the binary into a directory where the system can find and manage it:
sudo mv node_exporter-1.3.1.linux-amd64/node_exporter /usr/local/bin/
##### Installation & Service Setup
Node Exporter is not really installed as such. We just create a system service that runs the binary.
For that, create a `node_exporter` user that will run the service.
sudo useradd -rs /bin/false node_exporter
Next, create the service itself.
sudo nano /etc/systemd/system/node_exporter.service
The file must contain the following information:
Description=Node Exporter
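The `Description=` line alone is not a usable unit, because systemd also needs an `ExecStart=` line and, for `enable`, an `[Install]` section. The sketch below is a complete unit that runs the binary moved to `/usr/local/bin/` as the `node_exporter` user created above. The network-target and `Restart=` lines are assumptions.

```bash
# Print a complete node_exporter.service unit for the binary and user above.
# The Wants/After and Restart lines are assumptions, not from the original note.
node_exporter_unit() {
  cat <<'EOF'
[Unit]
Description=Node Exporter
Wants=network-online.target
After=network-online.target

[Service]
User=node_exporter
ExecStart=/usr/local/bin/node_exporter
Restart=on-failure

[Install]
WantedBy=multi-user.target
EOF
}
```

Write it with `node_exporter_unit | sudo tee /etc/systemd/system/node_exporter.service >/dev/null`.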
Now reload the systemd daemon:
sudo systemctl daemon-reload
Then start node_exporter:
sudo systemctl start node_exporter
Check that node_exporter is running:
sudo systemctl status node_exporter
If all is well, enable the service at boot:
sudo systemctl enable node_exporter
To check that everything works:
curl http://localhost:9100/metrics
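The `/metrics` output is long. To check a single value, you can filter the text exposition format for one metric name. The sketch below is hypothetical. It takes the last field of the matching line as the value, which assumes the sample carries no timestamp; exporters normally omit timestamps.

```bash
# Read Prometheus text exposition format on stdin and print the value of the
# first sample whose metric name is exactly $1 (with or without labels).
# The value is taken as the last field, assuming no trailing timestamp.
metric_value() {
  awk -v name="$1" '
    /^#/ { next }                                         # skip HELP/TYPE lines
    $1 == name || index($1, name "{") == 1 { print $NF; exit }
  '
}
```

For example, `curl -s http://localhost:9100/metrics | metric_value node_load1` prints the current one-minute load average.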
##### Adding the host to Prometheus
To add the host, edit the Prometheus configuration file:
sudo nano /etc/prometheus/prometheus.yml
Add a target with the desired IP address below the existing target, in the `scrape_configs` section:
  - job_name: 'node_exporter'
    scrape_interval: 5s
    static_configs:
      - targets: ['localhost:9100']
##### Restarting Prometheus
For the changes to take effect, restart the Prometheus service:
sudo systemctl restart prometheus
##### Verification
To check that everything works, take a look at your Prometheus interface ([http://prometheus-ip:9090/targets](http://prometheus-ip:9090/targets)) or Grafana and check that your host appears!
### Configuring rules and alerts
#### Introduction
Alert rules are written in rule files stored in `/etc/prometheus/`. Those files are referenced from the `rule_files:` section of `/etc/prometheus/prometheus.yml`. As a generic process, here is what to do:
1. Reference the rule file in Prometheus' config file.
2. Create the rule file:
sudo nano /etc/prometheus/rules.yml
3. Add the desired rule.
See the external resource below for examples.
4. Restart Prometheus:
sudo systemctl restart prometheus
If Prometheus then fails to restart, there is usually a problem in the configuration or rule file. Check indentation and other YAML formatting issues before trying to restart the daemon again.
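You can avoid a failed restart by validating first. `promtool check config` parses `prometheus.yml` and the rule files it references, and it exits non-zero on errors. The sketch below only restarts when that check passes. The function name is hypothetical.

```bash
# Validate the Prometheus configuration (and its referenced rule files)
# with promtool, and restart the service only if the check succeeds.
# $1: config path (default /etc/prometheus/prometheus.yml).
check_then_restart() {
  local config="${1:-/etc/prometheus/prometheus.yml}"
  if ! promtool check config "$config"; then
    echo "config check failed; not restarting prometheus" >&2
    return 1
  fi
  sudo systemctl restart prometheus
}
```

After editing `rules.yml`, run `check_then_restart`. On errors it prints promtool's diagnostics and leaves the running service untouched.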
#### External resource
[Awesome Prometheus alerts | Collection of alerting rules](https://awesome-prometheus-alerts.grep.to/rules.html)
### Using Prometheus to monitor Caddy
#### Global parameters
| Parameter | Value |
| --------------------- | -------------------------- |
| **Caddy metrics API** | https://tools.mfxm.fr:7784 |
| **Prometheus web listening port** | 9010 |
#### Adding a monitoring job
Monitoring jobs are called scrape jobs. They are defined in the `/etc/prometheus/prometheus.yml` file under the `scrape_configs:` YAML key. Below is an example job definition.
  - job_name: caddy
    scheme: https
    static_configs:
      - targets:
          - tools.mfxm.fr:7784
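The job does not set `metrics_path`, so Prometheus scrapes the default `/metrics` path on that target. You can check that the endpoint really serves Caddy metrics before waiting for a scrape. The sketch below is hypothetical. It only tests that at least one sample name starts with the `caddy_` prefix.

```bash
# Succeed if the exposition text on stdin contains at least one sample whose
# metric name starts with $1. Comment lines (starting with '#') never match.
# has_metric_prefix is a hypothetical helper name for this sketch.
has_metric_prefix() {
  awk -v p="$1" 'index($0, p) == 1 { found = 1 } END { exit !found }'
}
```

For example, run `curl -fsS https://tools.mfxm.fr:7784/metrics | has_metric_prefix caddy_ && echo "Caddy metrics reachable"`.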
### Using Telegram for notifications
#### Installing the Telegram Bridge
In order to set up the [[Configuring Telegram bots|Telegram bot]], first clone its GitHub repository into your home directory:
cd ~
git clone https://github.com/inCaller/prometheus_bot
Move to the created folder:
cd ~/prometheus_bot
Compile the program in Go:
export GOPATH="your go path"
make clean
make
Update the config file:
telegram_token: "token goes here"
# ONLY IF YOU USING DATA FORMATTING FUNCTION, NOTE for developer: important or test fail
time_outdata: "02/01/2006 15:04:05"
template_path: "/home/melchiorbv/prometheus_bot/template.tmpl" # ONLY IF YOU USING TEMPLATE
time_zone: "Europe/Amsterdam" # ONLY IF YOU USING TEMPLATE
split_msg_byte: 4000
send_only: true # use bot only to send messages.
Then, update the template file:
Type: {{.CommonAnnotations.description}}
Summary: {{.CommonAnnotations.summary}}
Alertname: {{ .CommonLabels.alertname }}
Instance: {{ .CommonLabels.instance }}
Severity: {{ .CommonLabels.severity }}
Status: {{ .Status }}
Run the daemon with:
First part done.
#### Linking the bot to Alertmanager
Edit the Alertmanager configuration file `/opt/alertmanager/alertmanager.yml` and add the following receiver under `receivers:`:
  - name: 'admins'
    webhook_configs:
      - send_resolved: true
Replace `chat_id` in the webhook URL with the value you got from your bot, ***including everything inside the quotes***. Some chat IDs start with a `-`; in that case, you must also include the `-` in the URL. To use multiple chats, just add more receivers.
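A receiver only gets alerts when a route points to it, and a webhook receiver needs a `url`. The sketch below prints a minimal complete `alertmanager.yml` that sends every alert to `admins`. The webhook URL is passed as an argument: use the bridge address ending in your `chat_id`. The example URL below is only a placeholder, and the function name is hypothetical.

```bash
# Print a minimal alertmanager.yml routing every alert to the 'admins'
# receiver, which forwards it to the Telegram bridge webhook.
# $1: webhook URL of the bridge including your chat_id (placeholder here).
alertmanager_config() {
  local url="$1"
  cat <<EOF
route:
  receiver: 'admins'

receivers:
  - name: 'admins'
    webhook_configs:
      - send_resolved: true
        url: '${url}'
EOF
}
```

After writing the file, validate it with `/opt/alertmanager/amtool check-config /opt/alertmanager/alertmanager.yml` (amtool ships in the Alertmanager archive), then restart the service.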
Relaunch the AlertManager:
sudo systemctl restart alertmanager.service