---
Alias: ["Prometheus"]
Tag: ["Computer", "Server", "Monitoring"]
Date: 2022-03-19
DocType: "Personal"
Hierarchy: "NonRoot"
TimeStamp:
location: [47.3639129,8.55627491017841]
CollapseMetaTable: Yes
---
Parent:: [[Selfhosting]], [[Configuring Caddy|caddy]], [[Server Tools]]
---
 
^Top
```button
name Save
type command
action Save current file
id Save
```
^button-ConfiguringPrometheusNSave
 
# Configuring Prometheus
 
```ad-abstract
title: Summary
collapse: open
This note runs through the installation and use of Prometheus as a monitoring tool.
Prometheus works better with JSON logs than with the common log format that Caddy outputs.
```
 
```toc
style: number
```
 
---
 
### Introduction
[[#^Top|TOP]]
 
[Prometheus](https://prometheus.io/docs/introduction/overview/) is a free and open-source monitoring and alerting tool that was initially used for monitoring metrics at SoundCloud back in 2012. Since then, it has been adopted by many organizations to monitor their infrastructure metrics. It is written in the Go programming language.
Prometheus records real-time events as metrics in a time-series database. It provides flexible queries and real-time alerting, which help with quick diagnosis and troubleshooting of errors.
Prometheus comprises the following major components:
- The main Prometheus server for scraping and storing time-series data
- Special-purpose exporters for services such as Graphite, HAProxy, StatsD and many more
- An alert manager for handling alerts
- A push-gateway for supporting transient jobs
- Client libraries for instrumenting application code
 
---
 
### Installing Prometheus
[[#^Top|TOP]]
 
#### Installing the main modules
Before installing anything, create the configuration and data directories for Prometheus.
To create the configuration directory, run the command:
```ad-command
~~~bash
sudo mkdir -p /etc/prometheus
~~~
```
 
For the data directory, execute:
```ad-command
~~~bash
sudo mkdir -p /var/lib/prometheus
~~~
```
 
Once the directories are created, grab the compressed installation file:
```ad-command
~~~bash
wget https://github.com/prometheus/prometheus/releases/download/v2.31.0/prometheus-2.31.0.linux-amd64.tar.gz
~~~
```
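
Before extracting, it may be worth checking the tarball against the checksum published alongside the release. The sketch below is one way to do that. The `verify_sha256` helper is a made-up name for illustration, and the expected digest has to be copied from the release page (it is not reproduced here).

```bash
#!/usr/bin/env bash
# Hypothetical helper: compare a file's SHA-256 digest with an expected value.
# Usage: verify_sha256 <file> <expected-hex-digest>
verify_sha256() {
  local file="$1" expected="$2" actual
  # sha256sum prints "<digest>  <filename>"; keep only the digest.
  actual="$(sha256sum "$file" | awk '{print $1}')"
  if [ -n "$actual" ] && [ "$actual" = "$expected" ]; then
    echo "OK: $file"
  else
    echo "MISMATCH: $file" >&2
    return 1
  fi
}

# Example (the digest is a placeholder to be taken from the release page):
# verify_sha256 prometheus-2.31.0.linux-amd64.tar.gz "<digest from the release page>"
```

A non-zero exit status means the file should be downloaded again rather than extracted.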
 
Once downloaded, extract the tarball file.
```ad-command
~~~bash
tar -xvf prometheus-2.31.0.linux-amd64.tar.gz
~~~
```
 
Then navigate to the Prometheus folder.
```ad-command
~~~bash
cd prometheus-2.31.0.linux-amd64
~~~
```
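
The download, extract and `cd` steps all have to name the same release. One way to keep them in sync is to put the version in a variable and derive every file name from it. This is only a sketch: `PROM_VERSION`, `PROM_ARCH` and the helper functions are illustrative names, not something Prometheus itself defines.

```bash
#!/usr/bin/env bash
# Illustrative variables; set PROM_VERSION to the release you actually want.
PROM_VERSION="${PROM_VERSION:-2.31.0}"
PROM_ARCH="${PROM_ARCH:-linux-amd64}"

# Build the directory name, tarball name and download URL from one version string.
prom_dir()     { echo "prometheus-${1}.${PROM_ARCH}"; }
prom_tarball() { echo "$(prom_dir "$1").tar.gz"; }
prom_url()     { echo "https://github.com/prometheus/prometheus/releases/download/v${1}/$(prom_tarball "$1")"; }

# Print the commands rather than running them, so they can be reviewed first.
echo "wget $(prom_url "$PROM_VERSION")"
echo "tar -xvf $(prom_tarball "$PROM_VERSION")"
echo "cd $(prom_dir "$PROM_VERSION")"
```

Changing the version then only requires touching one line.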
 
Once in the directory, [move](https://linoxide.com/mv-command-in-linux/) the `prometheus` and `promtool` binary files to the `/usr/local/bin/` folder.
```ad-command
~~~bash
sudo mv prometheus promtool /usr/local/bin/
~~~
```
 
Additionally, move the console files in the `consoles` directory and the library files in the `console_libraries` directory to the `/etc/prometheus/` directory.
```ad-command
~~~bash
sudo mv consoles/ console_libraries/ /etc/prometheus/
~~~
```
 
Also, make sure to move the `prometheus.yml` template configuration file to the **`/etc/prometheus/`** directory.
```ad-command
~~~bash
sudo mv prometheus.yml /etc/prometheus/prometheus.yml
~~~
```
 
At this point, Prometheus has been successfully installed. To check the version of Prometheus installed, run the command:
```ad-command
~~~bash
prometheus --version
~~~
```
 
Output:
```ad-code
~~~bash
prometheus, version 2.31.3 (branch: HEAD, revision: f29caccc42557f6a8ec30ea9b3c8c089391bd5df)
build user: root@5cff4265f0e3
build date: 20211005-16:10:52
go version: go1.17.1
platform: linux/amd64
~~~
```
 
```ad-command
~~~bash
promtool --version
~~~
```
 
Output:
```ad-code
~~~bash
promtool, version 2.31.3 (branch: HEAD, revision: f29caccc42557f6a8ec30ea9b3c8c089391bd5df)
build user: root@5cff4265f0e3
build date: 20211005-16:10:52
go version: go1.17.1
platform: linux/amd64
~~~
```
If your output resembles what I have (the version string will match the release you downloaded), then you are on the right track. In the next step, we will create a system group and user.
 
#### Permissions & User Management
[[#^Top|TOP]]
Before creating a systemd service file for Prometheus, create a dedicated Prometheus group and user for it to run as.
To create a `prometheus` [group](https://linoxide.com/groupadd-command/), execute the command:
```ad-command
~~~bash
sudo groupadd --system prometheus
~~~
```
 
Then create the `prometheus` user and assign it to the just-created `prometheus` group.
```ad-command
~~~bash
sudo useradd -s /sbin/nologin --system -g prometheus prometheus
~~~
```
 
Next, configure the directory ownership and permissions as follows.
```ad-command
~~~bash
sudo chown -R prometheus:prometheus /etc/prometheus/ /var/lib/prometheus/
sudo chmod -R 775 /etc/prometheus/ /var/lib/prometheus/
~~~
```
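
To confirm that the ownership and mode changes took effect, something like the following sketch can help. `show_perms` is a hypothetical helper name, and it assumes GNU coreutils `stat` (the `-c` format option), which is typical on Linux servers but not guaranteed everywhere.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print "owner:group mode path" for each argument.
# Relies on GNU coreutils `stat -c`, which is an assumption about the host.
show_perms() {
  local path
  for path in "$@"; do
    stat -c '%U:%G %a %n' "$path" || return 1
  done
}

# Example:
# show_perms /etc/prometheus /var/lib/prometheus
```

Both directories should then be listed as owned by `prometheus:prometheus` with mode `775`.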
The only part remaining is to make Prometheus a systemd service so that we can easily manage its running status.
 
#### Configuring the service
[[#^Top|TOP]]
Using your favorite text editor, create a systemd service file:
```ad-command
~~~bash
sudo nano /etc/systemd/system/prometheus.service
~~~
```
 
Paste the following lines of code.
```ad-code
~~~ini
[Unit]
Description=Prometheus
Wants=network-online.target
After=network-online.target
[Service]
User=prometheus
Group=prometheus
Restart=always
Type=simple
ExecStart=/usr/local/bin/prometheus \
--config.file=/etc/prometheus/prometheus.yml \
--storage.tsdb.path=/var/lib/prometheus/ \
--web.console.templates=/etc/prometheus/consoles \
--web.console.libraries=/etc/prometheus/console_libraries \
--web.listen-address=0.0.0.0:9090
[Install]
WantedBy=multi-user.target
~~~
```
Save the changes and exit the editor.
Then reload systemd so that it picks up the new unit file, and start the Prometheus service.
```ad-command
~~~bash
sudo systemctl daemon-reload
sudo systemctl start prometheus
~~~
```
 
To have the Prometheus service run at startup, enable it:
```ad-command
~~~bash
sudo systemctl enable prometheus
~~~
```
 
Then confirm the status of the Prometheus service.
```ad-command
~~~bash
sudo systemctl status prometheus
~~~
```
![Check status of Prometheus services](https://linoxide.com/wp-content/uploads/2021/11/2021-10-1003-Check-status-of-Prometheus-services.png)
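
Besides `systemctl status`, Prometheus exposes a `/-/healthy` HTTP endpoint that can be polled from a script. The sketch below wraps it in a small function; `prometheus_healthy` is a made-up name, and the default address assumes the `--web.listen-address` from the unit file above.

```bash
#!/usr/bin/env bash
# Hypothetical helper: succeed only if Prometheus answers on its health endpoint.
# The default address matches --web.listen-address in the unit file above.
prometheus_healthy() {
  local addr="${1:-localhost:9090}"
  # -f: fail on HTTP errors, -sS: quiet but show errors, --max-time: do not hang.
  curl -fsS --max-time 5 "http://${addr}/-/healthy" >/dev/null
}

# Example:
# prometheus_healthy && echo "Prometheus is up"
```

The function only reports reachability through its exit status, so it can be dropped into a cron job or a deployment script.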
 
#### Configuration of user access
[[#^Top|TOP]]
Finally, to access Prometheus, configure your reverse proxy ([[Configuring Caddy|caddy]]) to point to the service, which listens on internal port 9090.
It is then accessible at:
```ad-address
https://prometheus.mfxm.fr
```
 
![prometheus dashboard](https://linoxide.com/wp-content/uploads/2021/11/2021-10-1003-Prometheus-dashboard-1024x440.png)
 
---
 
### Configuring alerts
[[#^Top|TOP]]
 
#### Install Alertmanager
Download the latest version of Alert Manager (v0.23.0 at the time of this writing) with the following command:
```ad-command
~~~bash
wget https://github.com/prometheus/alertmanager/releases/download/v0.23.0/alertmanager-0.23.0.linux-amd64.tar.gz
~~~
```
 
The download may take a while to complete.
Once it finishes, you should find a new archive file **alertmanager-0.23.0.linux-amd64.tar.gz** in your current working directory.
Extract the archive with the following command:
```ad-command
~~~bash
tar xzf alertmanager-0.23.0.linux-amd64.tar.gz
~~~
```
 
You should find a new directory **alertmanager-0.23.0.linux-amd64/** in your current working directory.
Now, move the **alertmanager-0.23.0.linux-amd64** directory to **/opt/** directory and rename it to **alertmanager** as follows:
```ad-command
~~~bash
sudo mv -v alertmanager-0.23.0.linux-amd64 /opt/alertmanager
~~~
```
 
Change the user and group of all the files and directories of the `/opt/alertmanager/` directory to root as follows:
```ad-command
~~~bash
sudo chown -Rfv root:root /opt/alertmanager
~~~
```
 
In the **/opt/alertmanager** directory, you should find the **alertmanager** binary and the Alert Manager configuration file **alertmanager.yml**. You will use both later.
 
#### Creating a Data Directory
[[#^Top|TOP]]
Alert Manager needs a directory where it can store its data. As you will be running Alert Manager as the **prometheus** system user, the **prometheus** system user must have access (read, write, and execute permissions) to that data directory.
You can create the **data/** directory in the **/opt/alertmanager/** directory as follows:
```ad-command
~~~bash
sudo mkdir -v /opt/alertmanager/data
~~~
```
 
Change the owner and group of the **/opt/alertmanager/data/** directory to **prometheus** with the following command:
```ad-command
~~~bash
sudo chown -Rfv prometheus:prometheus /opt/alertmanager/data
~~~
```
 
After this, the **/opt/alertmanager/data/** directory should be owned by the **prometheus** user and group.
 
#### Starting Alert Manager on Boot
[[#^Top|TOP]]
Now, you have to create a systemd service file for Alert Manager so that you can easily manage (start, stop, restart, and add to startup) the alertmanager service with systemd.
To create a systemd service file **alertmanager.service**, run the following command:
```ad-command
~~~bash
sudo nano /etc/systemd/system/alertmanager.service
~~~
```
 
Type in the following lines in the **alertmanager.service** file.
```ad-code
~~~ini
[Unit]
Description=Alertmanager for prometheus
[Service]
Restart=always
User=prometheus
ExecStart=/opt/alertmanager/alertmanager --config.file=/opt/alertmanager/alertmanager.yml --storage.path=/opt/alertmanager/data
ExecReload=/bin/kill -HUP $MAINPID
TimeoutStopSec=20s
SendSIGKILL=no
[Install]
WantedBy=multi-user.target
~~~
```
 
For the systemd changes to take effect, run the following command:
```ad-command
~~~bash
sudo systemctl daemon-reload
~~~
```
 
Now, start the **alertmanager** service with the following command:
```ad-command
~~~bash
sudo systemctl start alertmanager.service
~~~
```
 
Add the **alertmanager** service to the system startup so that it automatically starts on boot with the following command:
```ad-command
~~~bash
sudo systemctl enable alertmanager.service
~~~
```
 
Check the status of the service. It should be **active/running** and **enabled** (it will start automatically on boot).
```ad-command
~~~bash
sudo systemctl status alertmanager.service
~~~
```
 
#### Configuring Prometheus
[[#^Top|TOP]]
Now, you have to configure Prometheus to use Alert Manager. You can also monitor Alert Manager with Prometheus. I will show you how to do both in this section.
First, find the IP address of the computer where you have installed Alert Manager with the following command:
```ad-command
~~~bash
hostname -I
~~~
```
 
Now, open the Prometheus configuration file **/etc/prometheus/prometheus.yml** with the **nano** text editor as follows:
```ad-command
~~~bash
sudo nano /etc/prometheus/prometheus.yml
~~~
```
 
Type in the following lines in the **scrape_configs** section to add Alert Manager for monitoring with Prometheus.
```ad-code
~~~yaml
- job_name: 'alertmanager'
  static_configs:
  - targets: ['localhost:9093']
~~~
```
 
Also, type in the IP address and port number of Alert Manager in the **alerting > alertmanagers** section.
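
As an illustration of what the **alerting > alertmanagers** section can look like, the sketch below writes a small example configuration to a scratch directory (not to `/etc/prometheus`) and validates it with `promtool` when that is available. It assumes Alert Manager runs on the same host on port 9093, as in the scrape job; replace `localhost` with the IP address found earlier if it runs elsewhere.

```bash
#!/usr/bin/env bash
# Write an illustrative config to a scratch directory, not to /etc/prometheus.
scratch="$(mktemp -d)"
cat > "$scratch/prometheus.yml" <<'EOF'
# Assumption: Alert Manager runs on the same host, on port 9093.
alerting:
  alertmanagers:
    - static_configs:
        - targets:
            - 'localhost:9093'

scrape_configs:
  - job_name: 'alertmanager'
    static_configs:
      - targets: ['localhost:9093']
EOF

# Validate the scratch file if promtool is on the PATH; otherwise just print it.
if command -v promtool >/dev/null 2>&1; then
  promtool check config "$scratch/prometheus.yml"
else
  cat "$scratch/prometheus.yml"
fi
```

Only the `alerting:` block is new here; the scrape job is repeated so that the scratch file resembles the relevant part of the real one.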
For the changes to take effect, restart the **prometheus** service as follows:
```ad-command
~~~bash
sudo systemctl restart prometheus
~~~
```
 
Visit the URL [http://192.168.20.161:9090/targets](http://192.168.20.161:9090/targets) from your favorite web browser, and you should see that **alertmanager** is in the **UP** state. So, Prometheus can access Alert Manager just fine.
 
#### Creating a Prometheus Alert Rule
[[#^Top|TOP]]
On Prometheus, you can use the **up** expression to find the state of the targets added to Prometheus.
The targets that are in the **UP** state (running and accessible to Prometheus) will have the value **1**, and targets that are in the **DOWN** state (not running or inaccessible to Prometheus) will have the value **0**.
Suppose you stop one of the targets, say **node_exporter**:
```ad-command
~~~bash
sudo systemctl stop node-exporter.service
~~~
```
 
The **up** value of that target should then be **0**.
So, you can use the **up == 0** expression to list only the targets that are not running or are inaccessible to Prometheus.
This expression can be used to create a Prometheus Alert and send alerts to Alert Manager when one or more targets are not running or are inaccessible to Prometheus.
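
The same expression can also be evaluated from the command line through the Prometheus HTTP API (`/api/v1/query`), which is handy when no browser is at hand. A minimal sketch, where `prom_query` is a hypothetical helper and the base URL in the example is an assumption about where Prometheus listens:

```bash
#!/usr/bin/env bash
# Hypothetical helper: run an instant PromQL query through the HTTP API.
# Usage: prom_query <base-url> <expression>
prom_query() {
  local base="$1" expr="$2"
  # -G sends the URL-encoded expression as a GET query string.
  curl -fsS --max-time 5 -G "${base}/api/v1/query" --data-urlencode "query=${expr}"
}

# Example (assumes Prometheus listens on localhost:9090):
# prom_query http://localhost:9090 'up == 0'
```

The response is JSON; each entry in its result list corresponds to one target that is currently down.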
To create a Prometheus Alert, create a new file **rules.yml** in the **/etc/prometheus/** directory as follows:
```ad-command
~~~bash
sudo nano /etc/prometheus/rules.yml
~~~
```
 
Now, type in the following lines in the **rules.yml** file.
```ad-code
~~~yaml
groups:
  - name: test
    rules:
      - alert: InstanceDown
        expr: up == 0
        for: 1m
~~~
```
 
Here, the alert **InstanceDown** will be fired when targets are not running or inaccessible to Prometheus (that is **up == 0**) for a minute (**1m**).
Now, open the Prometheus configuration file **/etc/prometheus/prometheus.yml** with the **nano** text editor as follows:
```ad-command
~~~bash
sudo nano /etc/prometheus/prometheus.yml
~~~
```
 
Add the **rules.yml** file in the **rule_files** section of the prometheus.yml configuration file.
Another important option of the **prometheus.yml** file is **evaluation_interval**. Prometheus evaluates all rules once every **evaluation_interval**. The template configuration sets it to 15s (**15** seconds), while the built-in default is 1m. So, with the template value, the Alert rules in the **rules.yml** file will be checked every 15 seconds.
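
To make the two settings concrete, here is a sketch that builds a scratch copy of both files and lets `promtool` (if installed) check them together. Nothing under `/etc/prometheus` is touched. The scratch paths are assumptions for the example; in the real configuration, the entry under `rule_files` would point at `/etc/prometheus/rules.yml`.

```bash
#!/usr/bin/env bash
# Work in a scratch directory so the live configuration is left untouched.
scratch="$(mktemp -d)"

# Rule file: the InstanceDown alert, with the YAML nesting spelled out.
cat > "$scratch/rules.yml" <<'EOF'
groups:
  - name: test
    rules:
      - alert: InstanceDown
        expr: up == 0
        for: 1m
EOF

# Main config: only the two settings discussed here (the shell expands $scratch).
cat > "$scratch/prometheus.yml" <<EOF
global:
  evaluation_interval: 15s
rule_files:
  - "$scratch/rules.yml"
EOF

# promtool also checks the rule files that the config references.
if command -v promtool >/dev/null 2>&1; then
  promtool check config "$scratch/prometheus.yml"
else
  cat "$scratch/prometheus.yml"
fi
```

An absolute path is used for the rule file to avoid any doubt about how relative paths are resolved.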
For the changes to take effect, restart the **prometheus** service as follows:
```ad-command
~~~bash
sudo systemctl restart prometheus
~~~
```
 
Now, navigate to the URL [http://localhost:9010/rules](http://localhost:9010/rules) from your favorite web browser, and you should see the rule **InstanceDown** that you've just added.
As you've stopped **node_exporter** earlier, the alert is active, and it is waiting to be sent to the Alert Manager.
After a minute has passed, the alert **InstanceDown** should be in the **FIRING** state. It means that the alert has been sent to the Alert Manager.
 
---
 
### Configuring monitoring modules
[[#^Top|TOP]]
 
 
---
 
### Configuring rules and alerts
[[#^Top|TOP]]
 
#### Introduction
Alert rules live in rule files that are referenced from `/etc/prometheus/prometheus.yml`; here, the rule file sits in the same folder. As a generic process, here is what to do:
1. Reference the rule file in Prometheus' config file
`rules.yml`
2. Create the rule file
```ad-command
~~~bash
sudo nano /etc/prometheus/rules.yml
~~~
```
 
3. Add the defined rule
See the external resource below for examples.
4. Relaunch Prometheus
```ad-command
~~~bash
sudo systemctl restart prometheus
~~~
```
 
Once this is done, Prometheus may fail to restart, which points to a problem in the configuration file. Check the whitespace (YAML indentation) and other formatting issues before trying to restart the daemon again.
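
One way to avoid a failed restart is to validate the configuration first with `promtool check config` and restart only when the check passes. The function below is a sketch of that idea; `restart_prometheus_if_valid` is a made-up name, and the default path assumes the layout used in this note.

```bash
#!/usr/bin/env bash
# Hypothetical helper: validate the config first and restart only if it passes.
restart_prometheus_if_valid() {
  local cfg="${1:-/etc/prometheus/prometheus.yml}"
  if promtool check config "$cfg"; then
    sudo systemctl restart prometheus
  else
    echo "Config check failed; not restarting. Recent service logs may help:" >&2
    echo "  sudo journalctl -u prometheus -n 50 --no-pager" >&2
    return 1
  fi
}

# Example:
# restart_prometheus_if_valid
```

Because `promtool` also checks the referenced rule files, a badly indented `rules.yml` is caught before the running service is touched.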
 
#### External resource
[Awesome Prometheus alerts | Collection of alerting rules](https://awesome-prometheus-alerts.grep.to/rules.html)
 
---
 
### Using Prometheus to monitor Caddy
[[#^Top|TOP]]
 
#### Global parameters
| Parameter | Value |
| --------------------- | -------------------------- |
| **Caddy metrics API** | https://tools.mfxm.fr:7784 |
| **Prometheus web listening port** | 9010 |
 
#### Adding a monitoring job
[[#^Top|TOP]]
Monitoring jobs are called `scrape` jobs and are defined in the `/etc/prometheus/prometheus.yml` file under the `scrape_configs:` YAML key. Below is an example of a job definition.
```ad-code
~~~yaml
scrape_configs:
  - job_name: caddy
    scheme: https
    static_configs:
      - targets:
          - tools.mfxm.fr:7784
~~~
```
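
Before relying on the job, it can be useful to look at what the target actually returns. Unless `metrics_path` says otherwise, Prometheus scrapes `/metrics` on the target. The sketch below lists the metric names declared in such a response; `list_metric_names` is a hypothetical helper that reads the text exposition format from standard input.

```bash
#!/usr/bin/env bash
# Hypothetical helper: list the metric names declared in Prometheus text
# exposition format read from stdin (one "# TYPE <name> <type>" line per metric).
list_metric_names() {
  awk '$1 == "#" && $2 == "TYPE" { print $3 }'
}

# Example (assumes the Caddy metrics endpoint from the table above):
# curl -fsS https://tools.mfxm.fr:7784/metrics | list_metric_names
```

An empty list suggests that the endpoint is not serving metrics at that path.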
 
---
 
### Using Telegram for notifications
[[#^Top|TOP]]
 
#### Installing the Telegram Bridge
In order to set up the [[Configuring Telegram bots|Telegram bot]], first clone its GitHub repository:
```ad-command
~~~bash
sudo git clone https://github.com/inCaller/prometheus_bot
~~~
```
 
Move to the created folder:
```ad-command
~~~bash
cd ~/prometheus_bot
~~~
```
 
Compile the programme in Go:
```ad-command
~~~bash
export GOPATH="your go path"
make clean
make
~~~
```
 
Update the config file:
```ad-path
/home/melchiorbv/prometheus_bot/config.yaml
```
 
```ad-code
~~~yaml
telegram_token: "token goes here"
# ONLY IF YOU USING DATA FORMATTING FUNCTION, NOTE for developer: important or test fail
time_outdata: "02/01/2006 15:04:05"
template_path: "/home/melchiorbv/prometheus_bot/template.tmpl" # ONLY IF YOU USING TEMPLATE
time_zone: "Europe/Amsterdam" # ONLY IF YOU USING TEMPLATE
split_msg_byte: 4000
send_only: true # use bot only to send messages.
~~~
```
 
Then, update the template file:
```ad-path
/home/melchiorbv/prometheus_bot/template.tmpl
```
 
```ad-code
~~~yaml
Type: {{.CommonAnnotations.description}}
Summary: {{.CommonAnnotations.summary}}
Alertname: {{ .CommonLabels.alertname }}
Instance: {{ .CommonLabels.instance }}
Severity: {{ .CommonLabels.severity }}
Status: {{ .Status }}
~~~
```
 
Run the daemon with:
```ad-command
~~~bash
./prometheus_bot
~~~
```
First part done.
 
#### Linking the bot to Alertmanager
[[#^Top|TOP]]
Edit the `AlertManager` config file under `/opt/alertmanager/alertmanager.yml` and add:
```ad-code
~~~yaml
- name: 'admins'
  webhook_configs:
  - send_resolved: True
    url: http://127.0.0.1:9087/alert/chat_id
~~~
```
Replace `chat_id` with the value you got from your bot, ***with everything inside the quotes***. (Some chat IDs start with a `-`; in that case, you must also include the `-` in the URL.) To use multiple chats, just add more receivers.
Relaunch the AlertManager:
```ad-command
~~~bash
sudo systemctl restart alertmanager.service
~~~
```
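
To test the chain without waiting for a real alert, a hand-made notification can be posted to the bot. The sketch below prints a minimal payload shaped like an Alertmanager webhook message. All label and annotation values are made-up test data, and whether the bot accepts such a reduced payload is an assumption to verify against its own documentation.

```bash
#!/usr/bin/env bash
# Print a minimal payload shaped like an Alertmanager webhook notification.
# All label and annotation values are made-up test data.
sample_alert_payload() {
  cat <<'EOF'
{
  "version": "4",
  "status": "firing",
  "receiver": "admins",
  "groupLabels": {"alertname": "InstanceDown"},
  "commonLabels": {"alertname": "InstanceDown", "instance": "localhost:9100", "severity": "critical"},
  "commonAnnotations": {"summary": "Test alert", "description": "Sent by hand to test the bot"},
  "externalURL": "http://localhost:9093",
  "alerts": [
    {
      "status": "firing",
      "labels": {"alertname": "InstanceDown", "instance": "localhost:9100", "severity": "critical"},
      "annotations": {"summary": "Test alert", "description": "Sent by hand to test the bot"},
      "startsAt": "2022-03-19T12:00:00Z",
      "endsAt": "0001-01-01T00:00:00Z",
      "generatorURL": "http://localhost:9090/graph"
    }
  ]
}
EOF
}

# Example: replace chat_id with your own value, as in the receiver URL.
# sample_alert_payload | curl -fsS -X POST -H 'Content-Type: application/json' \
#   --data-binary @- http://127.0.0.1:9087/alert/chat_id
```

If the message shows up in Telegram, the bot side works and any remaining problem lies between Prometheus and Alert Manager.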
 
 
[[#^Top|TOP]]