Intelligent systems with prometheus

Intelligent systems with prometheus

Cloud native systems are complex systems. Hardware fails, software has bugs, things get wrong.

In this scenario Prometheus offer a valid help for creating more resilient systems, whichg can auto-heal themself in certains circustances.

In this demo I will show how you can use the Prometheus stack to improve your systems.

What do you need for getting the pattern

I do simplify lot of things assuming you can setup prometehus/alertmanager and you can create an alert via prometehus ( plenty of doc upstream are avail)

1) Install prometheus. In my setup I use openSUSE Linux but Prometheus binary and alertmanager are avail everywhere.

2) After you setup alertmanager and prometheus up and running, configure the alertmanager.

/etc/prometheus/alertmanager.yml

# See https://prometheus.io/docs/alerting/configuration/ for documentation.

global:
receivers:
      - name: my-receiver
        webhook_configs:
          - url: "http://10.162.30.75:9000/hooks/"

route:
  group_by: ["alertname"]
  group_wait: 10s
  group_interval: 10s
  repeat_interval: 10s
  receiver: my-receiver
  routes:
    -  receiver: my-receiver

This configuration say that whatever there is an alert we send a webhook .

After this restart alertmanager service.

3) Create the alerthook-handler. I used golang because I find it provides the right minimalism and efficency for my sake.


package main

import (
        "io"
        "net/http"

        log "github.com/sirupsen/logrus"
)

func main() {
        log.Infoln("starting handler")
        handler1 := func(w http.ResponseWriter, _ *http.Request) {
                io.WriteString(w, "Hello from a HandleFunc handler called via Prometheus alert\n")
                log.Infoln("handler called via prometheus alert")
        }

        http.HandleFunc("/hooks", handler1)
        err := http.ListenAndServe(":9999", nil)
        if err != nil {
                log.Fatalln(err)
        }
}

Build this code and deploy to the IP node

4) So assuming prometehus is up and running, and you have a fake alert for testing.

hana01:/tmp # ./my-handler 
INFO[0000] starting handler                             
INFO[0016] handler called via prometheus alert          
INFO[0019] handler called via prometheus alert          
INFO[0019] handler called via prometheus alert  

Here we do have 3 alerts fired and handled by handler

Now substitute the print function with your domain expertise! Have fun!