
Monitoring stack for Kubernetes

Monitoring stack (Prometheus/Loki/Grafana/Tempo) for Kubernetes deployed from Helm charts
$1,550
Dependencies included: $350

The module installs a monitoring stack whose components can be individually disabled or customized.
The default setup deploys a complete set with a minimal configuration: Grafana is exposed via Ingress, Loki and Prometheus are available only inside the cluster, Tempo is disabled, persistent storage is configured for all components, and the main exporters are enabled.
When enabled, Tempo runs in distributed (microservices) mode.

The core Tempo components are compactor, distributor, ingester, querier, query-frontend, and memcached.
Grafana and Prometheus use the Recreate update strategy so that volumes are re-attached properly. This causes a short downtime between the deletion of the old pod and the creation of the new one.
All main components expect Nginx as the Ingress class, so an Nginx Ingress controller is a dependency of this module.
If Prometheus or Loki is enabled, the corresponding local datasource for Grafana is created.
Pushgateway is installed from the Prometheus Helm chart and is disabled by default. Use the pushgateway values from the variables table to install and configure it. The Prometheus Pushgateway allows ephemeral and batch jobs to expose their metrics to Prometheus.
Nginx is used as ingress_class in the Ingress annotations by default for all main monitoring-stack components. If a custom ingress_class is used together with ingress_auth_enabled, the specific auth annotations must be provided through loki.custom_values and prometheus.custom_values.
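
As a rough illustration of the custom ingress class case, the sketch below switches the stack to a non-Nginx class and passes an auth annotation to Prometheus through custom_values. The Traefik class name, the middleware reference monitoring-basic-auth@kubernetescrd, the hostname prom.example.com, and the server.ingress.annotations value path are assumptions that do not come from the module documentation. Check them against your ingress controller and the chart values before use. Loki would need an equivalent entry in loki.custom_values.

```hcl
module "monitoring_stack" {
  source  = "solutions.corewide.com/kubernetes/tf-k8s-monitoring-stack/helm"
  version = "~> 3.2"

  name_prefix = "dev"

  # Assumption: a Traefik ingress controller is installed under this class name
  ingress_class = "traefik"

  grafana = {
    ingress_host = "testmon.example.com"
  }

  prometheus = {
    ingress_enabled      = true
    ingress_auth_enabled = true
    ingress_host         = "prom.example.com"

    custom_values = {
      # Hypothetical value path and middleware name; dots inside the
      # annotation key are escaped the same way as in the node selectors
      "server.ingress.annotations.traefik\\.ingress\\.kubernetes\\.io/router\\.middlewares" = "monitoring-basic-auth@kubernetescrd"
    }
  }
}
```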

NOTE: Enable loki.bind_memberlist_endpoint if you face the following issue during deployment:

failed: failed to create memberlist: Failed to get final advertise address: no private IP address found, and explicit IP not provided
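
A minimal sketch of the workaround from the note, using only inputs listed in the variables table. The name_prefix and hostname values are placeholders borrowed from the examples on this page.

```hcl
module "monitoring_stack" {
  source  = "solutions.corewide.com/kubernetes/tf-k8s-monitoring-stack/helm"
  version = "~> 3.2"

  name_prefix = "dev"

  grafana = {
    ingress_host = "testmon.example.com"
  }

  loki = {
    # Explicitly bind the pod IP to the Loki memberlist service
    # (the variables table marks this as needed only on EKS)
    bind_memberlist_endpoint = true
  }
}
```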
Log in to Corewide IaC registry

Once you have a Corewide Solutions Portal account, this one-time action will use your browser session to retrieve credentials:

terraform login solutions.corewide.com
Provision instructions

Initialize mandatory providers:

Copy and paste into your Terraform configuration and insert the variables:

module "tf_k8s_monitoring_stack" {
  source  = "solutions.corewide.com/kubernetes/tf-k8s-monitoring-stack/helm"
  version = "~> 3.2.0"

  # specify module inputs here or try one of the examples below
  ...
}

Initialize the setup:

terraform init
Define update strategy

The Corewide DevOps team strictly follows the Semantic Versioning Specification to provide our clients with products that have predictable upgrades between versions. We recommend pinning patch versions of our modules using the pessimistic constraint operator (~>) to prevent breaking changes during upgrades.

To get new features during upgrades (without breaking compatibility), use ~> 3.2 and run terraform init -upgrade.

For the safest setup, use strict pinning with version = "3.2.0".
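
The three pinning options side by side, as a sketch. The module inputs are placeholders taken from the examples on this page, and only the version line differs between the strategies.

```hcl
module "monitoring_stack" {
  source = "solutions.corewide.com/kubernetes/tf-k8s-monitoring-stack/helm"

  # Strict pinning: only 3.2.0 is ever selected; upgrading means editing this line
  version = "3.2.0"

  # Alternatives (use one version argument at a time):
  #   version = "~> 3.2.0"  # patch releases only: >= 3.2.0, < 3.3.0
  #   version = "~> 3.2"    # new features too:    >= 3.2.0, < 4.0.0

  name_prefix = "dev"

  grafana = {
    ingress_host = "testmon.example.com"
  }
}
```

With either ~> constraint, a newer matching release is picked up only after running terraform init -upgrade.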

v3.2.0 released 6 months ago
New version approx. every 8 weeks

Deploy complete stack with only mandatory values:

module "monitoring_stack" {
  source  = "solutions.corewide.com/kubernetes/tf-k8s-monitoring-stack/helm"
  version = "~> 3.2"

  grafana = {
    ingress_host = "testmon.example.com"
  }
}

Deploy the full stack with an additional datasource for an already existing Prometheus.

Set node selectors, add Pushgateway from the Prometheus chart, add a custom value for Grafana, and set the basic authentication credentials:

module "monitoring_stack" {
  source  = "solutions.corewide.com/kubernetes/tf-k8s-monitoring-stack/helm"
  version = "~> 3.2"

  name_prefix = "dev"

  auth_credentials = {
    password = "XXXX-XXXX-XXXX"
  }

  grafana = {
    ingress_host = "testmon.example.com"
    admin_pass   = "YYYY-YYYY-YYYY"

    node_selector = {
      "cloud\\.google\\.com/gke-nodepool" = "maintenance"
    }

    custom_values = {
      "persistence.type" = "pvc"
    }
  }

  prometheus = {
    node_selector = {
      "cloud\\.google\\.com/gke-nodepool" = "maintenance"
    }
  }

  pushgateway = {
    enabled         = true
    ingress_enabled = true
    ingress_host    = "pushgw.example.com"
    volume_size     = "5Gi"
  }

  loki = {
    node_selector = {
      "cloud\\.google\\.com/gke-nodepool" = "maintenance"
    }
  }

  grafana_datasources = [
    {
      name               = "Prometheus Dev"
      type               = "prometheus"
      url                = "https://devprom.example.com"
      basic_auth_enabled = true
      basic_auth_pass    = "XXXX-XXXX-XXXX"
      basic_auth_user    = "monitoring"
    },
  ]
}

Deploy a partial stack with some customization (Prometheus and Node Exporter disabled, Tempo enabled):

module "monitoring_stack" {
  source  = "solutions.corewide.com/kubernetes/tf-k8s-monitoring-stack/helm"
  version = "~> 3.2"

  grafana = {
    enabled      = true
    ingress_host = "testmon.example.com"
    admin_pass   = "YYYY-YYYY-YYYY"

    # grafana.storage_class is documented as a string (storage class name)
    storage_class = "standard"
  }

  prometheus = {
    enabled               = false
    node_exporter_enabled = false
  }

  tempo = {
    enabled = true

    node_selector = {
      "kubernetes\\.azure\\.com/agentpool" = "maintenance"
    }
  }
}
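
The Tempo remote storage and traces-to-logs inputs from the variables table are not covered by the examples above. A minimal sketch, assuming an existing Azure storage account: the variable name tempo_storage_account_key, the account name examplemonitoring, and the container name tempo-traces are hypothetical placeholders.

```hcl
# Hypothetical input variable that keeps the storage account key out of the code
variable "tempo_storage_account_key" {
  type      = string
  sensitive = true
}

module "monitoring_stack" {
  source  = "solutions.corewide.com/kubernetes/tf-k8s-monitoring-stack/helm"
  version = "~> 3.2"

  name_prefix = "dev"

  grafana = {
    ingress_host = "testmon.example.com"
  }

  tempo = {
    enabled        = true
    traces_to_logs = true

    # Placeholder names; point these at your own storage account and container
    azure_remote_storage = {
      container_name       = "tempo-traces"
      storage_account_name = "examplemonitoring"
      storage_account_key  = var.tempo_storage_account_key
    }
  }
}
```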
| Variable | Description | Type | Default | Required | Sensitive |
| --- | --- | --- | --- | --- | --- |
| grafana | Grafana parameters | object | | yes | no |
| name_prefix | Name prefix for resources creation | string | | yes | no |
| auth_credentials | Ingress Nginx basic auth login credentials | object | {} | no | yes |
| auth_credentials.password | Ingress Nginx basic auth login password (randomly generated if not set) | string | | no | yes |
| auth_credentials.username | Ingress Nginx basic auth login username | string | monitoring | no | yes |
| create_namespace | Indicates creation of a dedicated namespace for monitoring components | bool | true | no | no |
| grafana.admin_pass | Grafana admin password (randomly generated if not set) | string | | no | no |
| grafana.admin_user | Grafana admin username | string | admin | no | no |
| grafana.app_version | Grafana server version | string | 11.2.2 | no | no |
| grafana.chart_version | Helm chart version (compatible chart version is 8.0.0 and newer) | string | 8.5.6 | no | no |
| grafana.custom_values | Custom Helm chart values in key value format: "persistence.type" = "pvc" | map(any) | {} | no | no |
| grafana.enabled | Toggle Grafana installation | bool | true | no | no |
| grafana.ingress_host | Hostname to use with Ingress (required if enabled is true) | string | | no | no |
| grafana.node_selector | Node selector to place Grafana pods in | map(any) | {} | no | no |
| grafana.storage_class | Storage class name | string | | no | no |
| grafana.volume_size | Volume data size | string | 5Gi | no | no |
| grafana_datasources | Grafana datasources for datasource provisioning | list(object) | [] | no | yes |
| grafana_datasources[*].basic_auth_enabled | Toggle Ingress basic auth | bool | | no | yes |
| grafana_datasources[*].basic_auth_pass | Ingress basic auth password | string | | no | yes |
| grafana_datasources[*].basic_auth_user | Ingress basic auth user | string | | no | yes |
| grafana_datasources[*].name | Name of the datasource | string | | no | yes |
| grafana_datasources[*].type | Type of the datasource | string | | no | yes |
| grafana_datasources[*].url | URL of the datasource | string | | no | yes |
| ingress_cert_issuer | Ingress TLS certificate issuer | string | letsencrypt | no | no |
| ingress_class | Ingress Class definition | string | nginx | no | no |
| loki | Loki parameters | object | {} | no | no |
| loki.app_version | Loki server version | string | 3.2.0 | no | no |
| loki.bind_memberlist_endpoint | Toggle explicit bind of pod IP to the Loki-Memberlist Kubernetes service. Required only for deployment into an EKS cluster in order to bind the endpoint IP accurately | bool | false | no | no |
| loki.chart_version | Helm chart version (compatible chart version is 6.0.0 and newer) | string | 6.18.0 | no | no |
| loki.custom_values | Custom Helm chart values in key value format: "persistence.enabled" = true | map(any) | {} | no | no |
| loki.enabled | Toggle Loki installation | bool | true | no | no |
| loki.eventrouter_app_version | Eventrouter app version | string | 1.7.0 | no | no |
| loki.eventrouter_chart_version | Eventrouter Helm chart version (compatible chart version is 3.0.0 and newer) | string | 3.2.14 | no | no |
| loki.eventrouter_enabled | Toggle Eventrouter installation | bool | true | no | no |
| loki.ingress_auth_enabled | Toggle Ingress basic auth (effective if ingress_enabled is true) | bool | false | no | no |
| loki.ingress_enabled | Toggle Ingress | bool | false | no | no |
| loki.ingress_host | Hostname to use with Ingress (effective if ingress_enabled is true) | string | | no | no |
| loki.node_selector | Node selector to place Loki pods in | map(any) | {} | no | no |
| loki.promtail_app_version | Promtail app version | string | 3.0.0 | no | no |
| loki.promtail_chart_version | Promtail Helm chart version (compatible chart version is 6.0.0 and newer) | string | 6.16.6 | no | no |
| loki.promtail_enabled | Toggle Promtail installation | bool | true | no | no |
| loki.retention_period | Data retention period | string | 93d | no | no |
| loki.storage_class | Storage class name | string | | no | no |
| loki.volume_size | Volume data size | string | 100Gi | no | no |
| namespace | Monitoring stack namespace | string | monitoring | no | no |
| prometheus | Prometheus parameters | object | {} | no | no |
| prometheus.alertmanager_enabled | Toggle Prometheus alertmanager installation | bool | false | no | no |
| prometheus.app_version | Prometheus server version | string | v2.39.1 | no | no |
| prometheus.chart_version | Helm chart version (compatible chart version must be from 18.0.0 and up to 19.0.0) | string | 18.4.0 | no | no |
| prometheus.custom_values | Custom Helm chart values in key value format: "persistentVolume.enabled" = true | map(any) | {} | no | no |
| prometheus.enabled | Toggle Prometheus installation | bool | true | no | no |
| prometheus.ingress_auth_enabled | Toggle Nginx basic auth (effective if ingress_enabled is true) | bool | false | no | no |
| prometheus.ingress_enabled | Toggle Ingress | bool | false | no | no |
| prometheus.ingress_host | Hostname to use with Ingress (effective if ingress_enabled is true) | string | | no | no |
| prometheus.node_exporter_enabled | Toggle Node exporter installation | bool | true | no | no |
| prometheus.node_selector | Node selector to place Prometheus pods in | map(any) | {} | no | no |
| prometheus.retention_period | Data retention period | string | 93d | no | no |
| prometheus.storage_class | Storage class name | string | | no | no |
| prometheus.volume_size | Volume data size | string | 100Gi | no | no |
| pushgateway | Pushgateway parameters | object | {} | no | no |
| pushgateway.app_version | Pushgateway server version | string | v1.4.3 | no | no |
| pushgateway.enabled | Toggle Pushgateway installation | bool | false | no | no |
| pushgateway.ingress_auth_enabled | Toggle Nginx basic auth (effective if ingress_enabled is true) | bool | false | no | no |
| pushgateway.ingress_enabled | Toggle Ingress | bool | false | no | no |
| pushgateway.ingress_host | Hostname to use with Ingress (effective if ingress_enabled is true) | string | | no | no |
| pushgateway.volume_size | Volume data size | string | 2Gi | no | no |
| tempo | Tempo parameters | object | {} | no | no |
| tempo.app_version | Tempo components version | string | 2.6.0 | no | no |
| tempo.azure_remote_storage | Azure blob storage for Tempo data | object | | no | no |
| tempo.azure_remote_storage.container_name | Storage account container name | string | | no | no |
| tempo.azure_remote_storage.storage_account_key | Storage account key | string | | no | no |
| tempo.azure_remote_storage.storage_account_name | Storage account name | string | | no | no |
| tempo.chart_version | Tempo distributed Helm chart version (compatible chart version is 1.0.0 and newer) | string | 1.20.0 | no | no |
| tempo.custom_values | Custom Helm chart values in key value format: "persistence.type" = "pvc" | map(any) | {} | no | no |
| tempo.enabled | Toggle Tempo installation | bool | false | no | no |
| tempo.ingress_auth_enabled | Toggle Nginx basic auth (effective if ingress_enabled is true) | bool | false | no | no |
| tempo.ingress_enabled | Toggle Ingress | bool | false | no | no |
| tempo.ingress_host | Hostname to use with Ingress (effective if ingress_enabled is true) | string | | no | no |
| tempo.node_selector | Node selector to place Tempo components in | map(any) | {} | no | no |
| tempo.span_end_time_shift | Shifts the end time for the logs query, based on the span's end time | string | -1h | no | no |
| tempo.span_start_time_shift | Shifts the start time for the logs query, based on the span's start time | string | 1h | no | no |
| tempo.traces_tags | Define additional tags when provisioning the traces to logs feature | map(any) | {} | no | no |
| tempo.traces_to_logs | Toggle traces to logs feature | bool | false | no | no |

| Output | Description | Type | Sensitive |
| --- | --- | --- | --- |
| basic_auth_credentials | Ingress basic auth credentials | computed | yes |
| grafana_admin_credentials | Contains admin user credentials for Grafana web UI | map | yes |
| ingress_hosts | Ingress exposed hosts | map | no |
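
A sketch of how these outputs might be re-exported from a root configuration. It assumes the module block is named monitoring_stack, as in the examples on this page. The root output names are arbitrary. Outputs that the module marks as sensitive have to be declared sensitive here as well.

```hcl
# Sensitive module output: Terraform requires sensitive = true to re-export it
output "grafana_admin_credentials" {
  value     = module.monitoring_stack.grafana_admin_credentials
  sensitive = true
}

output "monitoring_basic_auth_credentials" {
  value     = module.monitoring_stack.basic_auth_credentials
  sensitive = true
}

# Non-sensitive map of exposed Ingress hosts
output "monitoring_ingress_hosts" {
  value = module.monitoring_stack.ingress_hosts
}
```

A sensitive value can then be read on demand with terraform output -json grafana_admin_credentials.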

| Dependency | Version | Kind |
| --- | --- | --- |
| terraform | >= 1.3 | CLI |
| hashicorp/helm | ~> 2.5 | provider |
| hashicorp/kubernetes | ~> 2.9 | provider |
| hashicorp/random | ~> 3.3 | provider |
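
The dependency constraints above can be mirrored in the root configuration. This is a minimal sketch that copies the versions from the table. Provider configuration (cluster credentials for helm and kubernetes) depends on your environment and is not shown.

```hcl
terraform {
  # CLI version required by the module
  required_version = ">= 1.3"

  required_providers {
    helm = {
      source  = "hashicorp/helm"
      version = "~> 2.5"
    }
    kubernetes = {
      source  = "hashicorp/kubernetes"
      version = "~> 2.9"
    }
    random = {
      source  = "hashicorp/random"
      version = "~> 3.3"
    }
  }
}
```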
