
tf-k8s-crd | $50 |
tf-k8s-grafana | $300 |
The module installs a monitoring stack whose components can be individually disabled or customized.
The default setup deploys the complete stack with a minimal configuration: Grafana is exposed via Ingress, Loki and Prometheus are available only inside the cluster, Tempo is disabled, persistent storage is configured for all components, and the main exporters are enabled.
Tempo works in distributed (microservice) mode.
Tempo core components are: `compactor`, `distributor`, `ingester`, `querier`, `query-frontend`, `memcached`.
Grafana and Prometheus use the `Recreate` update strategy, which causes a short downtime between deleting the old pod and creating the new one so that volumes are re-attached properly.
All main components expect Nginx as the Ingress class by default, which makes it a dependency of this module.
If Prometheus or Loki is enabled, the corresponding local datasource for Grafana is created.
Pushgateway is installed as part of the `Prometheus` Helm chart and is disabled by default. Use the `pushgateway` values from the variables table to install and configure it. The Prometheus Pushgateway exists to allow ephemeral and batch jobs to expose their metrics to Prometheus.
Nginx is used as `ingress_class` in Ingress annotations by default for all main monitoring-stack components. If a custom `ingress_class` and `ingress_auth_enabled` are used, the specific auth annotations must be provided through `loki.custom_values` and `prometheus.custom_values`.
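As an illustration of the custom-class case above, here is a hedged sketch that assumes a Traefik ingress class and a pre-existing basic-auth `Middleware` named `basic-auth` in the `monitoring` namespace. The Helm value paths (`server.ingress.annotations` for Prometheus, `gateway.ingress.annotations` for Loki) are assumptions about the upstream charts, not something this module documents, so verify them against the chart versions in use. Dots inside annotation keys are escaped the same way as in the node selector examples of this document.

```hcl
module "monitoring_stack" {
  source  = "solutions.corewide.com/kubernetes/tf-k8s-monitoring-stack/helm"
  version = "~> 4.0"

  ingress_class = "traefik" # assumption: Traefik is installed in the cluster

  grafana = {
    ingress_host = "testmon.example.com" # placeholder host
  }

  prometheus = {
    ingress_enabled      = true
    ingress_auth_enabled = true
    ingress_host         = "prom.example.com" # placeholder host

    custom_values = {
      # Assumed value path; the annotation value points to a hypothetical Middleware
      "server.ingress.annotations.traefik\\.ingress\\.kubernetes\\.io/router\\.middlewares" = "monitoring-basic-auth@kubernetescrd"
    }
  }

  loki = {
    ingress_enabled      = true
    ingress_auth_enabled = true
    ingress_host         = "loki.example.com" # placeholder host

    custom_values = {
      # Assumed value path; adjust if the chart exposes Loki through a different Ingress
      "gateway.ingress.annotations.traefik\\.ingress\\.kubernetes\\.io/router\\.middlewares" = "monitoring-basic-auth@kubernetescrd"
    }
  }
}
```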
```bash
failed: failed to create memberlist: Failed to get final advertise address: no private IP address found, and explicit IP not provided
```
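The error above appears to be the Loki memberlist failure that `loki.bind_memberlist_endpoint` is described as addressing for EKS deployments in the variables table. Assuming that is the case, a minimal sketch of enabling it could look like this (the host name is a placeholder):

```hcl
module "monitoring_stack" {
  source  = "solutions.corewide.com/kubernetes/tf-k8s-monitoring-stack/helm"
  version = "~> 4.0"

  grafana = {
    ingress_host = "testmon.example.com" # placeholder host
  }

  loki = {
    # Explicitly bind the pod IP to the Loki memberlist service
    # (documented as required only for EKS clusters)
    bind_memberlist_endpoint = true
  }
}
```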
Once you have a Corewide Solutions Portal account, this one-time action will use your browser session to retrieve credentials:
```shell
terraform login solutions.corewide.com
```
Initialize mandatory providers:
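The provider block for this step is not reproduced here. A minimal sketch, using the version constraints from the dependency table and assuming cluster access through a local kubeconfig file (the `config_path` values are an assumption, adjust them to your authentication method), might look like:

```hcl
terraform {
  required_version = ">= 1.3"

  required_providers {
    helm = {
      source  = "hashicorp/helm"
      version = "~> 2.9"
    }
    kubernetes = {
      source  = "hashicorp/kubernetes"
      version = "~> 2.22"
    }
    random = {
      source  = "hashicorp/random"
      version = "~> 3.3"
    }
  }
}

# Assumption: the target cluster is reachable via the default kubeconfig
provider "kubernetes" {
  config_path = "~/.kube/config"
}

provider "helm" {
  kubernetes {
    config_path = "~/.kube/config"
  }
}
```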
Copy and paste into your Terraform configuration and insert the variables:
```hcl
module "tf_k8s_monitoring_stack" {
  source  = "solutions.corewide.com/kubernetes/tf-k8s-monitoring-stack/helm"
  version = "~> 4.0.1"

  # specify module inputs here or try one of the examples below
  ...
}
```
Initialize the setup:
```shell
terraform init
```
Corewide DevOps team strictly follows the Semantic Versioning Specification to provide our clients with products that have predictable upgrades between versions. We recommend pinning patch versions of our modules using the pessimistic constraint operator (`~>`) to prevent breaking changes during upgrades.
To get new features during the upgrades (without breaking compatibility), use `~> 4.0` and run `terraform init -upgrade`.
For the safest setup, use strict pinning with `version = "4.0.1"`.
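The three pinning options described above can be compared side by side. This is an illustrative sketch: only one `version` line should be active at a time, and the Grafana host is a placeholder.

```hcl
module "tf_k8s_monitoring_stack" {
  source = "solutions.corewide.com/kubernetes/tf-k8s-monitoring-stack/helm"

  # Keep exactly one of the following constraints:
  version = "~> 4.0.1" # patch releases only: >= 4.0.1, < 4.1.0
  # version = "~> 4.0" # minor and patch releases: >= 4.0, < 5.0
  # version = "4.0.1"  # strict pin: exactly 4.0.1

  grafana = {
    ingress_host = "testmon.example.com" # placeholder host
  }
}
```

After changing the constraint, `terraform init -upgrade` is needed for Terraform to select a newer matching version.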
All notable changes to this project are documented here.
The format is based on Keep a Changelog, and this project adheres to Semantic Versioning.
BREAKING CHANGE: Grafana roll-out is changed from the Helm-based approach to Grafana Operator, thus its settings and data management aren't compatible with previous version

- `tf-k8s-crd` module dependency
- `tf-k8s-grafana` module dependency
- `delete_request_store` parameter in compactor service
- `retention_enabled` and `retention_period` parameters in `compactor` service
- `Grafana` Helm chart (from `6.50.7` to `8.5.6`) and application (from `9.5.1` to `11.2.2`) versions, used new values
- `Prometheus` Helm chart (from `15.8.7` to `18.4.0`) and application (from `v2.34.0` to `v2.39.1`) versions, used new values
- `Loki` Helm chart (from `5.10.0` to `6.18.0`) and application (from `2.8.3` to `3.2.0`) versions, used new values
- `Promtail` Helm chart (from `2.1.0` to `6.16.6`) and application (from `2.5.0` to `3.0.0`) versions, used new values
- `Eventrouter` Helm chart (from `1.4.15` to `3.2.14`) and application (from `0.11.0` to `1.7.0`) versions, used new values
- `Tempo` Helm chart (from `1.7.0` to `1.20.0`) and application (from `2.3.0` to `2.6.0`) versions, used new values
- `Tempo` service

BREAKING CHANGE: Loki settings and data chunks management aren't compatible with previous version

- `Loki` Helm chart source repository, chart (from `2.13.3` to `5.10.0`) and application (from `2.6.1` to `2.8.3`) versions, used new values
- `Promtail` Helm chart source repository and version (from `2.0.2` to `2.1.0`)
- `PodSecurityPolicy` Kubernetes resource creation for `Grafana` Helm release in order to keep compatibility with Kubernetes `v1.25` and newer
- `9.5.1`

BREAKING CHANGE: now all `kubernetes` provider resources use versioned resources which aren't compatible with previous version

- `grafana` application version to `9.3.6`
- `grafana` helm chart version to `6.50.7`
- `basic_auth` credentials with special characters by disabling special characters
- `prometheus.alertmanager_enabled`, off by default)

`v1.x` to `v2.x`

Now all `kubernetes` provider resources use versioned resources. According to the Kubernetes provider's recommendations, the simplest non-destructive way to migrate is to remove the old resource from the state and import it as the versioned one, like so:

```bash
# If Kubernetes namespace was managed by the module, it must be re-imported
terraform state rm 'module.monitoring.kubernetes_namespace.monitoring[0]'
terraform import 'module.monitoring.kubernetes_namespace_v1.monitoring[0]' monitoring

# If Kubernetes secret with basic auth credentials was created, it must be re-imported
terraform state rm module.monitoring.kubernetes_secret.ingress_basic_auth
terraform import module.monitoring.kubernetes_secret_v1.ingress_basic_auth monitoring/ingress-monitoring-basic-auth

# Re-import Cluster Role Binding for the Event Router
terraform state rm module.monitoring.kubernetes_cluster_role_binding.eventrouter
terraform import module.monitoring.kubernetes_cluster_role_binding_v1.eventrouter eventrouter
```

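On Terraform 1.5 or newer (the module itself only requires 1.3), the import half of this migration could alternatively be declared with `import` blocks in the root module. This is a sketch under that assumption, reusing the addresses and IDs from the commands in this section; the old resources still have to be removed from the state first with `terraform state rm`.

```hcl
# Assumes Terraform >= 1.5 and a module instance named "monitoring"

# Namespace, only if it was managed by the module
import {
  to = module.monitoring.kubernetes_namespace_v1.monitoring[0]
  id = "monitoring"
}

# Basic auth secret, only if it was created
import {
  to = module.monitoring.kubernetes_secret_v1.ingress_basic_auth
  id = "monitoring/ingress-monitoring-basic-auth"
}

# Cluster Role Binding for the Event Router
import {
  to = module.monitoring.kubernetes_cluster_role_binding_v1.eventrouter
  id = "eventrouter"
}
```

With these blocks in place, the imports are carried out as part of the next `terraform apply`, and the blocks can be deleted afterwards.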
`v2.x` to `v3.x`

Starting from `v3.0`, the module uses a different Chart repository and version for Loki. Since Loki is deployed as a StatefulSet, its spec can't be updated in place, so Loki must be redeployed manually. First, update the module version reference and re-init the module, then uninstall the already deployed Loki Helm Chart:

```bash
terraform destroy -target 'module.monitoring.helm_release.loki[0]'
```

```bash
helm -n monitoring uninstall loki
```

Then, the Loki Helm Chart can be re-installed:

```bash
terraform apply -target 'module.monitoring.helm_release.loki[0]'
```

`v3.x` to `v4.x`

The module from `v4.0` utilizes Grafana Operator instead of a Helm-based approach, thus Grafana will be completely re-deployed, and the attached volume with all the managed data (users, dashboards, alert rules, etc.) will be lost.
To upgrade, update the module version reference, re-init the module, update module inputs following the documentation, and apply changes.
Deploy complete stack with only mandatory values:
```hcl
module "monitoring_stack" {
  source  = "solutions.corewide.com/kubernetes/tf-k8s-monitoring-stack/helm"
  version = "~> 4.0"

  grafana = {
    ingress_host = "testmon.example.com"
  }
}
```
Deploy the full stack with an additional datasource.
Set node selectors for Grafana, Prometheus and Loki, add Pushgateway from the Prometheus chart, add a custom environment variable for Grafana, and set basic authentication credentials:

```hcl
module "monitoring_stack" {
  source  = "solutions.corewide.com/kubernetes/tf-k8s-monitoring-stack/helm"
  version = "~> 4.0"

  name_prefix = "dev"

  auth_credentials = {
    password = "XXXX-XXXX-XXXX"
  }

  grafana = {
    ingress_host = "testmon.example.com"
    admin_pass   = "YYYY-YYYY-YYYY"

    node_selector = {
      "cloud\\.google\\.com/gke-nodepool" = "maintenance"
    }

    env_vars = {
      GF_PLUGIN_GRAFANA_IMAGE_RENDERER_RENDERING_IGNORE_HTTPS_ERRORS = true
    }
  }

  prometheus = {
    node_selector = {
      "cloud\\.google\\.com/gke-nodepool" = "maintenance"
    }
  }

  pushgateway = {
    enabled         = true
    ingress_enabled = true
    ingress_host    = "pushgw.example.com"
    volume_size     = "5Gi"
  }

  loki = {
    node_selector = {
      "cloud\\.google\\.com/gke-nodepool" = "maintenance"
    }
  }

  grafana_datasources = [
    {
      name               = "Prometheus Dev"
      type               = "prometheus"
      url                = "https://devprom.example.com"
      basic_auth_enabled = true
      basic_auth_pass    = "XXXX-XXXX-XXXX"
      basic_auth_user    = "monitoring"
    },
  ]
}
```
Deploy a partial stack with some customization: Prometheus and Node Exporter are disabled, Tempo is enabled:

```hcl
module "monitoring_stack" {
  source  = "solutions.corewide.com/kubernetes/tf-k8s-monitoring-stack/helm"
  version = "~> 3.2"

  grafana = {
    ingress_host  = "testmon.example.com"
    admin_pass    = "YYYY-YYYY-YYYY"
    storage_class = "standard"
  }

  prometheus = {
    enabled               = false
    node_exporter_enabled = false
  }

  tempo = {
    enabled = true

    node_selector = {
      "kubernetes\\.azure\\.com/agentpool" = "maintenance"
    }
  }
}
```
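Deploy Tempo with Azure Blob Storage and the traces-to-logs feature. This is a sketch built only from the `tempo.*` inputs listed in the variables table; the container name, storage account name, and the `tempo_storage_account_key` variable are hypothetical placeholders:

```hcl
# Hypothetical root-module variable, so the key is not hard-coded
variable "tempo_storage_account_key" {
  description = "Azure storage account key used by Tempo"
  type        = string
  sensitive   = true
}

module "monitoring_stack" {
  source  = "solutions.corewide.com/kubernetes/tf-k8s-monitoring-stack/helm"
  version = "~> 4.0"

  grafana = {
    ingress_host = "testmon.example.com" # placeholder host
  }

  tempo = {
    enabled        = true
    traces_to_logs = true

    azure_remote_storage = {
      container_name       = "tempo-traces"       # placeholder
      storage_account_name = "examplestorageacct" # placeholder
      storage_account_key  = var.tempo_storage_account_key
    }
  }
}
```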
Variable | Description | Type | Default | Required | Sensitive |
---|---|---|---|---|---|
`grafana` | Grafana parameters | `object` | | yes | no |
`auth_credentials` | Ingress Nginx basic auth login credentials | `object` | `{}` | no | yes |
`auth_credentials.password` | Ingress Nginx basic auth login password (will be randomly generated if it's not set) | `string` | | no | yes |
`auth_credentials.username` | Ingress Nginx basic auth login username | `string` | `monitoring` | no | yes |
`create_namespace` | Indicates creation of dedicated namespace for monitoring components | `bool` | `true` | no | no |
`grafana.admin_pass` | Grafana admin password (will be randomly generated if it's not set) | `string` | | no | no |
`grafana.admin_user` | Grafana admin username | `string` | `admin` | no | no |
`grafana.enabled` | Toggle Grafana installation | `bool` | `true` | no | no |
`grafana.env_vars` | Environment variables for Grafana container in key-value format | `map(any)` | `{}` | no | no |
`grafana.grafana_version` | Grafana server version | `string` | `11.2.2` | no | no |
`grafana.ingress_host` | Hostname to use with Ingress (required if `enabled` is `true`) | `string` | | no | no |
`grafana.log_level` | Grafana log level (supported levels: `trace`, `debug`, `info`, `warn`, `error` or `critical`) | `string` | `warn` | no | no |
`grafana.node_selector` | Node selector to place Grafana pods in | `map(any)` | `{}` | no | no |
`grafana.operator_app_version` | Grafana operator image version | `string` | `v5.9.2` | no | no |
`grafana.operator_chart_version` | Grafana operator Helm chart version | `string` | `v5.9.2` | no | no |
`grafana.recreate_on_changes` | Whether the Grafana CRD should be recreated and not updated during apply phase | `bool` | `false` | no | no |
`grafana.storage_class` | Storage class name | `string` | | no | no |
`grafana.volume_size` | Volume data size | `string` | `5Gi` | no | no |
`grafana_datasources` | Grafana datasources for datasource provisioning | `list(object)` | `[]` | no | yes |
`grafana_datasources[*].basic_auth_enabled` | Toggle Ingress basic auth | `bool` | | no | yes |
`grafana_datasources[*].basic_auth_pass` | Ingress basic auth password | `string` | | no | yes |
`grafana_datasources[*].basic_auth_user` | Ingress basic auth user | `string` | | no | yes |
`grafana_datasources[*].name` | Name of the datasource | `string` | | no | yes |
`grafana_datasources[*].type` | Type of the datasource | `string` | | no | yes |
`grafana_datasources[*].url` | URL of the datasource | `string` | | no | yes |
`ingress_cert_issuer` | Ingress TLS certificate issuer | `string` | `letsencrypt` | no | no |
`ingress_class` | Ingress Class definition | `string` | `nginx` | no | no |
`loki` | Loki parameters | `object` | `{}` | no | no |
`loki.app_version` | Loki server version | `string` | `3.2.0` | no | no |
`loki.bind_memberlist_endpoint` | Toggle explicit bind of pod IP to Loki-Memberlist Kubernetes service. Required only for deployment into EKS cluster in order to bind endpoint IP accurately | `bool` | `false` | no | no |
`loki.chart_version` | Helm chart version (compatible chart version is `6.0.0` and newer) | `string` | `6.18.0` | no | no |
`loki.custom_values` | Custom Helm chart values in key-value format: `"persistence.enabled" = true` | `map(any)` | `{}` | no | no |
`loki.enabled` | Toggle Loki installation | `bool` | `true` | no | no |
`loki.eventrouter_app_version` | Eventrouter app version | `string` | `1.7.0` | no | no |
`loki.eventrouter_chart_version` | Eventrouter Helm chart version (compatible chart version is `3.0.0` and newer) | `string` | `3.2.14` | no | no |
`loki.eventrouter_enabled` | Toggle Eventrouter installation | `bool` | `true` | no | no |
`loki.ingress_auth_enabled` | Toggle Ingress basic auth (effective if `ingress_enabled` is `true`) | `bool` | `false` | no | no |
`loki.ingress_enabled` | Toggle Ingress | `bool` | `false` | no | no |
`loki.ingress_host` | Hostname to use with Ingress (effective if `ingress_enabled` is `true`) | `string` | | no | no |
`loki.node_selector` | Node selector to place Loki pods in | `map(any)` | `{}` | no | no |
`loki.promtail_app_version` | Promtail app version | `string` | `3.0.0` | no | no |
`loki.promtail_chart_version` | Promtail Helm chart version (compatible chart version is `6.0.0` and newer) | `string` | `6.16.6` | no | no |
`loki.promtail_enabled` | Toggle Promtail installation | `bool` | `true` | no | no |
`loki.retention_period` | Data retention period | `string` | `93d` | no | no |
`loki.storage_class` | Storage class name | `string` | | no | no |
`loki.volume_size` | Volume data size | `string` | `100Gi` | no | no |
`name_prefix` | Name prefix for resources creation | `string` | | no | no |
`namespace` | Monitoring stack namespace | `string` | `monitoring` | no | no |
`prometheus` | Prometheus parameters | `object` | `{}` | no | no |
`prometheus.alertmanager_enabled` | Toggle Prometheus alertmanager installation | `bool` | `false` | no | no |
`prometheus.app_version` | Prometheus server version | `string` | `v2.39.1` | no | no |
`prometheus.chart_version` | Helm chart version (compatible chart version must be from `18.0.0` and up to `19.0.0`) | `string` | `18.4.0` | no | no |
`prometheus.custom_values` | Custom Helm chart values in key-value format: `"persistentVolume.enabled" = true` | `map(any)` | `{}` | no | no |
`prometheus.enabled` | Toggle Prometheus installation | `bool` | `true` | no | no |
`prometheus.ingress_auth_enabled` | Toggle Nginx basic auth (effective if `ingress_enabled` is `true`) | `bool` | `false` | no | no |
`prometheus.ingress_enabled` | Toggle Ingress | `bool` | `false` | no | no |
`prometheus.ingress_host` | Hostname to use with Ingress (effective if `ingress_enabled` is `true`) | `string` | | no | no |
`prometheus.node_exporter_enabled` | Toggle Node exporter installation | `bool` | `true` | no | no |
`prometheus.node_selector` | Node selector to place Prometheus pods in | `map(any)` | `{}` | no | no |
`prometheus.retention_period` | Data retention period | `string` | `93d` | no | no |
`prometheus.storage_class` | Storage class name | `string` | | no | no |
`prometheus.volume_size` | Volume data size | `string` | `100Gi` | no | no |
`pushgateway` | Pushgateway parameters | `object` | `{}` | no | no |
`pushgateway.app_version` | Pushgateway server version | `string` | `v1.4.3` | no | no |
`pushgateway.enabled` | Toggle Pushgateway installation | `bool` | `false` | no | no |
`pushgateway.ingress_auth_enabled` | Toggle Nginx basic auth (effective if `ingress_enabled` is `true`) | `bool` | `false` | no | no |
`pushgateway.ingress_enabled` | Toggle Ingress | `bool` | `false` | no | no |
`pushgateway.ingress_host` | Hostname to use with Ingress (effective if `ingress_enabled` is `true`) | `string` | | no | no |
`pushgateway.volume_size` | Volume data size | `string` | `2Gi` | no | no |
`tempo` | Tempo parameters | `object` | `{}` | no | no |
`tempo.app_version` | Tempo components version | `string` | `2.6.0` | no | no |
`tempo.azure_remote_storage` | Azure blob storage for Tempo data | `object` | | no | no |
`tempo.azure_remote_storage.container_name` | Storage account container name | `string` | | no | no |
`tempo.azure_remote_storage.storage_account_key` | Storage account key | `string` | | no | no |
`tempo.azure_remote_storage.storage_account_name` | Storage account name | `string` | | no | no |
`tempo.chart_version` | Tempo distributed Helm chart version (compatible chart version is `1.0.0` and newer) | `string` | `1.20.0` | no | no |
`tempo.custom_values` | Custom Helm chart values in key-value format: `"persistence.type" = "pvc"` | `map(any)` | `{}` | no | no |
`tempo.enabled` | Toggle Tempo installation | `bool` | `false` | no | no |
`tempo.ingress_auth_enabled` | Toggle Nginx basic auth (effective if `ingress_enabled` is `true`) | `bool` | `false` | no | no |
`tempo.ingress_enabled` | Toggle Ingress | `bool` | `false` | no | no |
`tempo.ingress_host` | Hostname to use with Ingress (effective if `ingress_enabled` is `true`) | `string` | | no | no |
`tempo.node_selector` | Node selector to place Tempo components in | `map(any)` | `{}` | no | no |
`tempo.span_end_time_shift` | Shifts the end time for the logs query, based on the span's end time | `string` | `-1h` | no | no |
`tempo.span_start_time_shift` | Shifts the start time for the logs query, based on the span's start time | `string` | `1h` | no | no |
`tempo.traces_tags` | Define additional tags when provisioning traces to logs feature | `map(any)` | `{}` | no | no |
`tempo.traces_to_logs` | Toggle traces to logs feature | `bool` | `false` | no | no |

Output | Description | Type | Sensitive |
---|---|---|---|
`basic_auth_credentials` | Ingress basic auth credentials | `computed` | yes |
`grafana_admin_credentials` | Contains admin user credentials for Grafana web UI | `map` | yes |
`ingress_hosts` | Ingress exposed hosts | `map` | no |

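As a usage illustration, the module outputs can be re-exported from the root module. The sketch below assumes the module instance is named `monitoring_stack`, as in the examples of this document; the root output names are arbitrary.

```hcl
output "monitoring_ingress_hosts" {
  description = "Hosts exposed by the monitoring stack"
  value       = module.monitoring_stack.ingress_hosts
}

output "grafana_admin_credentials" {
  description = "Grafana admin credentials"
  value       = module.monitoring_stack.grafana_admin_credentials
  sensitive   = true # needed because the module output is marked sensitive
}
```

Sensitive values are redacted in the plan and apply output; `terraform output -json grafana_admin_credentials` prints them explicitly.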
Dependency | Version | Kind |
---|---|---|
`terraform` | `>= 1.3` | `CLI` |
`hashicorp/helm` | `~> 2.9` | `provider` |
`hashicorp/kubernetes` | `~> 2.22` | `provider` |
`hashicorp/random` | `~> 3.3` | `provider` |
`tf-k8s-crd` | `~> 2.0` | `module` |
`tf-k8s-grafana` | `~> 1.1` | `module` |