Skip to content

Postgres

Cloud Control includes a fully managed Postgres database. This is the default database backend for the following components:

Component Userplane Controlplane
Dynamic Filtering yes

Postgres Cluster

Cloud Control Postgres clusters are deployed with 3 nodes by default, one of which will be designated as primary and the others as replica. The primary node will support both reads & writes, while the replica nodes will only be available for reads. Determination of which node in the cluster is the primary is managed by Patroni. Patroni also makes sure all nodes designated as replica are configured to receive streaming replication from the primary.

Access to the postgres cluster is enabled via 2 services, corresponding to the role of the nodes:

  • Primary Service: Exposes the node currently designated as primary
  • Replica Service: Exposes the nodes currently designated as replica

To ensure the Postgres cluster is managed and monitored, each Postgres Pod contains several containers:

  • Postgres: Patroni runs in this container, managing the Postgres process and coordinating with Patroni in the other cluster nodes to determine which node is the primary
  • Postgres Exporter: Collects Prometheus metrics from Postgres database
  • Cloud Control Agent: Responsible for managing the Patroni configuration, exposing metrics and enabling management activities via NATS (such as backup & restore)

This results in a deployment as follows:

Image title
Postgres cluster

Automated failover

To ensure the Postgres cluster can recover from failures, it always consists of at least 3 nodes. This allows us to ensure that if 1 of the nodes fails or runs on a Kubernetes node undergoing maintenance we can recover and have at least 1 primary and replica left running. The process for failing over is fully managed by Patroni and is best explained using diagrams. In the below scenario, Pod #1 is currently our primary, but it fails:

Image title
Postgres cluster with failure on primary

Once Patroni notices the current primary is no longer in a healthy state, a new primary is elected by the remaining nodes and the Service objects and replication are reconfigured accordingly. For example, the cluster will now look as follows:

Image title
Postgres cluster primary switched to Pod #2

When the cause of the failure of Pod #1 has been determined and resolved, it can rejoin the cluster. It will do so as replica, since another node already took over the role of primary. The cluster will now look like this:

Image title
Postgres cluster after recovery of Pod #1

Data Persistency

Persistency of data stored in the postgres cluster is supported by 3 mechanisms:

  • replica nodes are constantly replicating from the primary to ensure a primary failure can be mitigated without data loss
  • Pods are deployed as part of a StatefulSet with a PersistentVolume on which the Postgres data is stored to ensure the data can withstand a restart of the Pod
  • (Optional) Backups to an S3 bucket with support for point-in-time recovery via the Cloud Control API

Backup & Restore

If available, you can configure the Postgres cluster to send full backups and Write-Ahead Logging (WAL) files to an S3 bucket. Using the Cloud Control API you can then perform point-in-time recovery if necessary. The primary tooling between Postgres and the S3 bucket used for these activities is WAL-G.

When backups are enabled, WAL-G will ship full backups and WAL files from the primary node to the S3 bucket:

Image title
Postgres cluster with backup & restore enabled

To be able to configure the frequency of full backups it is important to understand how a restore activity works. The process for Cloud Control Postgres follows these steps:

  1. Administrator issues restore request via Cloud Control API and provides a desired target time to restore to
  2. Cloud Control determines if there is a full backup available taken prior to the desired target time
    • If available, Cloud Control selects the latest full backup that was taken prior to the target time
    • If not available, Cloud Control will select the earliest created full backup available
  3. Cloud Control shuts down Patroni & Postgres on all nodes in the cluster
  4. Cloud Control removes the current data on the nodes in the cluster
  5. Cloud Control configures Patroni on a single node (typically the -0 Pod) to restore from the selected full backup and (if available) to the target time using WAL files
  6. Cloud Control starts the newly designated primary node to perform the restore
  7. Cloud Control starts the remaining nodes to resynchronize them from the new primary node, this happens in sequence as to not overload the primary and these all become replica nodes

To manage the backups which are available, several mechanisms are available:

  • Cloud Control automatically makes a full backup:
    • When the cluster is initially created
    • After a restore activity has completed
    • When an event occurs that causes a replica node to become the new primary
    • Based on a schedule, configurable in cron format
    • Manually, via the Cloud Control API
  • Cloud Control Janitor runs on a schedule, also configurable in cron format, which deletes all backups & WAL files older than a configurable number of days

When configuring the backup generation & retention parameters, take the following into account:

  • You can only restore to a point-in-time that is as old or more recent than the oldest available full backup
  • You can only perform point-in-time recovery from a full backup + WAL files if you have both the full backup and accompanying WAL files available
  • WAL files storage usage is primarily determined by:
    • Number of days for which they are retained
    • Frequency of mutations to the dataset
  • Full backups storage usage is primarily determined by:
    • Number of days for which they are retained
    • Frequency at which they are generated
    • Amount of data stored in the database

Info: S3 Object Versioning

S3 providers often have functionality to apply versioning to the items stored in buckets. The Postgres backup functionality does not utilize this functionality.

If you enable object versioning on a bucket used for postgres backups without an appropriate lifecycle policy, you will potentially:

  • Use more storage than required (deleted objects linger)
  • Run into performance bottlenecks (more objects in bucket = lower performance on some S3 providers)
  • Incur higher operating costs (mainly an issue if S3 provider is a public cloud service)

Warning: Do not configure multiple Postgres clusters with the same bucket + prefix

Each Postgres cluster with backups enabled should have a dedicated bucket + prefix combination. If you configure multiple Postgres clusters to use the same bucket + prefix you will likely have conflicting timelines and backups will not be able to be restored properly

Configuration Reference

Postgres clusters can be configured as nested items of the Cloud Control component which requires it. For example, to configure the Postgres cluster deployed as part of the Dynamic Filtering deployment:

dynamic:
  postgres:
    backup:
      enabled: true

      # required:
      bucket: mybucket
      prefix: pgbackups
      endpoint: https://s3.eu-central-1.amazonaws.com
      secretName: my-s3-credentials
      region: "eu-central-1"

    parameters:
      max_wal_size: 100
      min_wal_size: 40

    resources:
      limits:
        cpu: 4
        memory: 4Gi

    pvc:
      template:
        resources:
          requests:
            storage: 80Gi

Above postgres cluster will have:

  • S3 backups enabled
  • max_wal_size and min_wal_size Postgres parameters adjusted
  • Postgres container resource limits set at 4 CPUs and 4 Gi RAM
  • Persistent volumes for each Pod with a capacity of 80 Gi

The full list of parameters which can be used under the postgres node in the above YAML are as follows:

Parameter Type Default Description
affinity k8s:Affinity pod affinity (Kubernetes docs: Affinity and anti-affinity). If unset, a default anti-affinity is applied using antiAffinityPreset to spread pods across nodes
antiAffinityPreset string "required" pod anti affinity preset.
Available options: "preferred" "required"
agentHeartbeat integer 60 How often the agent should attempt to synchronize the Postgres database in seconds.
This process ensures the database user credentials are kept in sync with the supplied Secrets
agentLogLevel string "info" Verbosity of logging for the agent container.
Available options: "debug" "info" "warn" "error"
backup Backup
enabled: false
Configuration of backups, requires an S3 bucket to be available
hostNetwork boolean false Use host networking for pods
logLevel string "info" Level of logging.
Available options: "debug" "info" "warning" "error"
nodeSelector k8s:NodeSelector {} Kubernetes pod nodeSelector
parameters dictionary {} Set of key:value pairs to override Postgres parameters
podAnnotations k8s:Annotations {} Annotations to be added to each pod
podLabels k8s:Labels {} Labels to be added to each pod
podSecurityContext k8s:PodSecurityContext
fsGroup: 953
runAsUser: 953
runAsGroup: 953
runAsNonRoot: true
SecurityContext applied to each pod
primaryService Service
type: ClusterIP
Configuration of the Service exposing the primary node
pvc Persistent Volume
labels: {}
annotations: {}
template:
accessModes:
- ReadWriteOnce
resources:
requests:
storage: 20Gi
Configuration of the Volume Claim template to request your storage provisioner to provide the Postgres Pods with appropriate persistent storage volumes
replicas integer 3 Default number of replicas in the StatefulSet
replicaService Service
type: ClusterIP
Configuration of the Service exposing the replica node(s)
replicationUser User Credentials {} Credentials for the replication user named repluser
resources k8s:Resources
limits:
cpu: 2
memory: 4Gi
Resources allocated to the postgres container if resourceDefaults (global) is true
superUser User Credentials {} Credentials for the superuser named postgres
tls TLS
enabled: false
TLS configuration for inbound Postgres traffic
tolerations List of k8s:Tolerations [] Kubernetes pod Tolerations
topologySpreadConstraints List of k8s:TopologySpreadConstraint [] Kubernetes pod topology spread constraints

Backup

Parameters to configure automated backups if you have an S3 bucket available. For example:

<parent>:
  postgres:
    backup:
      enabled: true

      bucket: mybucket
      prefix: pgbackups
      endpoint: https://s3.eu-central-1.amazonaws.com
      secretName: my-s3-credentials
      region: "eu-central-1"
Parameter Type Required Default Description
archiveTimeout integer 300 Amount of seconds after which a new WAL file is generated and pushed to the S3 bucket if there has been at least 1 change to the database. On a cluster with low frequency of writes this helps ensure you can always restore to at least archiveTimeout seconds in the past
bucket string yes Name of the S3 bucket
enabled boolean yes false If true, enable the backup & restore functionality
endpoint string yes Endpoint of the S3 service to connect to
extra dictionary {} Key:value pairs of extra configuration parameters to be added to the WAL-G configuration file used by all backup & restore activities. Only use this if you are aware of potential consequences of the additional configuration items
janitorSchedule cron "0 4 * * *" Schedule in cron format to run the Janitor, responsible for deleting backups older than retentionDays days.
Defaults to daily at 04:00 (AM)
prefix string If configured, all backup & restore actions will be performed on the S3 bucket with this prefix applied to all files.
region string "us-east-1" Region of the S3 bucket, some S3 providers require this (AWS for example) and some (such as MinIO) may not require this to be configured
retentionDays integer 60 Number of days to keep backups in S3. Older backups are deleted by the Janitor
schedule cron "0 2 * * 0" Schedule in cron to create full backups. Every time this schedule is activated a full backup will be created.
Defaults to every sunday at 02:00 (AM)
secretAccessKey string "access_key" Name of the item inside the secretName Secret which holds the access key
secretName string yes Name of a pre-existing Secret from which to obtain S3 access & secret keys
secretSecretKey string "secret_key" Name of the item inside the secretName Secret which holds the secret key

Persistent Volume

To configure the Persistent Volume requested for each Postgres Pod you can modify the following parameters inside the pvc node:

<parent>:
  postgres:
    pvc:
      labels:
        my.prefix/labelname: labelvalue
      annotations:
        my.prefix/annoname: annovalue
      template:
        resources:
          requests:
            storage: 40Gi

These parameters can be configured as follows:

Parameter Type Default Description
annotations k8s:Annotations {} Annotations to be added to the StatefulSet's volume claim template
template k8s:[[PersistentVolumeClaimSpec]]{target="_blank"} {} Spec of PersistentVolumeClaim. Likely needs to be configured to ensure it complies with the expectations of the storage providers on your Kubernetes cluster. In addition, ensure the requested amount of storage is large enough for your usecase.
labels k8s:Labels {} Labels to be added to the StatefulSet's volume claim template

Service

Parameters to configure the service objects. For example:

<parent>:
  postgres:
    primaryService:
      type: LoadBalancer
      annotations:
        metallb.universe.tf/address-pool: name_of_pool

    replicaService:
      type: LoadBalancer
      annotations:
        metallb.universe.tf/address-pool: name_of_pool
Parameter Type Default Description
allocateLoadBalancerNodePorts boolean true If true, services with type LoadBalancer automatically assign NodePorts. Can be set to false if the LoadBalancer provider does not rely on NodePorts
annotations k8s:Annotations {} Annotations for the service
clusterIP string Static cluster IP, must be in the cluster's range of cluster IPs and not in use. Randomly assigned when not specified.
clusterIPs List of string List of static cluster IPs, must be in the cluster's range of cluster IPs and not in use.
externalIPs List of string List of IP addresses for which nodes in the cluster will also accept traffic for this service. These IPs are not managed by Kubernetes and must be user-defined on the cluster's nodes
externalTrafficPolicy string Cluster Can be set to Local to let nodes distribute traffic received on one of the externally-facing addresses (NodePort and LoadBalancer) solely to endpoints on the node itself
healthCheckNodePort integer For services with type LoadBalancer and externalTrafficPolicy Local you can configure this value to choose a static port for the NodePort which external systems (LoadBalancer provider mainly) can use to determine which node holds endpoints for this service
internalTrafficPolicy string Cluster Can be set to Local to let nodes distribute traffic received on the ClusterIP solely to endpoints on the node itself
ipv4 boolean false If true, force the Service to include support for IPv4, ignoring globally configured IP Family settings and/or cluster defaults. If ipv4 is set to true and ipv6 remains false, the result will be an ipv4-only SingleStack Service. If both are false, global settings and/or cluster defaults are used. If both are true, a PreferDualStack Service is created
ipv6 boolean false If true, force the Service to include support for IPv6, ignoring globally configured IP Family settings and/or cluster defaults. If ipv6 is set to true and ipv4 remains false, the result will be an ipv6-only SingleStack Service. If both are false, global settings and/or cluster defaults are used. If both are true, a PreferDualStack Service is created
labels k8s:Labels {} Labels to be added to the service
loadBalancerIP string Deprecated Kubernetes feature, available for backwards compatibility: IP address to attempt to claim for use by this LoadBalancer. Replaced by annotations specific to each LoadBalancer provider
loadBalancerSourceRanges List of string If supported by the LoadBalancer provider, restrict traffic to this LoadBalancer to these ranges
loadBalancerClass string Used to select a non-default type of LoadBalancer class to ensure the appropriate LoadBalancer provisioner attempt to manage this LoadBalancer service
publishNotReadyAddresses boolean false Service is populated with endpoints regardless of readiness state
sessionAffinity string None Can be set to ClientIP to attempt to maintain session affinity.
sessionAffinityConfig k8s:SessionAffinityConfig {} Configuration of session affinity
type string ClusterIP Type of service.
Available options: "ClusterIP" "LoadBalancer" "NodePort"

TLS

Parameters to configure TLS for inbound traffic. An example:

<parent>:
  postgres:
    tls:
      enabled: true
      certSecretName: my-cluster-certificate

In the above example the certificate present in Secret my-cluster-certificate will be attempted to be used to start a TLS-enabled listener.

Parameter Type Default Description
certSecretName string Name of a Secret object containing a certificate (must contain the tls.key, tls.crt items)
certManager boolean false Toggle to have a request created for Certmanager to provision a certificate.
By default, this will request for a Certificate covering the following:
- postgres-[Name of cluster]-primary
- postgres-[Name of cluster]-primary.[Namespace]
- postgres-[Name of cluster]-primary.[Namespace].svc
- postgres-[Name of cluster]-replica
- postgres-[Name of cluster]-replica.[Namespace]
- postgres-[Name of cluster]-replica.[Namespace].svc
Additional entries can be configured using extraDNSNames
enabled boolean false Toggle to enable TLS
If set to true, a certSecretName must be set or certManager must be set to true to ensure a valid certificate is available
extraDNSNames List of string [] List of additional entries to be added to the Certificate requested from Certmanager
issuerGroup string "cert-manager.io" Group to which issuer specified under issuerKind belongs
Default value is inherited from the global certManager configuration
issuerKind string "ClusterIssuer" Type of Certmanager issuer to request a Certificate from
Default value is inherited from the global certManager configuration
issuerName string "" Name of the issuer from which to request a Certificate
Default value is inherited from the global certManager configuration
certSpecExtra CertificateSpec {} Extra configuration to be injected into the Certmanager Certificate object's spec field.
Disallowed options: "secretName" "commonName" "dnsNames" "issuerRef" (These are configured automatically and/or via other options)
certLabels k8s:Labels {} Extra labels for the Certmanager Certificate object
certAnnotations k8s:Annotations {} Extra annotations for the Certmanager Certificate object

User Credentials

Parameters to configure user credentials. If these are not provided for a user, a password is generated and stored in a Secret named postgres-[Name of cluster]-user-[Username].

For example, to force the superuser (named postgres) to have a password based on the password item in the secret named pg-superuser-password:

<parent>:
  postgres:
    superUser:
      passwordSecretName: "pg-superuser-password"
Parameter Type Default Description
passwordSecretName string Name of a Secret object containing a password for the user
passwordSecretKey string password Name of the item inside the Secret holding the password