
Crunchy PostgreSQL DB Stack - V 4.1

Crunchy PostgreSQL Database Stack

The Crunchy PostgreSQL datastack is used to configure and deploy a PostgreSQL database into dataspaces and provide PostgreSQL data services to applications in the CloudOne environment.

Crunchy PostgreSQL is a cloud-native, containerized relational database management system (RDBMS). It delivers enterprise-ready PostgreSQL on Kubernetes infrastructure with commercial support from Crunchy Data. Crunchy Data co-authored the Security Technical Implementation Guide (STIG) for PostgreSQL with the Defense Information Systems Agency to define how PostgreSQL can be deployed and configured to meet security requirements for government systems.

Additional Features of Crunchy PostgreSQL

In addition to the capabilities and features of an Open Source PostgreSQL datastack, the Crunchy PostgreSQL datastack includes the following features:

  • The latest enterprise features of PostgreSQL, including data-at-rest encryption, advanced connection pooling, backup to a StorageGrid bucket, and advanced security features.
  • Orchestration of the PostgreSQL cluster via the Crunchy Operator.
  • Advanced backup capabilities, including point-in-time recovery, using pgBackRest.
  • Option to create a standby PgCluster in the CloudOne AWS cluster.

Provisioning

When provisioning a Crunchy PostgreSQL datastack, the CI/CD pipeline generates a Helm chart to provision the datastack. Behind the scenes, this Helm chart includes Kubernetes objects defined directly in OpenShift as well as a special PostgresCluster object, a YAML structure passed to the Crunchy Operator that tells the operator how to provision the new PostgreSQL cluster.
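For orientation, the following is a minimal sketch of what the generated PostgresCluster object might look like for a Small datastack. It is illustrative only: the real manifest is produced by the Helm chart and pipeline, and values such as the cluster name, namespace, PostgreSQL version, and the StorageGrid bucket details are assumptions, not generated output.

apiVersion: postgres-operator.crunchydata.com/v1beta1
kind: PostgresCluster
metadata:
  name: postgres1-stg                  # illustrative; set by the pipeline from the serviceName
  namespace: cdr-db-postgres1-stg      # the target dataspace
spec:
  postgresVersion: 14                  # assumption; the actual version comes from azure-pipelines.yml
  instances:
    - name: instance1
      replicas: 1                      # Small size per the sizing table below
      resources:
        limits:
          memory: 3Gi
      dataVolumeClaimSpec:
        accessModes: ["ReadWriteOnce"]
        resources:
          requests:
            storage: 50Gi
  backups:
    pgbackrest:
      repos:
        - name: repo1
          s3:                          # StorageGrid is S3-compatible; these values are placeholders
            bucket: example-backup-bucket
            endpoint: storagegrid.example.com
            region: us-east-1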

The specifications of the PostgreSQL cluster provisioned by the Crunchy Operator will depend on the size of the datastack set at the time of onboarding, according to the following table:

Size      Storage per Replica   Number of Replicas   Memory per Pod
Small     50 GB                 1                    3 GB
Medium    100 GB                2                    3 GB
Large     500 GB                3                    12 GB
X-Large   1000 GB               3                    16 GB

Pipeline Flow

This database stack is deployed as a set of one or more containers and, like other types of containers deployed in the CloudOne environment, there are certain common steps required to deploy it safely and successfully. These shared requirements include:

  • Staging of the datastack container images in the appropriate Docker image repositories, tagged according to CloudOne conventions
  • Completion of Unit Testing to validate the database stack
  • Gated approvals before allowing deployments into pre-production and production environments
  • Audit trail and history of deployment within the CloudOne CI/CD Pipeline

To meet these requirements, the PostgreSQL database stack is built and deployed in a manner that integrates with CloudOne requirements and processes. The deployment flow begins with a Continuous Integration stage in the pipeline, followed by the Continuous Deployment stages. The Continuous Integration stage focuses on building the application and staging the container image or images in the appropriate Docker image repository, ready for deployment. Subsequent pipeline stages deploy the database stack to the appropriate target Kubernetes spaces.

Getting Started in the Azure DevOps environment

Refer to the following link to learn about getting started in the Azure DevOps environment: Getting Started in Azure DevOps Environment

Repository Structure and Azure DevOps Variables

All of the information that instructs the CI/CD pipeline how to provision the PostgreSQL datastack, and to what specifications, is contained within a Git repository for the datastack and in a variable group within Azure DevOps. The name of the variable group, found under the Library subsection of the Pipelines menu for the Azure DevOps project, will be crunchy-postgresql- followed by the name of the datastack, also referred to as the serviceName for the datastack. This variable group name will also have a suffix of .V3.

The Git repository will be found under the same project, with a repository name identical to the variable group name described above, but without the .V3 suffix (i.e. crunchy-postgresql- followed by the name of the datastack, or the serviceName).

Within the variable group are a number of variables that will be referenced by the CI/CD pipeline during the provisioning process for the datastack. These variables are generated automatically as part of the original process of onboarding the datastack and are dictated by the datastack size and location. One variable of note is V3.serviceName, which is the name of the datastack as described above. The other variable of interest is V3.chartDir, which is derived from the serviceName and the project name (appCode). V3.chartDir is where all of the Helm chart information is stored in the Git repository.
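As a concrete illustration, assuming a datastack whose serviceName is postgres1-stg under an appCode of cdr (both hypothetical), the resulting names would be along these lines; the exact chartDir layout may differ in your project:

V3.serviceName : postgres1-stg
Variable group : crunchy-postgresql-postgres1-stg.V3
Git repository : crunchy-postgresql-postgres1-stg
V3.chartDir    : charts/cdr/postgres1-stg   (illustrative path)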

The Git repository for the PostgreSQL datastack contains the aforementioned Helm chart directory. The files under this directory are automatically generated, but the directory also includes a file named values.yaml in which changes can be made.

Even in the case of values.yaml, most of the contents need not be changed; however, the following describes some areas in which changes are potentially or likely needed:

  • postgresUsers - a YAML list of users to be provisioned. The initial list may be sufficient, however additional users can be included in this list with the same format as the initial list of users. More details about how to add a user are described below in this document.

Also at the top of the repository is a file called azure-pipelines.yml. This file contains a reference to the appropriate version of the CI/CD pipeline logic and some variables unique to the datastack (e.g. PostgreSQL version, initial password settings, etc.).

Continuous Integration and Continuous Delivery Pipelines

Please note that the document “CloudOne Concepts and Procedures” contains more details about the specific flow of the application through the CI/CD pipelines.

The triggering of the CI/CD pipeline to provision the datastack is manual. Details for navigating to the CI/CD pipeline in Azure DevOps, triggering a run and examining the results can be found here: Continuous Integration and Continuous Delivery Pipelines

Database-Specific Pipeline Configuration

The extends YAML object is a complex object consisting of additional YAML objects. This object is used to extend the V3 pipeline logic (referenced by the repository defined in the resources object) by (a) referencing the correct datastack pipeline entry point and (b) passing a set of YAML objects as parameters to influence the behavior of the pipeline to meet an application team's specific needs.

The extends YAML object consists of 2 objects beneath it:

  • template
  • parameters

The template YAML object is a single value set to the initial entry point for the V3 pipeline for the Crunchy PostgreSQL datastack, so it should always be defined as follows:

extends:
  template: dbaas/crunchy-postgresql-<dataspace repo name>.yml@spaces

where <dataspace repo name> is a repository under a spaces/ group containing datastack-specific settings generated at the time of onboarding the datastack.

Database Backup Specific Pipeline Configuration

The database backup-specific details are specified in the variables YAML object, which is defined as follows:

variables:
- { name: backupRetentionDays, value: 14 }
- { name: backupCronFull, value: '0 0 * * *' }
- { name: backupCronIncr, value: '30 */4 * * *' }

where backupRetentionDays specifies the number of days backups are retained, and backupCronFull and backupCronIncr specify the full and incremental backup schedules in cron format.
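Putting these pieces together, an azure-pipelines.yml for the datastack might look roughly like the following sketch. The resources repository reference and the <dataspace repo name> are placeholders, and the file generated at onboarding may differ in detail:

# Sketch only; the actual file is generated when the datastack is onboarded.
resources:
  repositories:
    - repository: spaces                      # referenced as @spaces by the template below
      type: git
      name: <project>/<dataspace repo name>   # placeholder

variables:
- { name: backupRetentionDays, value: 14 }
- { name: backupCronFull, value: '0 0 * * *' }      # full backup daily at midnight
- { name: backupCronIncr, value: '30 */4 * * *' }   # incremental backup every 4 hours

extends:
  template: dbaas/crunchy-postgresql-<dataspace repo name>.yml@spaces
  parameters:
    crunchy-postgresql:
      restore:
        enabled: false          # leave false for normal provisioning runs; see Restoring from backup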

Taking a One-Off/Manual backup

There are times when you may want to take a one-off backup, such as before major application changes or updates. Run the following command to take a manual backup.

kubectl annotate -n <dataspace_name> postgrescluster <pg_clustername> --overwrite postgres-operator.crunchydata.com/pgbackrest-backup="$( date '+%F_%H:%M:%S' )"

dataspace_name is the OpenShift/Rancher project name.

pg_clustername is the name of the deployed PostgreSQL cluster.

For example, to run a manual backup of the "postgres1-stg" PostgreSQL cluster running in the "cdr-db-postgres1-stg" dataspace:

kubectl annotate -n cdr-db-postgres1-stg postgrescluster postgres1-stg --overwrite postgres-operator.crunchydata.com/pgbackrest-backup="$( date '+%F_%H:%M:%S' )"
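To confirm the backup completed, one approach is to check for the backup job created by the operator in the dataspace and then re-list the backups with pgbackrest info (the same command shown in the Restoring from backup section). The job name varies, so listing all jobs in the namespace is the simplest check:

kubectl get jobs -n cdr-db-postgres1-stg

kubectl exec -it -n cdr-db-postgres1-stg -c database \
  $(kubectl get pods -n cdr-db-postgres1-stg --selector='postgres-operator.crunchydata.com/role=master' -o name) -- \
  pgbackrest info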


Restoring from backup

The parameters YAML object is defined immediately following the template object and at the same indentation level. This is the object in which most behavioral changes are made within the azure-pipelines.yml file (in addition to those made within the values.yaml Helm chart file).

Of note in the azure-pipelines.yml parameters: section is the YAML object crunchy-postgresql.restore.enabled. By default this value should be set to false for the provisioning run of the CI/CD pipeline. To restore the database from backup, set crunchy-postgresql.restore.enabled to true, set the restore target time in crunchy-postgresql.restore.recoveryTarget, and run the pipeline to update the PostgreSQL cluster manifest.

  parameters:
    crunchy-postgresql:
      restore:
        enabled: true
        recoveryTarget: '2021-11-17 00:00:26'

Run the following command to list the available backups and obtain the recoveryTarget details:

kubectl exec -it -n <dataspace_name> -c database \
  $(kubectl get pods -n <dataspace_name> --selector='postgres-operator.crunchydata.com/role=master' -o name) -- \
  pgbackrest info

For example, to list the backup details from the "cdr-db-postgres1-stg" dataspace:

kubectl exec -it -n cdr-db-postgres1-stg -c database \
  $(kubectl get pods -n cdr-db-postgres1-stg --selector='postgres-operator.crunchydata.com/role=master' -o name) -- \
  pgbackrest info  


stanza: db
    status: ok
    cipher: none

    db (current)
        wal archive min/max (14): 00000002000000010000003E/000000020000000200000089

        full backup: 20211117-000026F
            timestamp start/stop: 2021-11-17 00:00:26 / 2021-11-17 00:01:32
            wal start/stop: 00000002000000010000003E / 00000002000000010000003E
            database size: 33.8MB, database backup size: 33.8MB
            repo1: backup set size: 4.3MB, backup size: 4.3MB

        incr backup: 20211117-000026F_20211117-003017I
            timestamp start/stop: 2021-11-17 00:30:17 / 2021-11-17 00:30:22
            wal start/stop: 000000020000000100000040 / 000000020000000100000040
            database size: 33.8MB, database backup size: 8.3KB
            repo1: backup set size: 4.3MB, backup size: 436B
            backup reference list: 20211117-000026F
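The recoveryTarget used in the parameters section above is a point in time expressed in this same YYYY-MM-DD HH:MM:SS format; choose it from the timestamp ranges of the backups listed by pgbackrest info (for example, 2021-11-17 00:00:26 corresponds to the start of the full backup shown here).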

Once the pipeline run completes successfully, proceed with the following command to initiate the restore process.

kubectl annotate -n <dataspace_name> postgrescluster <pg_clustername> --overwrite postgres-operator.crunchydata.com/pgbackrest-restore=id1

dataspace_name is the OpenShift/Rancher project name.

pg_clustername is the name of the deployed PostgreSQL cluster.
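Once the restore completes and the cluster is running again, a quick sanity check is to connect with psql inside the primary pod and confirm the database answers queries; the query below is only an illustrative example:

kubectl exec -it -n cdr-db-postgres1-stg -c database \
  $(kubectl get pods -n cdr-db-postgres1-stg --selector='postgres-operator.crunchydata.com/role=master' -o name) -- \
  psql -c "SELECT now(), pg_is_in_recovery();"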

Make sure to disable the restore option once you are done with the database restore activity by updating the manifest and running the pipeline.

  parameters:
    crunchy-postgresql:
      restore:
        enabled: false
        recoveryTarget: '2021-11-17 00:00:26'


Cloning (Restore backup of another database)

To restore a database from another database's backup, you first need to remove the target postgrescluster. Use the following command to delete the postgrescluster.

kubectl get PostgresClusters  -n <dataspace name> --no-headers=true | awk '{print $1}' | xargs  kubectl -n <dataspace name> delete PostgresClusters

For example, to remove the PostgreSQL cluster from the cdr-db-postgres1-stg dataspace:

kubectl get PostgresClusters  -n cdr-db-postgres1-stg --no-headers=true | awk '{print $1}' | xargs  kubectl -n cdr-db-postgres1-stg delete PostgresClusters
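Deletion can take a short while. Before running the pipeline with the clone parameters described below, you can confirm that the cluster and its pods are gone:

kubectl get postgresclusters -n cdr-db-postgres1-stg
kubectl get pods -n cdr-db-postgres1-stg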

As with restoring from backup, cloning is controlled through the parameters YAML object in the azure-pipelines.yml file (see the Restoring from backup section above for how this object relates to the template object).

Of note in the azure-pipelines.yml parameters: section is the YAML object crunchy-postgresql.clone.enabled. By default this value should be set to false for the provisioning run of the CI/CD pipeline. To restore the database from another database's backup, set crunchy-postgresql.clone.enabled to true, set the correct values for crunchy-postgresql.clone.recoveryTarget, crunchy-postgresql.clone.sourcePgCluster, and crunchy-postgresql.clone.sourceDataspace, and run the pipeline to initiate the cloning process.

  parameters:
    crunchy-postgresql:
      clone:
        enabled: true
        recoveryTarget: '2021-11-17 00:00:26'
        sourcePgCluster: postgres1-stg
        sourceDataspace: cdr-db-postgres1-stg

Run the following command to list the available backups and obtain the recoveryTarget details. Make sure to run the command against the source dataspace.

kubectl exec -it -n <dataspace_name> -c database \
  $(kubectl get pods -n <dataspace_name> --selector='postgres-operator.crunchydata.com/role=master' -o name) -- \
  pgbackrest info

For example, to list the backup details from the "cdr-db-postgres1-stg" source dataspace:

kubectl exec -it -n cdr-db-postgres1-stg -c database \
  $(kubectl get pods -n cdr-db-postgres1-stg --selector='postgres-operator.crunchydata.com/role=master' -o name) -- \
  pgbackrest info  

The output format is the same as shown in the Restoring from backup section above; use the timestamps of the source cluster's backups to choose the recoveryTarget.

Run the following command to update the pgBackRest stanza after the restore completes so that backups work for the cloned database.

kubectl exec -it -n <dataspace_name> -c database \
  $(kubectl get pods -n <dataspace_name> --selector='postgres-operator.crunchydata.com/role=master' -o name) -- bash -c \
  "pgbackrest --stanza=db stanza-upgrade"

Additional Operations

Manually Restarting PostgreSQL

There are times when you might need to manually restart PostgreSQL. This can be done by adding or updating a custom annotation in the cluster's spec.metadata.annotations section. PGO will notice the change and perform a rolling restart.

kubectl patch postgrescluster/<pg_clustername> -n <dataspace_name> --type merge \
  --patch '{"spec":{"metadata":{"annotations":{"restarted":"'"$(date)"'"}}}}'

For example, to restart the "postgres1-stg" PostgreSQL cluster running in the "cdr-db-postgres1-stg" dataspace:

kubectl patch postgrescluster/postgres1-stg -n cdr-db-postgres1-stg --type merge \
  --patch '{"spec":{"metadata":{"annotations":{"restarted":"'"$(date)"'"}}}}'  

Shutdown PostgreSQL

You can shut down a Postgres cluster by setting the spec.shutdown attribute to true. You can do this by patching the postgrescluster manifest:

kubectl patch postgrescluster/<pg_clustername> -n <dataspace_name> --type merge \
  --patch '{"spec":{"shutdown": true}}'

For example, to shut down the "postgres1-stg" PostgreSQL cluster running in the "cdr-db-postgres1-stg" dataspace:

kubectl patch postgrescluster/postgres1-stg -n cdr-db-postgres1-stg --type merge \
  --patch '{"spec":{"shutdown": true}}'
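To bring the cluster back online later, set the spec.shutdown attribute back to false in the same way:

kubectl patch postgrescluster/postgres1-stg -n cdr-db-postgres1-stg --type merge \
  --patch '{"spec":{"shutdown": false}}'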

How to Add a New PostgreSQL User

By default, the Crunchy PostgreSQL datastack is provisioned with two users (appdba and appuser). The passwords for these users are auto-generated by the operator and stored as Secret objects in Kubernetes.
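The operator typically names these Secrets after the cluster and the user (for Crunchy PGO v5, usually <pg_clustername>-pguser-<username>; verify the exact name with kubectl get secrets). As a hedged example, to read the auto-generated password for appuser:

# List the secrets in the dataspace to find the exact name
kubectl get secrets -n cdr-db-postgres1-stg

# Decode the password (secret name is an assumption; adjust to what is listed)
kubectl get secret -n cdr-db-postgres1-stg postgres1-stg-pguser-appuser \
  -o jsonpath='{.data.password}' | base64 -d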

In order to add a new database user for an application, follow these steps:

Step 1: Update the values.yaml file under the postgresUsers: section

For example, to create user1 with access to the demodb database and the CREATEDB and CREATEROLE role options:

postgresUsers:
  users:
    - name: user1
      databases:
        - demodb
      options: "CREATEDB CREATEROLE"

Step 2: Run the pipeline to deploy the new database user added in the values.yaml file.

How to Expose a Crunchy PostgreSQL Datastack for Access from Outside of OpenShift/Kubernetes

Starting in version 3.5 of the CloudOne pipeline, an Ambassador edge proxy mapping is available to manage inbound connections to PostgreSQL. This is enabled by defining new Helm chart values.

The DNS subdomain in which the Ambassador-provided URL for the PostgreSQL cluster will be exposed depends on the location of the datastack, i.e. under which CloudOne namespace it runs. The following table lists the different types of spaces and the subdomains under which the Ambassador mappings will be created:

Space Name/Type   Ambassador URL Subdomain
workspace         .prd01d.cloudone.netapp.com
devint            .prd05d.cloudone.netapp.com
stg-1             .prd06d.cloudone.netapp.com
prd-1             .prd06d.cloudone.netapp.com

In order to enable Ambassador mappings, set the YAML variable ambassador.dbaas.enabled to true in the Helm chart values file (values.yaml, or an alternative if used).

ambassador:
  tls:
    secret:
  dbaas:
    enabled: true
    domain: 

NOTE: The values of ambassador.dbaas.domain and ambassador.tls.secret are left empty in the Helm chart as these are supplied by variables defined in variable groups in Azure DevOps, based on the workspace or dataspace to which the database is deployed by the CI/CD pipeline.

The generated Ambassador hostname format will be along these lines (depending on the type of space):

  • <service name>-<pod-cardinal-number>.prd01d.cloudone.netapp.com for AWS
  • <service name>-<pod-cardinal-number>.prd02d.cloudone.netapp.com for On-Prem HCI
  • <service name>-<pod-cardinal-number>.prd05d.cloudone.netapp.com for Rancher AWS
  • <service name>-<pod-cardinal-number>.prd06d.cloudone.netapp.com for Rancher On-Prem

For example, the pod-cardinal-number values for a 3-member replica set will be 0, 1, and 2.
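As a hedged illustration only, assuming the Ambassador mapping exposes the standard PostgreSQL port 5432 and requires TLS (both assumptions, not confirmed by the pipeline documentation), a client connection to pod 0 of a Rancher On-Prem datastack might look like:

psql "host=postgres1-stg-0.prd06d.cloudone.netapp.com port=5432 dbname=demodb user=appuser sslmode=require"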

Troubleshooting

If something fails to deploy, the information about why the deployment failed (or was not even initiated) will be found in the logs of the CI/CD pipeline and can be tracked down using the instructions at the link listed in the Continuous Integration and Continuous Delivery Pipelines section.

However, additional information may be required either to better troubleshoot a failed deployment or to investigate the runtime behavior of a Crunchy PostgreSQL database stack that has been successfully deployed. In those cases, much of the information can be found in the Ops Manager UI, and additional information may also be found in the OpenShift web console. Information about navigating and analyzing information from the OpenShift web console can be found here: Navigating the OpenShift Web Console