Part 1: Using DBSnapper with GitHub Actions and Amazon ECS
Overview
Note: This is Part 1 of a multi-part series on using DBSnapper with GitHub Actions and Amazon ECS. See Part 2 - A Simplified Approach with Third-Party Runners, which includes the results of the workflow execution.
Update: New DBSnapper GitHub Action
We've updated this article to use our new Install DBSnapper Agent GitHub Action that makes it simple to install the DBSnapper Agent onto a GitHub Actions runner. These changes can be found in Step 4.
The motivation behind this article is to describe an automated way to snapshot and sanitize database workloads running on private infrastructure. In this article, we'll use GitHub Actions and self-hosted runners to trigger a snapshot and sanitization of an Amazon RDS database using DBSnapper and Amazon ECS Fargate.
Services and components used
- DBSnapper Agent to take the snapshot, sanitize it, and store it.
- DBSnapper Cloud providing target configuration and snapshot storage.
- GitHub Actions (GHA) as CI/CD provider to automate the environment setup and trigger the snapshot.
- GitHub Actions self-hosted runners to run the DBSnapper Agent in our private infrastructure.
- Amazon ECS Fargate to run an ephemeral GHA self-hosted runner.
Other requirements
- We use GitHub's OIDC Provider to provide the AWS credentials needed for the actions. We followed the instructions in the OIDC section of aws-actions/configure-aws-credentials to set up the federation between GitHub and AWS, using the CloudFormation template to speed things up.
- We use GitHub's official actions-runner container for our self-hosted runner.
GitHub Secrets
We will need the following GitHub Secrets configured for the GitHub Action:
- FG_PAT - Fine-Grained GitHub Personal Access Token (PAT) with repo:actions:(read/write) and org:self-hosted-runners:(read/write) scopes.
- DBSNAPPER_AUTHTOKEN - Your DBSnapper Cloud Auth Token used to authenticate to the DBSnapper API.
- DBSNAPPER_SECRET_KEY - Your DBSnapper config secret key used to encrypt certain configuration values.
Overview of the steps
- Set up the GitHub Actions workflow basics.
- Get a registration token to register a self-hosted runner.
- Launch an Amazon ECS Fargate task to run the self-hosted runner.
- Execute the DBSnapper Agent snapshot and sanitization commands.
- Clean up the Amazon ECS Fargate tasks.
Step 1 - Setup the GitHub Actions Workflow
Here we set up the basic elements of the GitHub Actions workflow. We define the name of the workflow, the event that triggers it, and some environment variables that we will use throughout the workflow.
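As a rough illustration of these basics, the header of such a workflow might look like the sketch below. The environment variable names and the task definition path come from this article; the trigger, region, role ARN, cluster, service, and organization values are placeholders assumed for the example.

```yaml
# Sketch of the workflow header. All values are illustrative placeholders.
name: DBSnapper Build Snapshot

on:
  workflow_dispatch: # manual trigger (an assumption; use the event you need)

# id-token: write lets the workflow request an OIDC token for AWS;
# contents: read lets it check out the repository.
permissions:
  id-token: write
  contents: read

env:
  AWS_REGION: us-east-1                                            # placeholder
  AWS_IAM_ROLE: arn:aws:iam::123456789012:role/github-actions-ecs  # placeholder
  ECS_CLUSTER: dbsnapper-cluster                                   # placeholder
  ECS_SERVICE: dbsnapper-runner                                    # placeholder
  ECS_TASK_DEFINITION: .aws/ecs_task_definition_github_runner.json
  ORGANIZATION: your-organization                                  # placeholder
```

Defining these values once at the workflow level keeps the individual jobs free of hard-coded names.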
IAM Policy for the AWS_IAM_ROLE
The permissions section is needed for the GitHub OIDC provider to supply AWS credentials to the actions, and AWS_IAM_ROLE is the ARN of the role that the workflow assumes through the OIDC provider to obtain those credentials. This role should have the necessary permissions to interact with the services we are using, specifically ECS in our case:
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "RegisterTaskDefinition",
      "Effect": "Allow",
      "Action": ["ecs:RegisterTaskDefinition"],
      "Resource": "*"
    },
    {
      "Sid": "PassRolesInTaskDefinition",
      "Effect": "Allow",
      "Action": ["iam:PassRole"],
      "Resource": "*"
    },
    {
      "Sid": "DeployService",
      "Effect": "Allow",
      "Action": ["ecs:UpdateService", "ecs:DescribeServices"],
      "Resource": "*"
    }
  ]
}
Above is an example of the IAM policy that the AWS_IAM_ROLE should have to interact with ECS. You should set specific Resource ARNs for the ecs:UpdateService and ecs:DescribeServices actions to limit the access to the ECS cluster and service you are using. I've wildcarded them here for simplicity.
Step 2 - Get Registration Token to Register a Runner
Inputs
- secrets.FG_PAT - The Fine-Grained GitHub Personal Access Token (PAT) used in the Authorization header for the API call.
- env.ORGANIZATION - The GitHub Organization that the runner will be registered with, used in the API URL.
Outputs
- create_token.registration_token - The registration token that will be used to register the self-hosted runner.
Description
Step two gives us a registration token that we can pass to the self-hosted runner configuration script to register the runner with the GitHub Actions service. We use the GitHub REST API to get the registration token and output it to a file that we can access in the next step.
We use the jq utility to parse the JSON response from the API and extract the token field. The -r flag outputs the raw value of the token field; otherwise it would be quoted and cause issues later in the workflow.
We then save the token to the file referenced by $GITHUB_OUTPUT, which is the current way to set outputs in GitHub Actions (it replaces the deprecated set-output command). This value is assigned to the registration_token output of the job for use later in the workflow.
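The job described above could be sketched roughly as follows. The job name register-runner, the step id create_token, and the output name registration_token are taken from this article; the runner label, the GH_PAT variable name, and the exact curl options are assumptions for illustration. It also assumes ORGANIZATION is defined in the workflow-level env block.

```yaml
jobs:
  register-runner:
    runs-on: ubuntu-latest # assumed GitHub-hosted runner with curl and jq available
    outputs:
      # Expose the step output as a job output for later jobs.
      registration_token: ${{ steps.create_token.outputs.registration_token }}
    steps:
      - name: Request a runner registration token
        id: create_token
        env:
          GH_PAT: ${{ secrets.FG_PAT }} # hypothetical variable name
        run: |
          # Call the organization registration-token endpoint and keep only
          # the raw (unquoted) token field from the JSON response.
          token=$(curl -sS --fail -X POST \
            -H "Accept: application/vnd.github+json" \
            -H "Authorization: Bearer ${GH_PAT}" \
            -H "X-GitHub-Api-Version: 2022-11-28" \
            "https://api.github.com/orgs/${ORGANIZATION}/actions/runners/registration-token" \
            | jq -r '.token')
          # Write the value to the step outputs file.
          echo "registration_token=${token}" >> "$GITHUB_OUTPUT"
```

A later job that lists register-runner under needs can then read the value as needs.register-runner.outputs.registration_token.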
Step 3 - Launch Amazon ECS Task
Inputs
- env.AWS_REGION - The AWS region where the ECS cluster is located.
- env.AWS_IAM_ROLE - The ARN of the IAM role that the OIDC provider will assume to provide the AWS credentials.
- env.ECS_CLUSTER - The name of the ECS cluster where the task will be run.
- env.ECS_SERVICE - The name of the ECS service that will run the task.
- env.ECS_TASK_DEFINITION - The path and filename of the task definition file in the repository.
- .aws/ecs_task_definition_github_runner.json - The initial task definition file in the repository for the ECS Task.
Actions
- actions/checkout@v4 - This action checks out the repository so that we can access the ECS task definition file.
- aws-actions/configure-aws-credentials@v4 - This action configures the AWS credentials for the actions to use. It uses the OIDC connector to authenticate with AWS.
- restackio/update-json-file-action@main - This action updates the ECS task definition JSON file with the container command that provides the --url of the GitHub organization and the --token registration token to register the self-hosted runner. The main branch includes an updated version of Node used in the action. We use this instead of amazon-ecs-render-task-definition because the latter improperly parses the provided command string, causing issues with the runner registration, as reported in an issue comment on that action.
Description
Checkout and Configure AWS Credentials
In this step, we define the runner-ecs job (which depends on successful execution of the register-runner job) that will launch an Amazon ECS Fargate task to run the self-hosted runner. The first two steps are straightforward. We use the actions/checkout@v4 action to check out the repository so that we can access the ECS task definition file. We then use the aws-actions/configure-aws-credentials@v4 action to configure the AWS credentials for the actions to use. This action uses the OIDC connector to authenticate with AWS.
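A minimal sketch of how this job might begin is shown below. The job names and action versions come from this article; the runner label and step names are assumptions, and AWS_IAM_ROLE and AWS_REGION are assumed to be defined in the workflow-level env block.

```yaml
jobs:
  runner-ecs:
    runs-on: ubuntu-latest  # assumed GitHub-hosted runner
    needs: register-runner  # wait for the registration token
    steps:
      - name: Check out the repository
        uses: actions/checkout@v4

      - name: Configure AWS credentials via OIDC
        uses: aws-actions/configure-aws-credentials@v4
        with:
          role-to-assume: ${{ env.AWS_IAM_ROLE }}
          aws-region: ${{ env.AWS_REGION }}
```

The OIDC exchange only works when the workflow grants the id-token: write permission.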
Update ECS Task Definition JSON
In the next step, we update the ECS task definition file in our repository to reflect the command we'll run to configure the self-hosted runner. In this command we use the config.sh script to configure the runner with the --url and --token arguments, which are needed to register the runner in the GitHub Actions service. We can find our GHA runners in the organization actions settings URL: https://github.com/organizations/<your_organization>/settings/actions/runners
GHA Runner Docs
It wasn't immediately obvious how to run the GHA runner container, so it took a bit of local trial and error to get the commands right. Documentation can be found in the runner's repository, and the Automate Configuring Self-Hosted Runners document is especially useful. Two scripts in the runner container are used to configure and run the runner:
- config.sh - This script configures the runner with the provided --url and --token arguments. The --ephemeral flag runs the runner in ephemeral mode (explained below), and the --unattended flag runs it in unattended mode, meaning it won't prompt for user input.
- run.sh - This script starts the runner, which connects to GitHub Actions and waits for jobs to run. It is run after the runner has been configured with the config.sh script.
Both scripts display help and the available command-line options when passed --help as an argument. The config.sh script is the only one that requires arguments; the run.sh script runs the runner with the configuration written by the config.sh script.
The --ephemeral flag is used to run the runner in ephemeral mode, meaning the runner will be removed from the organization after it has completed a single job. This ensures we don't have any lingering runners in the organization.
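To make the container command concrete, here is one possible way to write it into the task definition. Note that this sketch swaps in plain jq in place of the restackio/update-json-file-action used in this article, so it is a variant rather than the article's own step. It assumes the runner is the first entry in containerDefinitions, that bash is available in the image, that config.sh and run.sh sit in the container's working directory, and that ORGANIZATION and ECS_TASK_DEFINITION are workflow-level environment variables. REG_TOKEN and the temporary file name are hypothetical.

```yaml
# A step inside the runner-ecs job (jq-based variant, not the article's action).
- name: Set the runner command in the task definition
  env:
    REG_TOKEN: ${{ needs.register-runner.outputs.registration_token }}
  run: |
    # Configure the runner, then start it, in a single shell command.
    runner_cmd="./config.sh --url https://github.com/${ORGANIZATION} --token ${REG_TOKEN} --ephemeral --unattended && ./run.sh"
    # Replace the command of the first container definition.
    jq --arg cmd "$runner_cmd" \
      '.containerDefinitions[0].command = ["/bin/bash", "-c", $cmd]' \
      "$ECS_TASK_DEFINITION" > task-definition.tmp.json
    mv task-definition.tmp.json "$ECS_TASK_DEFINITION"
```

Passing the command as a single string to bash -c sidesteps the question of how the command is split into separate arguments.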
Deploy Amazon ECS Task Definition
The next step uses the aws-actions/amazon-ecs-deploy-task-definition@v1 action to deploy the updated task definition to the ECS cluster. This action takes the path to the task definition file, the cluster, and the service name as inputs. We set the wait-for-service-stability input to false so the workflow doesn't wait for the service to stabilize before continuing, since Step 4 - Execute the DBSnapper Agent Commands won't proceed until the runner is registered, running, and able to execute the job.
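A sketch of this deploy step, assuming the ECS_TASK_DEFINITION, ECS_CLUSTER, and ECS_SERVICE values are defined in the workflow-level env block (the step name is illustrative):

```yaml
# A step inside the runner-ecs job.
- name: Deploy the task definition
  uses: aws-actions/amazon-ecs-deploy-task-definition@v1
  with:
    task-definition: ${{ env.ECS_TASK_DEFINITION }}
    cluster: ${{ env.ECS_CLUSTER }}
    service: ${{ env.ECS_SERVICE }}
    # Don't block here; the self-hosted job waits for the runner anyway.
    wait-for-service-stability: false
```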
ECS Service: Set Desired Tasks to 1
Finally, we use the aws ecs update-service command (from the AWS CLI) to set the desired task count for the service to 1. This starts a task in the service to run the self-hosted runner. The updated task definition will not start without this command. We use a similar command in Step 5 - Cleanup the Amazon ECS Task to set the desired task count to 0 to stop the runner after the job is complete.
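As a sketch, the scale-up step might look like this, assuming the AWS CLI is available on the runner, credentials were configured earlier in the job, and ECS_CLUSTER and ECS_SERVICE are workflow-level environment variables (the step name is illustrative):

```yaml
# A step inside the runner-ecs job.
- name: Start one runner task
  run: |
    aws ecs update-service \
      --cluster "$ECS_CLUSTER" \
      --service "$ECS_SERVICE" \
      --desired-count 1
```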
Task Definition File
To launch an ECS task via GitHub Actions and the aws-actions/amazon-ecs-deploy-task-definition@v1 action, we need to provide a task definition file. This is a JSON file that defines the properties of the task we want to run. We can create it using the ECS task definition wizard in the AWS console and then copy the task definition JSON to a file in our repository, or we can start from a sample and modify it for our environment. In our workflow example, we use the ECS_TASK_DEFINITION environment variable to specify the path to the task definition file.
In this task definition, we set the default GHA runner image. If needed, we can update this image in the workflow using the restackio/update-json-file-action@main action. We also set the cpu and memory values for this task to the minimum values needed to run the runner and any additional services.
Step 4 - Execute the DBSnapper Agent Commands
Inputs
- secrets.DBSNAPPER_SECRET_KEY - Your DBSnapper config secret key used to encrypt certain configuration values.
- secrets.DBSNAPPER_AUTHTOKEN - Your DBSnapper Cloud Auth Token used to authenticate to the DBSnapper API.
- DBSnapper Debian Package - The latest release of the DBSnapper Agent. Download the .deb package that matches your architecture. We use the dbsnapper_linux_x86_64.deb package in this example.
Description
The steps prior to this one were necessary to set up a self-hosted runner in our private infrastructure with access to the workloads in our network. In this step, we're finally able to run the DBSnapper Agent commands to snapshot and sanitize the database.
Runs-on: Self-Hosted
We set the runs-on property to self-hosted to run the job on the self-hosted runner we registered in the previous step. This property can take additional labels to be more specific about which runner should run the job, in cases where you have many different types of runners registered in your organization. We also specify that this job depends on the runner-ecs job to ensure the ECS jobs run before the dbsnapper job.
Environment Variables
Next, we set our environment variables. In this example, we set DBSNAPPER_SECRET_KEY and DBSNAPPER_AUTHTOKEN, the minimum environment variables required to run DBSnapper without a configuration file.
DBSnapper Configuration via Environment Variables
DBSnapper can be configured exclusively through environment variables if you don't want to rely on a configuration file. All the configuration options can be represented as environment variables through a naming convention: prefix the option name with DBSNAPPER_, write it in upper case, and replace each period with two underscores (__). Some examples from the DBSnapper Configuration Documentation include:
- docker.images.postgres -> DBSNAPPER_DOCKER__IMAGES__POSTGRES: postgres:latest # Sets the docker image to use for the postgres containers
- defaults.shared_target_dst_db_url -> DBSNAPPER_DEFAULTS__SHARED_TARGET_DST_DB_URL: <connstring> # Sets the default destination database URL for shared targets
- override.san_query -> DBSNAPPER_OVERRIDE__SAN_QUERY: <base-64-encoded-value> # Sets a query to use for sanitization overriding any existing queries.
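In a workflow, this convention could be applied in a job-level env block, sketched below. The two secrets are the ones this article requires; SHARED_DST_DB_URL is a hypothetical secret name introduced only for illustration.

```yaml
# Job-level environment for the DBSnapper job (sketch).
env:
  DBSNAPPER_SECRET_KEY: ${{ secrets.DBSNAPPER_SECRET_KEY }}
  DBSNAPPER_AUTHTOKEN: ${{ secrets.DBSNAPPER_AUTHTOKEN }}
  # docker.images.postgres
  DBSNAPPER_DOCKER__IMAGES__POSTGRES: postgres:latest
  # defaults.shared_target_dst_db_url (SHARED_DST_DB_URL is a hypothetical secret)
  DBSNAPPER_DEFAULTS__SHARED_TARGET_DST_DB_URL: ${{ secrets.SHARED_DST_DB_URL }}
```

Keeping connection strings in secrets rather than in the workflow file avoids committing credentials to the repository.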
Install and Run DBSnapper
We use the new DBSnapper GitHub Action to install the latest version of the DBSnapper Agent. This action takes into account the operating system and architecture of the runner and installs the appropriate version of the DBSnapper Agent.
The manual alternative is to run commands that update the apt package index, install curl, use curl to download the latest release of the DBSnapper Agent, and then install the .deb package with dpkg.
Why Not use the DBSnapper Docker Image?
At this point, it would have been convenient to use the DBSnapper Docker image to run the DBSnapper commands. Unfortunately, the GitHub Actions runner does not have Docker installed by default, so we would need to install Docker in the runner before we could use the Docker image. To avoid the additional complexity, we decided to download and install the .deb package instead. In addition, GitHub Actions doesn't support using Docker from a self-hosted runner at this time. See the following issues for more information:
- https://github.com/actions/runner/issues/406
- https://github.com/actions/runner/issues/367
Database Utilities Needed
When using the DBSnapper container image, the Agent and database utilities are already included in the image. Since we can't use the container image here, we need to install the tools by hand. In this case, we install the PostgreSQL client to support the snapshot of our PostgreSQL RDS database. If you are using a different database, you will need to install the appropriate client.
We then run the dbsnapper build dvdrental-prod command, which uses the DBSNAPPER_AUTHTOKEN to authenticate to the DBSnapper Cloud, creates a snapshot of the database specified in the dvdrental-prod target, and stores it in the cloud storage specified in the target configuration. Once this is complete and no additional steps are needed, the task will be cleaned up in the next step.
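Putting the pieces of this step together, the job might be sketched as follows. The action reference dbsnapper/install-dbsnapper-agent-action@v1 is an assumption about how the Install DBSnapper Agent action is published, so confirm the exact name and version before use. The sketch also assumes a Debian-based runner image with passwordless sudo, and that the client package is named postgresql-client.

```yaml
jobs:
  dbsnapper:
    runs-on: self-hosted # runs on the ECS-hosted runner
    needs: runner-ecs    # wait until the runner task has been launched
    env:
      DBSNAPPER_SECRET_KEY: ${{ secrets.DBSNAPPER_SECRET_KEY }}
      DBSNAPPER_AUTHTOKEN: ${{ secrets.DBSNAPPER_AUTHTOKEN }}
    steps:
      - name: Install DBSnapper Agent
        # Assumed action reference; verify before use.
        uses: dbsnapper/install-dbsnapper-agent-action@v1

      - name: Install PostgreSQL client
        run: |
          sudo apt-get update
          sudo apt-get install -y postgresql-client

      - name: Build the snapshot
        run: dbsnapper build dvdrental-prod
```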
Step 5 - Cleanup the Amazon ECS Task
Inputs
- env.AWS_REGION - The AWS region where the ECS cluster is located.
- env.AWS_IAM_ROLE - The ARN of the IAM role that the OIDC provider will assume to provide the AWS credentials.
- env.ECS_CLUSTER - The name of the ECS cluster where the task will be run.
- env.ECS_SERVICE - The name of the ECS service that will run the task.
Actions
- aws-actions/configure-aws-credentials@v4 - This action configures the AWS credentials for the actions to use. It uses the OIDC connector to authenticate with AWS.
Description
In this final step, we use the aws-actions/configure-aws-credentials@v4 action to configure the AWS credentials so we can use the AWS CLI to run the aws ecs update-service command. We use this command to set the desired task count for the service to 0, which stops all ECS runner tasks and ensures we don't get billed for unnecessary resources. Because we configured the runners with the --ephemeral flag, they are automatically removed from the GitHub organization once they have completed their job.
Note: By using the if: ${{ always() }} condition, we ensure this step runs even if the previous steps fail. This is important to ensure that the ECS service is cleaned up even if other steps fail; by default, a job is skipped when a job it depends on fails.
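A sketch of such a cleanup job is shown below. The job name cleanup-ecs, the runner label, and the step names are assumptions; the needs list assumes the earlier jobs are named runner-ecs and dbsnapper, and the env values are assumed to be defined at the workflow level.

```yaml
jobs:
  cleanup-ecs:
    runs-on: ubuntu-latest         # assumed GitHub-hosted runner
    if: ${{ always() }}            # run even when earlier jobs fail
    needs: [runner-ecs, dbsnapper] # but only after they have finished
    steps:
      - name: Configure AWS credentials via OIDC
        uses: aws-actions/configure-aws-credentials@v4
        with:
          role-to-assume: ${{ env.AWS_IAM_ROLE }}
          aws-region: ${{ env.AWS_REGION }}

      - name: Stop the runner tasks
        run: |
          aws ecs update-service \
            --cluster "$ECS_CLUSTER" \
            --service "$ECS_SERVICE" \
            --desired-count 0
```

Listing both earlier jobs under needs keeps the cleanup from racing the snapshot job, while always() keeps a failure in either from skipping it.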
Conclusion
In this article, we've shown how to use GitHub Actions to trigger a snapshot and sanitization of an Amazon RDS database using DBSnapper and Amazon ECS Fargate. We've covered the setup of the GitHub Actions workflow, the registration of a self-hosted runner, the launch of an Amazon ECS Fargate task to run the runner, the execution of the DBSnapper Agent commands, and the cleanup of the Amazon ECS task. This workflow can be used to automate the snapshot and sanitization of database workloads running on private infrastructure, providing an automated and reliable way to manage database backups.
Full Workflow
Here is the full GitHub Actions workflow that we've described in this article. Be sure to replace any placeholders with your own values.
DBSnapper Build Snapshot - Full Workflow