Development Snapshot Download Service

A download service for developers to obtain the latest application data snapshot.
Jun 6, 2020 · 687 words · 4 minute read

This article discusses a simple Golang microservice I threw together to make it easier for developers to obtain a copy of the most recent database snapshot from a backup storage service.

Suppose the database or state data for your production application is backed up in full on a regular basis, then encrypted and stored on an object storage service such as S3. Suppose the backup tool you are using keeps track of successful backups in some kind of lookup service, such as an Elasticsearch index, a database, an etcd cluster or whatever. Suppose one of your developers needs a recent database snapshot to seed their local development environment so they can develop a feature or reproduce a bug.

The problem I am trying to solve is twofold: make it easier for developers to get the latest data snapshot for local development, and minimise the ongoing sysadmin overhead of ensuring each developer has access.

| Developer | Sysadmin |
|---|---|
| Needs to download and configure a tool to access the backups bucket on S3. | Has to provide each developer with credentials to access the backups bucket on S3, and assist them with any issues they may have authenticating or downloading snapshots with their specific tools. May also need to consider access logging/auditing requirements. |
| Needs access to the backup encryption passphrase stored in Vault. | Needs to ensure Vault is configured to allow the developer access to the backups secret, and support any issues the developer has accessing it. |
| Needs to manually identify the latest backup available on S3 each time they need fresh data. | N/A |
| Needs to download the backup, decrypt it and feed it into their local database each time they need fresh data. | N/A |

The long-winded way…

The solution I’ve developed is a simple download service that runs server side and presents an HTTP endpoint. Developers use that endpoint instead of doing all that downloading and decrypting manually each time, or each scripting those steps locally for their own environment.

The GET request handler for the /download URI performs the following steps:

  • Uses its Elasticsearch configuration variables to connect to the backups-* logging index and identify the storage URI for the most recent successful backup.
  • Uses its ‘approle’ credentials to connect to Vault to obtain the backups passphrase secret that was used to encrypt the backup.
  • Uses its own credentials to connect to S3 storage and retrieve a stream handle from which it can read the encrypted data directly.
  • Fires up a GnuPG process and feeds the passphrase into an extra file descriptor.
  • Runs a goroutine to feed the encrypted data from the S3 bucket to the standard input of the GnuPG process.
  • Runs another goroutine to feed the decrypted output of the GnuPG process to the writer of the HTTP response.

The sysadmin now just has to deploy the service behind an endpoint that has some security and authentication configured, and add it to the monitoring system.

Developers now simply download the latest snapshot from the endpoint provided by the sysadmin, using their usual (LDAP, SSO or whatever) credentials. They can do this using their browser, or they can do something like the following from the command line or in their local provisioning scripts:

curl -u fred:fr3dsPa55w0rd https://snapshot.your.application.com/download | mysql app
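
If your provisioning tooling happens to be written in Go rather than shell, a rough equivalent of that one-liner might look like the sketch below. It is not part of the service: the helper names and the SNAPSHOT_USER and SNAPSHOT_PASS variables are made up for illustration, and it assumes the endpoint accepts HTTP basic auth as in the curl example. The URL and the mysql app target are taken from that example.

```go
package main

import (
	"fmt"
	"io"
	"net/http"
	"os"
	"os/exec"
)

// newSnapshotRequest builds the authenticated GET request that the curl
// one-liner sends: HTTP basic auth against the download endpoint.
func newSnapshotRequest(url, user, pass string) (*http.Request, error) {
	req, err := http.NewRequest(http.MethodGet, url, nil)
	if err != nil {
		return nil, err
	}
	req.SetBasicAuth(user, pass)
	return req, nil
}

// fetchSnapshot streams the response body into dst without holding the
// whole snapshot in memory. Non-200 responses are never written to dst.
func fetchSnapshot(client *http.Client, req *http.Request, dst io.Writer) error {
	resp, err := client.Do(req)
	if err != nil {
		return err
	}
	defer resp.Body.Close()
	if resp.StatusCode != http.StatusOK {
		return fmt.Errorf("download failed: %s", resp.Status)
	}
	_, err = io.Copy(dst, resp.Body)
	return err
}

// run downloads the snapshot and feeds it to "mysql app" on standard input.
func run(user, pass string) error {
	req, err := newSnapshotRequest("https://snapshot.your.application.com/download", user, pass)
	if err != nil {
		return err
	}
	mysql := exec.Command("mysql", "app")
	mysql.Stdout, mysql.Stderr = os.Stdout, os.Stderr
	stdin, err := mysql.StdinPipe()
	if err != nil {
		return err
	}
	if err := mysql.Start(); err != nil {
		return err
	}
	fetchErr := fetchSnapshot(http.DefaultClient, req, stdin)
	stdin.Close() // end of input for mysql
	if err := mysql.Wait(); err != nil {
		return err
	}
	return fetchErr
}

func main() {
	// SNAPSHOT_USER and SNAPSHOT_PASS are hypothetical variable names.
	user, pass := os.Getenv("SNAPSHOT_USER"), os.Getenv("SNAPSHOT_PASS")
	if user == "" || pass == "" {
		fmt.Fprintln(os.Stderr, "set SNAPSHOT_USER and SNAPSHOT_PASS first")
		return
	}
	if err := run(user, pass); err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
}
```

One small difference from the plain curl pipeline: an authentication failure or other error response is reported instead of being piped into mysql as if it were SQL.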

For now, it simply downloads the latest snapshot and is all very opinionated, but then I only just got it basically working earlier today. Some things I’d like to add include:

  • #1 - A README.md file to document its use and configuration a bit more formally. I simply ran out of time today.
  • #2 - The ability for developers to list the older backups from the ‘retained’ set, and select a specific one to download.
  • #3 - The ability to download specific backups from the set of latest backups. For some larger applications we have the full backup, plus ‘reduced’ backups containing smaller subsets of the full data that have had sensitive details removed or obfuscated to make them more appropriate for development and testing purposes.

However, I’ve scratched my itch for now, so that’s enough on this for this weekend.