Bastions on Demand

6 May 2020

Any time you have a VPC, you’ll likely need some way to gain access to the resources within the VPC from your local box. Typically, the way to do that is to run a bastion (or jumpbox) which you and your team can SSH into. The downside is that you are exposing an entry point into your network that is accessible by multiple people and running 24x7. And depending on how you manage permissions, you may not be able to restrict access to the box via IAM. Obviously, this is not ideal.

Luckily, we have Fargate.

With Fargate, we no longer need to maintain permanent bastion instances—we can create bastions when needed and tear them down when no longer in use. We can lock down bastion instances to an individual user both in terms of SSH keys and IP address. And we can restrict access via IAM to both the API used to manage bastions and to which SSH keys are used to log into an instance.

All in all, we save on infrastructure spend while reducing our attack surface.

Throughout this guide, I’ll be referencing the code from my bastions-on-demand repo. If you want to skip the explanation, just clone it and follow the directions in the README.md to get started.

If you run into any trouble, create an issue, and I’ll respond as best I can.

Otherwise, buckle up. This is going to be a bit in-depth.

Architecture

There are two key components to bastions on demand—the infrastructure for managing the bastion image and the bastion service itself.

The container infrastructure consists of an ECR repository, a Docker container, an IAM role for fetching a user’s public keys, and scripts for building and pushing images. Typically, I share this infrastructure across multiple services because the requirements don’t vary much. But if a team wants complete service isolation or needs to customize their bastion image, it’s trivial to make that work.

The bastion service consists of an ECS task, a task role that enables access to any required resources, an API to create and destroy bastion instances, and a set of scripts to make it easy for team members to do just that. The bastion service module should be included in any service that needs bastions—keep it in the service’s repository for ease of access and deploy it alongside the parent service.

Naturally, all of this infrastructure is managed with Terraform.

I owe a debt of gratitude to the following authors as they provided valuable examples that helped me develop this approach:

While I recommend using multiple AWS accounts for security and isolation, in this guide, I’m going to use a single account so that we can focus on the essentials. If there’s interest, I will address how to modify this approach for use with multiple accounts in a separate guide. Everything in this guide assumes you’re using the default profile, but you can override via the AWS_PROFILE environment variable.

I’m also deliberately not using Terraform remote state in this guide to make it easier for you to try out my code. If you’re going to use this in a live account, you absolutely should use remote state—insert your own backend configuration where appropriate. And if you don’t have a remote backend, have a look at my remote-state repo for an example of how to set one up on S3 with DynamoDB and KMS.

Now, let’s get started.

Initial Setup

If you haven’t already, create a role for API Gateway logging:

data "aws_iam_policy_document" "assume_role" {
  statement {
    actions = ["sts:AssumeRole"]

    principals {
      identifiers = ["apigateway.amazonaws.com"]
      type        = "Service"
    }
  }
}

data "aws_iam_policy_document" "logger" {
  statement {
    actions = [
      "logs:CreateLogGroup",
      "logs:CreateLogStream",
      "logs:DescribeLogGroups",
      "logs:DescribeLogStreams",
      "logs:FilterLogEvents",
      "logs:GetLogEvents",
      "logs:PutLogEvents",
    ]

    resources = ["*"]
  }
}

resource "aws_iam_role" "logger" {
  name               = "api-gateway-cloudwatch-logger"
  assume_role_policy = data.aws_iam_policy_document.assume_role.json
}

resource "aws_iam_role_policy" "logger" {
  name   = "api-gateway-cloudwatch-logger"
  policy = data.aws_iam_policy_document.logger.json
  role   = aws_iam_role.logger.name
}

resource "aws_api_gateway_account" "global" {
  cloudwatch_role_arn = aws_iam_role.logger.arn
}

Logging API Gateway access is a good idea in general. Unfortunately, the CloudWatch role is an account-level setting that applies to every API in the region rather than to a single API, so use with caution. API Gateway has a lot of stateful corners. I typically manage this logger in a separate repository since it’s shared across all services running in an account.

Container Infrastructure

Creating the ECR repository is straightforward:

resource "aws_ecr_repository" "bastion" {
  name = "bastion"
}

Next we need to create a role that can fetch a user’s public keys.

data "aws_caller_identity" "env" {}

# …

data "aws_iam_policy_document" "assume_role" {
  statement {
    actions = ["sts:AssumeRole"]

    principals {
      identifiers = ["arn:aws:iam::${data.aws_caller_identity.env.account_id}:root"]
      type        = "AWS"
    }
  }
}

data "aws_iam_policy_document" "public_key_fetcher" {
  statement {
    actions = [
      "iam:GetSSHPublicKey",
      "iam:ListSSHPublicKeys",
    ]

    resources = ["arn:aws:iam::${data.aws_caller_identity.env.account_id}:user/*"]
  }
}

resource "aws_iam_role" "public_key_fetcher" {
  name               = "public-key-fetcher"
  assume_role_policy = data.aws_iam_policy_document.assume_role.json
}

resource "aws_iam_role_policy" "public_key_fetcher" {
  name   = "public-key-fetcher"
  policy = data.aws_iam_policy_document.public_key_fetcher.json
  role   = aws_iam_role.public_key_fetcher.id
}

This role will be used by the bastion instance to fetch a user’s keys when the user attempts to SSH into the instance, as we’ll see below.

Dockerfile

Creating the container image is fairly straightforward. We start with Alpine Linux because it’s small and security-oriented and add the scripts we’ll need to start sshd and handle login.

FROM alpine:3.11

WORKDIR /root

ADD fetch_authorized_keys.sh /usr/local/bin/fetch_authorized_keys.sh
ADD entrypoint.sh /usr/local/bin/entrypoint.sh

Next, we install dependencies including the AWS CLI. If you typically need any other packages for your bastion, add ‘em to the list. (It still bugs me that AWS doesn’t provide checksums for its CLI bundle. Oh well.)

RUN echo "Installing dependencies..." && \
  apk --no-cache \
    add \
      bash \
      curl \
      openssh \
      python \
      tini \
  && \
  echo "Installing AWS CLI..." && \
  wget https://s3.amazonaws.com/aws-cli/awscli-bundle.zip && \
  unzip awscli-bundle.zip && \
  rm awscli-bundle.zip && \
  ./awscli-bundle/install -i /usr/local/aws -b /usr/local/bin/aws && \
  rm -R awscli-bundle && \
  /usr/local/bin/aws --version

Now we need a user to log in as.

You could use root. I have in the past with small teams that needed the additional flexibility. But if you’re providing an image to teams that are independent of whoever handles security, you don’t necessarily want people making changes to the bastion that could open up holes in your network.

Instead, we create a user named ops and unlock the account for login.

RUN echo "Creating user \"ops\"..." && \
  adduser ops --disabled-password

RUN echo "Unlocking \"ops\"..." && \
  sed -i "s/ops:!:/ops:*:/g" /etc/shadow

With that out of the way, we now configure sshd.

RUN echo "Configuring sshd..." && \
  sed -i "s:#AuthorizedKeysCommand none:AuthorizedKeysCommand /usr/local/bin/fetch_authorized_keys.sh:g" /etc/ssh/sshd_config && \
  sed -i "s:#AuthorizedKeysCommandUser nobody:AuthorizedKeysCommandUser nobody:g" /etc/ssh/sshd_config && \
  sed -i "s:#GatewayPorts no:GatewayPorts yes:g" /etc/ssh/sshd_config && \
  sed -i "s:#PasswordAuthentication yes:PasswordAuthentication no:g" /etc/ssh/sshd_config && \
  sed -i "s:#PermitTunnel no:PermitTunnel yes:g" /etc/ssh/sshd_config && \
  sed -i "s:AllowTcpForwarding no:AllowTcpForwarding yes:g" /etc/ssh/sshd_config && \
  sed -i "s:AuthorizedKeysFile .ssh/authorized_keys:AuthorizedKeysFile none:g" /etc/ssh/sshd_config

We’re configuring sshd to do a few different things.

We’re setting the AuthorizedKeysCommand to use fetch_authorized_keys.sh, and we’re disabling both password logins and the ability to use an AuthorizedKeysFile on the instance. The intention is to make the logic in fetch_authorized_keys.sh the only way to authenticate a user.

We’re also setting sshd up for proxying by enabling TCP forwarding, gateway ports, and tunneling.
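Because sed leaves the file untouched when a pattern doesn’t match, it can be worth checking that the substitutions actually took effect. Here is a minimal Ruby sketch of such a check. It is not code from the repo: the helper names (active_directives, missing_directives) are hypothetical, and the expected values are simply copied from the sed commands above.

```ruby
# Settings the sed commands above are meant to leave active in sshd_config.
EXPECTED_DIRECTIVES = {
  'AuthorizedKeysCommand'     => '/usr/local/bin/fetch_authorized_keys.sh',
  'AuthorizedKeysCommandUser' => 'nobody',
  'AuthorizedKeysFile'        => 'none',
  'PasswordAuthentication'    => 'no',
  'AllowTcpForwarding'        => 'yes',
  'GatewayPorts'              => 'yes',
  'PermitTunnel'              => 'yes'
}.freeze

# Collect active (uncommented) "Keyword value" lines. Keywords are matched
# case-insensitively, and the first occurrence of a keyword wins.
def active_directives(config_text)
  config_text.each_line.with_object({}) do |line, found|
    line = line.strip
    next if line.empty? || line.start_with?('#')

    keyword, value = line.split(/\s+/, 2)
    found[keyword.downcase] ||= value.to_s.strip
  end
end

# Return the keywords whose active value differs from the expected one.
def missing_directives(config_text, expected = EXPECTED_DIRECTIVES)
  active = active_directives(config_text)
  expected.reject { |keyword, value| active[keyword.downcase] == value }.keys
end
```

One option is to run missing_directives against the contents of /etc/ssh/sshd_config during the image build and fail if the result is not empty.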

Finally, we configure the ENTRYPOINT:

ENTRYPOINT [ "/sbin/tini", "--" ]
CMD [ "/bin/sh", "/usr/local/bin/entrypoint.sh" ]

The only interesting thing here is we’re using tini. If you’re interested in why, check out this GitHub issue.

entrypoint.sh

On container startup, the first thing we need to do is generate host keys.

ssh-keygen -A

Then we export the global environment which includes the AWS credentials injected into the container, the role for fetching SSH keys, and the user name of the person for whom this bastion instance is intended.

echo "export AWS_CONTAINER_CREDENTIALS_RELATIVE_URI=$AWS_CONTAINER_CREDENTIALS_RELATIVE_URI" > /etc/profile.d/authorized_keys_configuration.sh
echo "export AWS_DEFAULT_REGION=$AWS_DEFAULT_REGION" >> /etc/profile.d/authorized_keys_configuration.sh
echo "export AWS_EXECUTION_ENV=$AWS_EXECUTION_ENV" >> /etc/profile.d/authorized_keys_configuration.sh
echo "export AWS_REGION=$AWS_REGION" >> /etc/profile.d/authorized_keys_configuration.sh
echo "export ECS_CONTAINER_METADATA_URI=$ECS_CONTAINER_METADATA_URI" >> /etc/profile.d/authorized_keys_configuration.sh
echo "export ASSUME_ROLE_FOR_AUTHORIZED_KEYS=$ASSUME_ROLE_FOR_AUTHORIZED_KEYS" >> /etc/profile.d/authorized_keys_configuration.sh
echo "export USER_NAME=$USER_NAME" >> /etc/profile.d/authorized_keys_configuration.sh

Because fetch_authorized_keys.sh is running as nobody and there isn’t a way to inject the necessary variables into the environment at runtime, we export them to /etc/profile.d/authorized_keys_configuration.sh so that they’re available at a known location when someone tries to log into the instance.

Finally, we run sshd.

exec /usr/sbin/sshd -D -e "$@"

fetch_authorized_keys.sh

This is where the magic happens. The AuthorizedKeysCommand setting for sshd enables you to do pretty much anything as far as fetching keys. In our case, we’re going to be fetching them from IAM.

First off, we source the container’s environment (created via entrypoint.sh) and assume the public key fetcher role we created above.

source /etc/profile.d/authorized_keys_configuration.sh

sts_credentials=$(/usr/local/bin/aws sts assume-role \
  --role-arn "${ASSUME_ROLE_FOR_AUTHORIZED_KEYS}" \
  --role-session-name fetch-authorized-keys-for-bastion \
  --query '[Credentials.SessionToken,Credentials.AccessKeyId,Credentials.SecretAccessKey]' \
  --output text)

Then we inject the assumed role’s credentials into the environment and print all of the bastion user’s active public keys to STDOUT.

AWS_ACCESS_KEY_ID=$(echo "${sts_credentials}" | awk '{print $2}')
AWS_SECRET_ACCESS_KEY=$(echo "${sts_credentials}" | awk '{print $3}')
AWS_SESSION_TOKEN=$(echo "${sts_credentials}" | awk '{print $1}')
AWS_SECURITY_TOKEN=$(echo "${sts_credentials}" | awk '{print $1}')
export AWS_ACCESS_KEY_ID AWS_SECRET_ACCESS_KEY AWS_SESSION_TOKEN AWS_SECURITY_TOKEN

/usr/local/bin/aws iam list-ssh-public-keys --user-name "$USER_NAME" --query "SSHPublicKeys[?Status == 'Active'].[SSHPublicKeyId]" --output text | while read -r key_id; do
  /usr/local/bin/aws iam get-ssh-public-key --user-name "$USER_NAME" --ssh-public-key-id "$key_id" --encoding SSH --query "SSHPublicKey.SSHPublicKeyBody" --output text
done

sshd will use this output to authorize a user attempting to log into the bastion. All credentials are managed in IAM, and no credentials are persisted to the bastion instance itself. If you remove a user or their keys from IAM, there is no way for them to log in (even if they left a bastion instance running).

You could also just as easily use the SSM Parameter Store or Vault depending on your requirements.
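The awk field numbers in fetch_authorized_keys.sh only work because they line up with the order of the --query list (SessionToken first, then AccessKeyId, then SecretAccessKey). As an illustration of that mapping, here is a small Ruby sketch that turns the same single line of text output into the environment variables the script exports. The function name credentials_env is hypothetical, and the sample values in the usage line are placeholders.

```ruby
# Map one line of `aws sts assume-role ... --output text` to environment
# variables. Field order follows the --query list used in the script:
# SessionToken, AccessKeyId, SecretAccessKey.
def credentials_env(sts_text)
  session_token, access_key_id, secret_access_key = sts_text.split
  fields = [session_token, access_key_id, secret_access_key]
  if fields.any? { |value| value.nil? || value.empty? }
    raise ArgumentError, 'expected three whitespace-separated fields'
  end

  {
    'AWS_ACCESS_KEY_ID'     => access_key_id,
    'AWS_SECRET_ACCESS_KEY' => secret_access_key,
    'AWS_SESSION_TOKEN'     => session_token,
    'AWS_SECURITY_TOKEN'    => session_token # same token under the older variable name
  }
end

# Placeholder values, not real credentials.
env = credentials_env("EXAMPLETOKEN\tEXAMPLEKEYID\tEXAMPLESECRET\n")
```

If you reorder the --query list, the field positions have to change with it.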

Supporting Scripts

There’s nothing too interesting in the supporting scripts in bastion/bin. They just handle some of the details of building, tagging, and pushing the container images to ECR. See the README for details.

Bastion Service

The bastion service provides an API for users to create and destroy bastion instances. Underneath, we’re using ECS which is fairly bare bones as far as orchestration goes but good enough for our purposes here.

When working with ECS in general and Fargate in particular, we need to define two different IAM roles for the instance. First, we need to define the execution role which is used to fetch the container image and spin things up. Second, we need to define the task role which the container will use once it’s running.

The task role defines what resources the container will have access to. The task role permissions will differ for each service you create, so, for modularity, we’ll pass in the JSON for the policy via a variable.

The execution role is fairly simple. It needs permission to authenticate with ECR and download the bastion image, plus permission to write logs to CloudWatch.

resource "aws_iam_role" "execution_role" {
  name               = "bastion-execution"
  assume_role_policy = data.aws_iam_policy_document.assume_role.json
}

data "aws_iam_policy_document" "execution_role" {
  statement {
    actions   = ["ecr:GetAuthorizationToken"]
    resources = ["*"]
  }

  statement {
    actions = [
      "ecr:BatchCheckLayerAvailability",
      "ecr:BatchGetImage",
      "ecr:GetDownloadUrlForLayer",
    ]

    resources = [var.image_repository_arn]
  }

  statement {
    actions = [
      "logs:CreateLogStream",
      "logs:PutLogEvents",
    ]

    resources = ["*"]
  }
}

resource "aws_iam_role_policy" "execution_role" {
  name   = "bastion-execution"
  policy = data.aws_iam_policy_document.execution_role.json
  role   = aws_iam_role.execution_role.id
}

For the task role, you can add whatever permissions you need. But at a minimum, you need to grant the ability to assume the public key fetcher role we created above.

data "aws_iam_policy_document" "bastion_task_role" {
  statement {
    actions   = ["sts:AssumeRole"]
    resources = [module.bastion.public_key_fetcher_role_arn]
  }

  #
  # Add any other permissions needed here
  #
}

Because we are going to include service as a module within a parent service, we create the policy in the top-level module and inject it via a variable. That way, we can generalize the service module for use in multiple services.

module "bastion_service" {
  source = "./service"

  # …
  task_role_policy_json         = data.aws_iam_policy_document.bastion_task_role.json
  # …
}

Creating the task role within the service module is simple.

resource "aws_iam_role" "task_role" {
  name               = "bastion-task"
  assume_role_policy = data.aws_iam_policy_document.assume_role.json
}

resource "aws_iam_role_policy" "task_role" {
  policy = var.task_role_policy_json
  role   = aws_iam_role.task_role.id
}

Likewise, creating an ECS cluster is trivial.

resource "aws_ecs_cluster" "bastion" {
  name = "bastions"
}

Now we’re ready to create the ECS task.

ECS tasks are defined using a JSON file. Because we need to inject variables derived from our Terraform, we’re going to use a template_file data source.

[
  {
    "name": "${name}",
    "image": "${image}",
    "portMappings": [
      {
        "containerPort": 22
      }
    ],
    "environment": [
      {
        "name": "ASSUME_ROLE_FOR_AUTHORIZED_KEYS",
        "value": "${assume_role_for_authorized_keys}"
      }
    ],
    "logConfiguration": {
      "logDriver": "awslogs",
      "options": {
        "awslogs-group": "${log_group_name}",
        "awslogs-region": "${region}",
        "awslogs-stream-prefix": "ssh"
      }
    }
  }
]

data "template_file" "container_definitions" {
  template = file("${path.module}/container-definitions.tpl.json")

  vars = {
    assume_role_for_authorized_keys = var.public_key_fetcher_role_arn
    image                           = "${var.image_repository_url}:latest"
    log_group_name                  = aws_cloudwatch_log_group.bastion.name
    name                            = "bastion"
    region                          = var.region
  }
}

Next, we create the ECS task definition.

resource "aws_ecs_task_definition" "bastion" {
  container_definitions    = data.template_file.container_definitions.rendered
  cpu                      = "256"
  execution_role_arn       = aws_iam_role.execution_role.arn
  family                   = "bastions"
  memory                   = "512"
  network_mode             = "awsvpc"
  requires_compatibilities = ["FARGATE"]
  task_role_arn            = aws_iam_role.task_role.arn
}

You can adjust the CPU and memory however you like, but since we’re likely just proxying through this instance, I’ve set both to the lowest values supported.

With all of that out of the way, we can now create the API.

API

When I build an API with API Gateway, I typically begin by creating the individual functions.

I’m using Clojure and David Chelimsky’s excellent aws-api. If you don’t want to introduce Clojure into your environment, it shouldn’t be too hard to translate my code into the language of your choice. However, I encourage you to give Clojure a serious look. I’ve been using it for about a decade, and it will alter the way you think about your code.

For this API, we need 3 functions: one to create the bastion instance, one to trigger the bastion instance’s destruction, and one to actually destroy the instance. In this case, the creation and destruction functions share a fair bit of code, so I’ve chosen to keep them in a single Leiningen project. (For the Clojure developers in the audience, I will eventually get around to migrating this to tools.deps. But lein will do just fine for now.)

create-bastion

create-bastion is the most complex of the three functions.

First, we need to check whether a security group already exists for the user. If it does not, we know that no bastion is currently running for this user, and we just create the security group locked to the user’s current IP and start the bastion task.

If the security group exists for that user, we need to check whether the ingress IP needs to be updated. If the IP matches, then it’s likely there is already a running bastion, so we attempt to find the task for the user. If it exists, there’s no reason to start another, so we just respond with the existing bastion IP. This prevents a user from starting up multiple, identical bastions and wasting resources. It also makes the process idempotent even if previous requests to start a bastion appear to have failed for one reason or another.

If the IP does not match, we need to check whether there’s already a running task. If there is, we need to stop it before removing the stale security group. Then, we delete the security group, recreate with the new IP, and start a new bastion task.

Note that we identify the user based on the AWS credentials used to sign the request.

(defn -handleRequest
  [_ input-stream output-stream _]
  (let [event (json/parse-stream (io/reader input-stream) true)
        cidr-ip (str (get-in event [:requestContext :identity :sourceIp]) "/32")
        user (last (cs/split (get-in event [:requestContext :identity :userArn]) #"/"))]
    (if-let [security-group-id (sg/get-id-for user)]
      (if (sg/ip-matches? security-group-id cidr-ip)
        (if-let [task (task/get-for user)]
          (stream-response (:bastion-ip task) output-stream)
          (start-bastion-and-stream-response user cidr-ip output-stream security-group-id))
        (if-let [task (task/get-for user)]
          (do
            (task/stop-for user task)
            (sg/delete-for user)
            (start-bastion-and-stream-response user cidr-ip output-stream))
          (do
            (sg/delete-for user)
            (start-bastion-and-stream-response user cidr-ip output-stream))))
      (start-bastion-and-stream-response user cidr-ip output-stream))))
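To make the branching easier to follow, here is the same decision tree written as a small pure function in Ruby. It is a sketch for illustration only: the step names are hypothetical labels rather than functions from the repo, and nothing here talks to AWS.

```ruby
# Decide which steps create-bastion should take for a user.
#   group_exists: a security group for the user already exists
#   ip_matches:   that group's ingress rule matches the caller's current IP
#   task_running: a bastion task for the user is already running
def create_bastion_plan(group_exists:, ip_matches:, task_running:)
  # No security group means no bastion: create both from scratch.
  return %i[create_security_group start_task] unless group_exists

  if ip_matches
    # Same IP: reuse a running bastion, or start one in the existing group.
    task_running ? %i[return_existing_ip] : %i[start_task]
  else
    # IP changed: the stale group must go, and any running task with it.
    steps = %i[delete_security_group create_security_group start_task]
    task_running ? [:stop_task] + steps : steps
  end
end

plan = create_bastion_plan(group_exists: true, ip_matches: false, task_running: true)
puts plan.inspect
```

Every branch either ends by starting a task or by returning the IP of one that is already running, which is what makes repeated requests safe.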

One thing to note is that at the time of writing, there aren’t a lot of good ways to query running tasks. I chose to use startedBy which is fairly restrictive in terms of values (see the AWS documentation for more details). To work around this, I hash the user name with SHA-256 and trim the hash to the appropriate length. Depending on the number of users you need to support, YMMV.

(defn user-hash
  [user]
  (let [hash (digest/sha-256 user)]
    (subs hash 0 (- started-by-max-length 1))))
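If you are translating this into another language, the hashing step is short. Here is a rough Ruby equivalent. The value of started-by-max-length isn’t shown in this post, so the 36 used below is an assumption; check the limit in the ECS documentation before relying on it.

```ruby
require 'digest/sha2'

# ASSUMPTION: the real started-by-max-length lives in the repo and is not
# shown in the post. 36 is a placeholder for illustration.
STARTED_BY_MAX_LENGTH = 36

# Hex-encode the SHA-256 of the user name and keep one character fewer
# than the maximum, mirroring the Clojure (subs hash 0 (- max 1)).
def user_hash(user, max_length = STARTED_BY_MAX_LENGTH)
  Digest::SHA256.hexdigest(user)[0, max_length - 1]
end

puts user_hash('alice').length
```

Hex digits are a convenient choice here because they stay within a restrictive character set whatever characters the user name contains.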

destroy-bastion

Destroying a bastion is much simpler (and still idempotent). If the user has a task running, we stop it. Then we delete the user’s security group.

(defn -handleRequest
  [_ input-stream _ _]
  (let [event (json/parse-stream (io/reader input-stream) true)
        user (:user event)]
    (when-let [task (task/get-for user)]
      (task/stop-for user task))
    (sg/delete-for user)))

trigger-bastion-destruction

For bastion destruction, there’s no reason for developers to wait around for the task to be stopped. We simply need to trigger destroy-bastion and return. I’m using ClojureScript (which compiles down to JavaScript) for this function since we don’t really need any of the heavier Clojure capabilities, and we can take advantage of the faster Node.js startup time.

(defn ^:export handle-request
  [event _ callback]
  (let [event (js->clj event :keywordize-keys true)
        user (last (cs/split (get-in event [:requestContext :identity :userArn]) #"/"))
        payload (.stringify js/JSON (clj->js {:user user}))]
    (.invoke lambda
             (clj->js {:FunctionName   destroy-bastion-function-name
                       :InvocationType "Event"
                       :Payload        payload})
             (fn [_ _]
               (callback nil (clj->js {:statusCode 200}))))))
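For anyone porting this function, the essential parts are pulling the user name out of the caller’s ARN and building an asynchronous ("Event") invocation. Here is a hedged Ruby sketch of just those two steps. The helper names and the function name passed in are hypothetical, the hash keys are shaped like the parameters the AWS SDK for Ruby’s Lambda client accepts for invoke, and no AWS call is made.

```ruby
require 'json'

# Extract the user name from the IAM user ARN that API Gateway attaches
# to IAM-authorized requests, e.g. arn:aws:iam::123456789012:user/alice.
def user_from_event(event)
  user = event.dig('requestContext', 'identity', 'userArn').to_s.split('/').last
  raise ArgumentError, 'no userArn in request context' if user.nil? || user.empty?

  user
end

# Build the parameters for a fire-and-forget invocation of destroy-bastion.
def destroy_invocation(event, function_name)
  {
    function_name: function_name,
    invocation_type: 'Event', # asynchronous: do not wait for the result
    payload: JSON.generate({ 'user' => user_from_event(event) })
  }
end
```

The payload carries only the user name, which matches what destroy-bastion reads from its event.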

Speed Bumps

ENI creation and deletion and public IP assignment for an instance can take a while. No matter your language of choice, you’ll want to account for this.

With Clojure, I check status every 2 seconds using core.async/timeout.

(defn get-public-ip
  [attachment-description]
  (println "Getting public IP for bastion")
  (loop [description (describe-network-interfaces attachment-description)]
    (let [network-interfaces (:NetworkInterfaces description)]
      (if-let [public-ip (get-in (first network-interfaces) [:Association :PublicIp])]
        public-ip
        (do
          (<!! (timeout 2000))                              ; ENI attachment & IP assignment may take some time
          (recur (describe-network-interfaces attachment-description)))))))

(defn wait-for-deletion
  [attachment-description]
  (println "Waiting for ENI deletion")
  (loop [description (describe-network-interfaces attachment-description)]
    (let [network-interfaces (:NetworkInterfaces description)]
      (if (> (count network-interfaces) 0)
        (do
          (<!! (timeout 2000))                              ; ENI destruction may take some time
          (recur (describe-network-interfaces attachment-description)))))))
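The same polling pattern is easy to reproduce in other languages. Here is a minimal Ruby sketch with one deliberate difference from the Clojure above: it adds an upper bound on the total wait so the loop cannot spin forever. The name poll_until, the default values, and the injectable sleeper are all assumptions made for illustration.

```ruby
# Call the block every `interval` seconds until it returns a truthy value.
# Gives up with a RuntimeError once `max_wait` seconds of sleeping have passed.
# The sleeper is injectable so the loop can be exercised without real delays.
def poll_until(interval: 2, max_wait: 60, sleeper: ->(seconds) { sleep(seconds) })
  waited = 0
  loop do
    result = yield
    return result if result
    raise "gave up after #{waited} seconds" if waited >= max_wait

    sleeper.call(interval)
    waited += interval
  end
end

# Simulated lookup: the public IP only shows up on the third check.
attempts = 0
ip = poll_until(sleeper: ->(_seconds) {}) do
  attempts += 1
  attempts >= 3 ? '203.0.113.10' : nil
end
puts ip
```

For the deletion case, the block would return true once no network interfaces remain.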

Note that while I’ve found a 2 minute timeout is usually sufficient, it is still occasionally possible for create-bastion and destroy-bastion to time out. If create-bastion times out, we’ll get immediate feedback—simply re-run the create-bastion.sh script described below.

Because destroy-bastion runs asynchronously in the background, I’ve added a dead letter queue. For this guide, we’re not implementing any failure handling beyond writing the event to SQS. Handling asynchronous failures with Lambda is a topic for another day.

resource "aws_sqs_queue" "dlq" {
  name = "destroy-bastion-dlq"
}

data "aws_iam_policy_document" "destroy_bastion" {
  # …

  statement {
    actions   = ["sqs:SendMessage"]
    resources = [aws_sqs_queue.dlq.arn]
  }

  # …
}

# …

resource "aws_lambda_function" "destroy_bastion" {
  # …

  dead_letter_config {
    target_arn = aws_sqs_queue.dlq.arn
  }

  # …
}

API Gateway

With the functions complete, we can create the API using API Gateway. We only need a single resource (bastion) and two methods (POST and DELETE).

Creating a bastion should return the IP address, and triggering bastion destruction should return nothing. We control access to the API (and identify the user associated with the bastion) using IAM. We also use that IAM identity to lock the bastion to a particular user.

I’m not going to go into the Terraform for the API. It’s fairly standard. Check out service/api/main.tf and service/api/bastion.tf in the repo.

CLI

One of my design goals was the ability for developers to work primarily from the command line, taking advantage of Terraform and the existing AWS tooling around credentials and identity. I also wanted to minimize the number of things a developer has to input to start and stop a bastion. If the data is available via Terraform or AWS, there’s no reason the developer should have to input it—the tools should just do the right thing.

session.rb

The core script is session.rb.

I chose to use Ruby for the heavy lifting (since that was my preferred language before Clojure). Python or JavaScript would work just as well.

Honestly, there’s not much to it. It just signs the requests to the bastion API with the user’s AWS credentials. session.rb isn’t invoked directly—instead, we use create.rb and destroy.rb via bash scripts.

The bash scripts that wrap the Ruby scripts take care of filling in any details that can be extracted using Terraform and the AWS CLI via environment variables.

require 'bundler'
Bundler.require

require 'json'
require 'net/http'
require 'uri'

INVOKE_URL = ENV['INVOKE_URL']
REGION = URI.parse(INVOKE_URL).host.split('.')[2]

BASE_HEADERS = {
    'Content-Type' => 'application/json',
    'Accept' => 'application/json'
}

def perform(action)
  uri = URI.parse("#{INVOKE_URL}/bastion")

  method = case action
           when :create
             'POST'
           when :destroy
             'DELETE'
           end

  signer = Aws::Sigv4::Signer.new(
      service: 'execute-api',
      region: REGION,
      credentials_provider: Aws::SharedCredentials.new
  )

  signature = signer.sign_request(
      http_method: method,
      url: uri.to_s,
      headers: BASE_HEADERS
  )

  http = Net::HTTP.new(uri.host, uri.port)
  http.use_ssl = true

  action_request = case method
                   when 'POST'
                     Net::HTTP::Post.new(uri)
                   when 'DELETE'
                     Net::HTTP::Delete.new(uri)
                   end

  BASE_HEADERS.merge(signature.headers).each { |k, v|
    action_request[k] = v
  }

  action_request['accept-encoding'] = nil
  action_request['user-agent'] = nil

  response = http.request(action_request)

  if response.code == '201'
    puts JSON.pretty_generate(JSON.parse(response.body))
  end
end
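One detail worth calling out is how REGION is derived: it is the third dot-separated label of the invoke URL’s hostname, which works for default execute-api endpoints but not for a custom domain name. Here is a slightly more defensive sketch of that step. The function name region_from_invoke_url is hypothetical, and the URLs in the example are made up.

```ruby
require 'uri'

# Pull the region out of a default API Gateway invoke URL of the form
# https://<api-id>.execute-api.<region>.amazonaws.com/<stage>.
def region_from_invoke_url(invoke_url)
  host = URI.parse(invoke_url).host.to_s
  labels = host.split('.')
  unless labels[1] == 'execute-api' && labels[2]
    raise ArgumentError, "not an execute-api hostname: #{host.inspect}"
  end

  labels[2]
end

puts region_from_invoke_url('https://abc123.execute-api.us-east-1.amazonaws.com/prod')
```

Failing loudly here beats signing a request for the wrong region, since a bad signature would only show up later as a rejected request.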

create-bastion.sh

create-bastion.sh invokes the API via create.rb and writes the bastion’s IP address to .bastion-ip. Any software that can read files can consume the bastion’s IP, simplifying scripting around the REPL and proxies.

Starting up a bastion can take a while (especially from a cold start), so we also provide feedback to the user about where we are in the process. And because bastion creation is idempotent, we can run this script repeatedly without creating multiple instances.

#!/usr/bin/env bash

cd "$(cd "$(dirname "${BASH_SOURCE[0]}")" >/dev/null 2>&1 && pwd)/../.." # Start from a consistent working directory

echo "Fetching bastion service endpoint..."
invoke_url=$(terraform output bastion_service_endpoint)

if [[ -z ${invoke_url} ]]
then
  echo "No bastion service found." >&2
  exit 1
fi

cd service

echo "Creating bastion..."
INVOKE_URL=${invoke_url} bundle exec ruby create.rb | jq -r .ip > .bastion-ip

if [[ -z $(cat .bastion-ip) ]]
then
  rm .bastion-ip
  echo "No IP address returned. Probably just AWS being slow. Try re-running this script." >&2
  exit 1
else
  echo "Done"
  echo "ops@$(cat .bastion-ip)"
fi

Note that when you get the IP back, it may still take a minute or so before the container is fully up and running.
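As an example of consuming .bastion-ip from another tool, here is a short Ruby sketch that validates the contents and builds an ssh port-forwarding command. The function names, the remote host, and the ports are hypothetical; the ops user and the file location come from the scripts above.

```ruby
require 'ipaddr'

# Validate the text read from .bastion-ip and return the bare address.
def parse_bastion_ip(text)
  ip = text.to_s.strip
  raise ArgumentError, 'bastion IP is empty' if ip.empty?

  IPAddr.new(ip) # raises if the text is not a valid IP address
  ip
end

# Read the address written by create-bastion.sh (path relative to the repo root).
def read_bastion_ip(path = 'service/.bastion-ip')
  parse_bastion_ip(File.read(path))
end

# Build an argument list for a local port forward through the bastion.
# -N skips running a remote command; -L sets up the forward.
def tunnel_command(bastion_ip, local_port:, remote_host:, remote_port:, user: 'ops')
  ['ssh', '-N', '-L', "#{local_port}:#{remote_host}:#{remote_port}", "#{user}@#{bastion_ip}"]
end

# Hypothetical target: a database that is only reachable inside the VPC.
command = tunnel_command(parse_bastion_ip("203.0.113.10\n"),
                         local_port: 5432, remote_host: 'db.internal', remote_port: 5432)
puts command.join(' ')
```

Returning an argument list rather than a single string means it can be passed to system(*command) without shell quoting concerns.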

destroy-bastion.sh

destroy-bastion.sh is likewise straightforward. It checks whether there’s a .bastion-ip file and, if so, invokes destroy.rb.

#!/usr/bin/env bash

cd "$(cd "$(dirname "${BASH_SOURCE[0]}")" >/dev/null 2>&1 && pwd)/../.." # Start from a consistent working directory

if [[ -f "service/.bastion-ip" ]]
then
  echo "Fetching bastion service endpoint..."
  invoke_url=$(terraform output bastion_service_endpoint)

  if [[ -z ${invoke_url} ]]
  then
    echo "No bastion service found." >&2
    exit 1
  fi

  cd service

  echo "Destroying bastion..."
  INVOKE_URL=${invoke_url} bundle exec ruby destroy.rb

  rm .bastion-ip
fi

echo "Done"

In Closing

As mentioned above, you can find a fully functional example in my bastions-on-demand repo. If you run into any trouble, create an issue, and I’ll do my best to respond.