This is the first of our _client logs_, describing a problem encountered by a client and a solution we helped design and deliver. Today we'll describe how we used Tailscale and ECS to help our client build an inexpensive, simple VPN solution.

**Background**

The client is a small company transitioning from startup mode into "scale-up mode". Their infrastructure is on AWS, and they needed to move certain resources, such as databases, into private VPC subnets so that they were no longer internet-accessible. However, developers and administrators at the company needed to retain access to those databases from their workstations for troubleshooting and maintenance tasks.

We could easily have set up a bastion or jumphost, but we didn't want the additional management overhead of maintaining and protecting the SSH server, managing users and SSH keys, and so on. Plus, there were some non-technical users who needed access to internal resources. So we looked at VPN options. AWS Client VPN was an obvious candidate, but the pricing seemed higher than necessary.

**Tailscale**

Tailscale is a VPN service and client based on the WireGuard protocol. It was recommended both by many of my own peers and by some developers at the client. We decided to give it a shot and were pleased with the experience: it's simple and easy to use. Workstations can join using a native UI client, and server machines can be added with a CLI client running as a "sidecar" on the same server or Kubernetes pod. You can also connect to entire private networks with a subnet router or relay. Finally, you can send all of your workstation's traffic through a Tailscale [exit node](https://nalth.is/tailscale-exit-nodes/).

On the downside, there is no official Terraform provider (there is an unofficial one), and OIDC authentication is only available under an "Enterprise"-level license. Additionally, as you'll see in this tutorial, the published Docker image either needs to be extended to run configuration commands such as `tailscale up`, or you need to run those commands outside of the `docker run` lifecycle. They offer a free version for personal use, and plans for businesses start at $5/user/month.

**The solution: Running Tailscale on ECS**

We decided to launch Tailscale on ECS. We will use the Fargate capacity provider, since the client doesn't manage any EC2 instances and would like to keep it that way -- but this could just as easily run on EC2-backed ECS nodes. We will run Tailscale in relay (subnet router) mode, but the same setup could run an exit node, or run Tailscale as a sidecar to your other applications. We will run Tailscale with [userspace networking](https://tailscale.com/kb/1112/userspace-networking/) so that we don't have to grant low-level permissions to the container.

**Step 1: Generate an auth key**

[Auth keys](https://tailscale.com/kb/1085/auth-keys/) allow you to log in to Tailscale without a UI. I'd recommend a non-reusable (one-off), ephemeral (instances get cleaned up) key.

![[tailscale-authkey.png|300]]
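If you'd rather script this step than click through the admin console, auth keys can also be created via Tailscale's HTTP API. Here's a minimal sketch, assuming you've generated an API access token in the admin console; the tailnet name (`example.com`) and token value are placeholders to substitute with your own:

```bash
#!/bin/sh
# Hypothetical values -- substitute your own tailnet name and API token.
TAILNET="example.com"
TAILSCALE_API_KEY="..."

# Request a one-off, ephemeral auth key. The .key field of the response
# is what bootstrap.sh (below) consumes as TAILSCALE_AUTHKEY.
curl -sf -u "${TAILSCALE_API_KEY}:" \
  -H "Content-Type: application/json" \
  "https://api.tailscale.com/api/v2/tailnet/${TAILNET}/keys" \
  --data '{
    "capabilities": {
      "devices": {
        "create": { "reusable": false, "ephemeral": true }
      }
    },
    "expirySeconds": 3600
  }' | jq -r .key
```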
**Step 2: Build Docker image**

We can either launch Tailscale using a custom Docker image, or we can use Terraform's null_resource and provisioning feature. If you're OK with the funky smell of a `time_sleep` and a `null_resource` provisioner, skip this step and see [the examples in the associated GitHub repo](https://github.com/platformersdev/tailscale-fargate-terraform).

The standard Tailscale Docker image can be given the `tailscaled` command to run in daemon mode, but the first time it launches, it needs to be configured by running `tailscale up` and passing parameters such as the auth key, subnets, etc. Ideally we could pass these as environment variables, but the off-the-shelf image doesn't support that. Many of Tailscale's instructions ([including for AWS App Runner](https://tailscale.com/kb/1127/aws-app-runner/), which is based on Fargate) suggest building your own bootstrap script. I built a Docker image similar to theirs, containing a bootstrap script:

```bash
#!/bin/sh

# Retry `tailscale up` until the daemon is ready to accept it.
up() {
  until tailscale up --authkey=${TAILSCALE_AUTHKEY} ${TAILSCALE_UP_ARGS}
  do
    sleep 0.1
  done
}

# send this function into the background
up &

# Store state in an SSM parameter, and use userspace networking so the
# container needs no NET_ADMIN capability or /dev/net/tun device.
exec tailscaled --tun=userspace-networking --state=${TAILSCALE_STATE_PARAMETER_ARN}
```

And a Dockerfile:

```dockerfile
# never use :latest. It will only cause you pain.
FROM tailscale/tailscale:v1.21.26
ADD ./bootstrap.sh /bootstrap.sh
CMD ["/bootstrap.sh"]
```

I decided against publishing this Docker image because we can't commit to the maintenance requirement of keeping it up to date with security patches and Tailscale's releases. If Tailscale could offer similar functionality with their official Docker image, we'd much appreciate it.
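Fargate needs to pull this image from a registry, and ECR is the natural choice here. A minimal sketch of the build-and-push, with a hypothetical account ID, region, and repository name to substitute with your own (note the pinned tag, per the Dockerfile comment above); the resulting image URI is what we'll pass to Terraform in the next step:

```bash
#!/bin/sh
# Hypothetical account ID, region, and repository name -- substitute your own.
ACCOUNT_ID="123456789012"
REGION="us-east-1"
REPO="tailscale-vpn"
TAG="v1.21.26"  # pin a version; never :latest

# One-time: create the repository.
aws ecr create-repository --repository-name "${REPO}" --region "${REGION}"

# Authenticate the local Docker client against ECR.
aws ecr get-login-password --region "${REGION}" \
  | docker login --username AWS --password-stdin "${ACCOUNT_ID}.dkr.ecr.${REGION}.amazonaws.com"

# Build and push; the pushed URI becomes Step 3's image_name variable.
docker build -t "${ACCOUNT_ID}.dkr.ecr.${REGION}.amazonaws.com/${REPO}:${TAG}" .
docker push "${ACCOUNT_ID}.dkr.ecr.${REGION}.amazonaws.com/${REPO}:${TAG}"
```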
**Step 3: Create AWS resources with Terraform**

This Terraform configuration will take a few variables:

```hcl
variable "image_name" {
  type = string
}

variable "container_command" {
  type    = list(string)
  default = null
}

# Environment variables passed to the container; consumed by bootstrap.sh.
variable "container_environment" {
  type = list(object({
    name  = string
    value = string
  }))
  default = null
}

variable "subnets" {
  type = list(string)
}

variable "authkey" {
  type      = string
  sensitive = true
}

variable "tailscale_up_args" {
  type    = string
  default = "--advertise-routes 10.1.0.0/16,10.2.0.0/16"
}
```

Create a task definition and service:

```hcl
data "aws_region" "current" {}
data "aws_caller_identity" "current" {}

locals {
  name = "tailscale-vpn"
}

resource "aws_ecs_cluster" "default" {
  name = local.name
}

resource "aws_ecs_service" "default" {
  name                   = local.name
  cluster                = aws_ecs_cluster.default.name
  task_definition        = aws_ecs_task_definition.default.arn
  launch_type            = "FARGATE"
  desired_count          = 1
  enable_execute_command = true

  network_configuration {
    assign_public_ip = false
    subnets          = var.subnets
    security_groups  = [aws_security_group.default.id]
  }

  wait_for_steady_state = true
}

data "aws_subnet" "subnet" {
  id = var.subnets[0]
}

# No ingress rules: Tailscale only makes outbound connections,
# so nothing needs to reach this task directly.
resource "aws_security_group" "default" {
  name   = local.name
  vpc_id = data.aws_subnet.subnet.vpc_id

  egress {
    from_port        = 0
    to_port          = 0
    protocol         = "-1"
    cidr_blocks      = ["0.0.0.0/0"]
    ipv6_cidr_blocks = ["::/0"]
  }
}

resource "aws_ecs_task_definition" "default" {
  family = local.name
  container_definitions = jsonencode([
    {
      name      = "tailscale"
      image     = var.image_name
      essential = true
      linuxParameters = {
        initProcessEnabled = true
      }
      logConfiguration = {
        logDriver = "awslogs"
        options = {
          "awslogs-group"         = aws_cloudwatch_log_group.default.name,
          "awslogs-region"        = data.aws_region.current.name,
          "awslogs-stream-prefix" = local.name
        }
      }
      environment = var.container_environment
      command     = var.container_command
    }
  ])
  requires_compatibilities = ["FARGATE"]
  network_mode             = "awsvpc"
  cpu                      = 256
  memory                   = 512
  execution_role_arn       = aws_iam_role.execution.arn
  task_role_arn            = aws_iam_role.task.arn
}

resource "aws_cloudwatch_log_group" "default" {
  name              = "ecs/${local.name}"
  retention_in_days = 30
}
```

I've configured Tailscale to store its state in an SSM parameter, so the task role needs IAM permissions to read and write it:

```hcl
data "aws_iam_policy_document" "ecs_tasks_service" {
  statement {
    effect  = "Allow"
    actions = ["sts:AssumeRole"]
    principals {
      type = "Service"
      identifiers = [
        "ecs-tasks.amazonaws.com",
      ]
    }
  }
}

resource "aws_iam_role" "execution" {
  name               = "${local.name}-execution"
  assume_role_policy = data.aws_iam_policy_document.ecs_tasks_service.json
}

resource "aws_iam_role_policy_attachment" "execution" {
  role       = aws_iam_role.execution.id
  policy_arn = "arn:aws:iam::aws:policy/service-role/AmazonECSTaskExecutionRolePolicy"
}

resource "aws_iam_role" "task" {
  name               = "${local.name}-task"
  assume_role_policy = data.aws_iam_policy_document.ecs_tasks_service.json
}

resource "aws_iam_role_policy" "task" {
  role   = aws_iam_role.task.id
  policy = <<POLICY
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "ssm:GetParameter",
        "ssm:PutParameter"
      ],
      "Resource": [
        "arn:aws:ssm:${data.aws_region.current.name}:${data.aws_caller_identity.current.account_id}:parameter/tailscale-vpn-state"
      ]
    }
  ]
}
POLICY
}

resource "aws_iam_role_policy_attachment" "ssm" {
  role       = aws_iam_role.task.name
  policy_arn = "arn:aws:iam::aws:policy/AmazonSSMManagedInstanceCore"
}
```

Add this module to your Terraform project, supply variables such as the following, run `terraform plan`, and apply:

```hcl
module "tailscale" {
  source     = "../.."
  subnets    = data.aws_subnets.subnets.ids
  image_name = local.image
  container_environment = [
    { name = "TAILSCALE_UP_ARGS", value = local.up_args },
    { name = "TAILSCALE_STATE_PARAMETER_ARN", value = local.state_parameter_arn },
    { name = "TAILSCALE_AUTHKEY", value = local.authkey },
  ]
}
```

**Step 4: Approve in Tailscale**

We need to approve the subnets or exit nodes (whichever you configured in the arguments) in the [Tailscale Admin UI](https://login.tailscale.com/admin/machines). This is a one-time thing:

![[tailscale-approve.png|400]]

![[tailscale-approve2.png|300]]

**Step 5: Enjoy!**

If you configured a subnet router, try navigating to a private IP in that subnet. If you configured an exit node, select it from the Tailscale menu and [check your IP address](https://www.whatismyip.com/) -- it should be an AWS IP!

![[tailscale-menu.png|400]]

![[tailscale-whatismyip.png|300]]

If you tear this down, you may want to delete the machine state from the configured SSM parameter:

```
$ aws ssm delete-parameters --names "tailscale-vpn-state"
```

**Conclusion**

The source code referenced here is all available in [the GitHub repository](https://github.com/platformersdev/tailscale-fargate-terraform). I hope this was a helpful tutorial. If anything is unclear, please reach out and I will do my best to answer any questions you may have!