Splitting Audio with AWS ECS

How to Leverage Docker in the Cloud

Apr 25, 2020 · 5 min read


This post explores another sufficiently complex example, this time leveraging Docker in the cloud. As in the last post, let’s start with a seemingly trivial ffmpeg command and see what it takes to build the equivalent solution in the cloud.

# Splits audio from input.mp4 into a file called output.mp3
ffmpeg \
-i input.mp4 \
-f mp3 -ab 192000 \
-vn output.mp3

The above command uses ffmpeg to read in a video file called input.mp4 and split out the audio into a file called output.mp3. It seems easy enough. Let’s see what it takes to incrementally build this up into a cloud-based solution!

Leverage Docker Locally

Before diving deep into AWS, let’s start by containerizing the above example. Our first step is to get hold of a Docker image that has ffmpeg already installed. We could create one from scratch, but fortunately an image with ffmpeg installed already exists on Docker Hub.

# Splits audio with ffmpeg in an ephemeral Docker container
docker run --rm \
-v $PWD:/temp \
jrottenberg/ffmpeg:3.2-alpine \
-i /temp/split.mp4 \
-f mp3 -ab 192000 \
-vn /temp/output.mp3

Here we use docker run to spin up an ephemeral Docker container (--rm cleans it up on exit). Because the jrottenberg/ffmpeg image specifies the ffmpeg binary as its entrypoint, we can treat docker run jrottenberg/ffmpeg:3.2-alpine as a near drop-in replacement for invoking ffmpeg directly. In order for the container to have access to the video file, the current host directory is mounted as a volume at /temp in the Docker container, which is where the input and output paths point.

[Demo: Running ffmpeg in a Docker container]

Use S3 for Intermediate Storage

The next step in turning this into a cloud-based solution is to pull in an external source of persistence. At a high level, we want a local Docker container to run a script that downloads a video file from S3 and uses ffmpeg to split the audio.

Let’s start with a simple bash script that downloads a video file from S3.
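The script was originally embedded in the post; a minimal sketch looks like the following, assuming it receives an S3 path such as <bucket>/split.mp4 as its first argument (the /home/input and /home/output directories are illustrative and match the Dockerfile below):

#!/bin/bash
# startup.sh -- download a video from S3, then split out its audio
# $1 is an S3 path such as <bucket>/split.mp4
set -e
aws s3 cp "s3://$1" /home/input/input.mp4
ffmpeg -i /home/input/input.mp4 -f mp3 -ab 192000 -vn /home/output/output.mp3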

This script will be executed within the Docker container. However, our current Docker image doesn’t have the AWS CLI installed or a way to invoke this script. In order to make it work correctly we’ll need to enhance the existing ffmpeg image with our own Dockerfile.
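A sketch of that Dockerfile, matching the description below (the Alpine package names and the directory layout are assumptions):

# Extends the ffmpeg image with bash, python, and the AWS CLI
FROM jrottenberg/ffmpeg:3.2-alpine

# bash for troubleshooting, python/pip for the AWS CLI
RUN apk add --no-cache bash python py-pip && pip install awscli

# Copy in the startup script and create working directories
COPY startup.sh /home/startup.sh
RUN mkdir -p /home/input /home/output && chmod +x /home/startup.sh

# Override the base image's ffmpeg entrypoint; the container's
# command argument becomes the first argument to startup.sh
ENTRYPOINT ["/home/startup.sh"]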

The above Dockerfile does a few things. Our image uses the jrottenberg/ffmpeg:3.2-alpine image as its base. We install bash (for troubleshooting), python (for the AWS CLI), and the AWS CLI itself. The S3 startup script is copied to /home, where a couple of directories are created. Lastly, the entrypoint in the base image is overridden with the startup script. This allows us to pass an argument into the Docker container which is interpreted as the first argument to startup.sh.

Leverage ECS

Now it’s time to bring ECS into the picture. We can push our custom Docker image to ECR, which makes it available to be spun up as a container in ECS. Conceptually, the boxes and arrows remain relatively simple.
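Getting the image into ECR is a standard build, tag, and push; for example (the split-audio repository name is illustrative, and aws ecr get-login is the AWS CLI v1 way to authenticate Docker):

# Authenticate Docker with ECR, then build, tag, and push
$(aws ecr get-login --no-include-email --region us-east-1)
docker build -t split-audio .
docker tag split-audio <account_id>.dkr.ecr.us-east-1.amazonaws.com/split-audio:latest
docker push <account_id>.dkr.ecr.us-east-1.amazonaws.com/split-audio:latest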

The additional upload to S3 requires a very slight modification to startup.sh. After the audio file is split let’s again use the AWS CLI to perform the upload.
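The change amounts to one more aws s3 cp, this time in the other direction (deriving the bucket from the script’s first argument is an assumption):

# Appended to startup.sh: upload the extracted audio back to S3,
# deriving the bucket from the first argument (e.g. <bucket>/split.mp4)
aws s3 cp /home/output/output.mp3 "s3://$(dirname "$1")/output.mp3"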

For the next incremental step we manually upload a video to S3 and manually execute run-task in ECS from a laptop. ECS will spin up a Docker container with the parameters provided to run-task; the container will then download the file from S3, split out the audio, and upload the audio file back to S3. At that point we can manually download the audio file from S3.

aws ecs run-task \
--task-definition split-task:1 \
--cluster arn:aws:ecs:us-east-1:<account_id>:cluster/mycluster \
--overrides '{"containerOverrides":[{"name": "split-container", "command": ["<bucket>/split.mp4"]}]}'

Note the command in the containerOverrides above. This value is sent as an argument to the entrypoint which is read in as the first argument to startup.sh.
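For completeness, the manual steps on either side of run-task are plain aws s3 cp calls (file names mirror the examples above):

# Before run-task: upload the source video
aws s3 cp split.mp4 s3://<bucket>/split.mp4

# After the task completes: fetch the extracted audio
aws s3 cp s3://<bucket>/output.mp3 .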

Invoke ECS from a triggered Lambda Function

Our final step is to remove the manual invocation of the ECS Task, replacing it with an AWS Lambda Function triggered by the upload of an object to a specific S3 bucket. The Lambda Function then invokes runTask with the AWS SDK.
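The Function was originally embedded in the post; a minimal sketch in Python with boto3 (where the SDK’s runTask appears as run_task) might look like this, reusing the cluster, task definition, and container names from the run-task example above:

import urllib.parse
import boto3

ecs = boto3.client('ecs')

def handler(event, context):
    # Pull the bucket and object key out of the S3 trigger event
    record = event['Records'][0]['s3']
    bucket = record['bucket']['name']
    key = urllib.parse.unquote_plus(record['object']['key'])

    # Kick off the ECS Task; the command override becomes the
    # first argument to startup.sh inside the container
    ecs.run_task(
        cluster='mycluster',
        taskDefinition='split-task:1',
        overrides={
            'containerOverrides': [{
                'name': 'split-container',
                'command': ['{}/{}'.format(bucket, key)]
            }]
        })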

With the above Lambda Function the process looks like this:

  1. From my laptop I upload a video to S3
  2. A successful upload triggers the above Lambda Function and it is invoked with an event that specifies the key of the uploaded object
  3. The Function uses the AWS SDK to invoke an ECS Task in a cluster called mycluster. Included in this invocation is a set of container overrides along with a Docker command which specifies the S3 object key. This is ultimately the argument for startup.sh.
  4. ECS starts up a Docker container using the ECR image we defined in the above Dockerfile. It receives the S3 object key as an argument, downloads the video file, uses ffmpeg to split the audio, and uploads the audio file back to S3.
  5. From my laptop I can now download the separated audio file
