Heili CI/CD - part 2

In part one we mentioned that we can’t have tag-triggered builds. This drawback forced us to redesign the CD flow of our application. The issue came up at about the same time as we grew and started adding new environments for different development processes.

The new flow

Our requirement for the new flow was to run a different app version in each environment and to reuse the same image later in the next environment (build-once). Unfortunately, a build-once CI/CD architecture would currently involve too many custom scripts, so we decided to postpone it; we mostly don’t use compiled languages anyway.

In the new flow, every commit produces a new Docker image tagged with the commit ID and with an environment tag (for development, yep, it’s “dev”). This can be really useful in a “build-once” strategy, since we can just assign environment tags to the commit-ID images and know exactly what is running where. Because we can’t do that easily right now, we use GitHub pull requests to move new code between environments: each environment has a branch named after it, and each approved pull request creates a new commit ID on that branch.
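A minimal sketch of what that build-once promotion could look like, assuming images are named gcr.io/<project>/<image>:<tag> as in the build file below: instead of rebuilding, an environment tag is added to an image that already carries a commit ID. `gcloud container images add-tag` is a real gcloud command; the `image_ref` and `promote` helpers, the `DRY_RUN` switch and the `staging` environment are hypothetical, and this is not our current flow.

```shell
#!/usr/bin/env bash
# Build-once promotion sketch: add an environment tag to an image that was
# already built and tagged with its commit ID, instead of rebuilding it.
# Assumptions: images follow gcr.io/<project>/<image>:<tag>, gcloud is
# authenticated, and helper names and "staging" are hypothetical.

# Print a full image reference for a project, image name and tag.
image_ref() {
  printf 'gcr.io/%s/%s:%s\n' "$1" "$2" "$3"
}

# promote PROJECT IMAGE SHORT_SHA ENVIRONMENT
# With DRY_RUN=1 (the default) it only prints the gcloud command.
promote() {
  local src dst
  src="$(image_ref "$1" "$2" "$3")"
  dst="$(image_ref "$1" "$2" "$4")"
  if [[ "${DRY_RUN:-1}" == "1" ]]; then
    echo "gcloud container images add-tag ${src} ${dst} --quiet"
  else
    gcloud container images add-tag "$src" "$dst" --quiet
  fi
}
```

For example, `promote my-project heili abc1234 staging` prints the command, and `DRY_RUN=0 promote my-project heili abc1234 staging` runs it. Because the environment tag points at an existing image, the commit ID and the environment tag always refer to exactly the same build.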

The following Cloud Build file shows how we deploy the application to different environments (plus a couple more cool Cloud Build features):

steps:
# Get latest heili private-repo
- name: gcr.io/cloud-builders/curl
  secretEnv: ['GITHUB_API_TOKEN']
  entrypoint: bash
  args:
  - -c
  - curl -L -u "heili:$$GITHUB_API_TOKEN" https://github.com/heilihq/private-repo/archive/master.zip --output /shared_dir/private-repo.zip
  volumes:
  - name: 'shared_dir'
    path: '/shared_dir'
# Pull latest image for caching
- name: 'gcr.io/cloud-builders/docker'
  args:
  - 'pull'
  - 'gcr.io/${PROJECT_ID}/${_DOCKER_IMAGE_NAME}:${_PROD_BRANCH}'
# Build and push the docker
- name: 'gcr.io/cloud-builders/docker'
  args:
  - 'build'
  - '-t'
  - 'gcr.io/${PROJECT_ID}/${_DOCKER_IMAGE_NAME}:${BRANCH_NAME}'
  - '-t'
  - 'gcr.io/${PROJECT_ID}/${_DOCKER_IMAGE_NAME}:${SHORT_SHA}'
  - --build-arg
  - HEILI_VERSION=${BRANCH_NAME}
  - '--cache-from'
  - 'gcr.io/${PROJECT_ID}/${_DOCKER_IMAGE_NAME}:${_PROD_BRANCH}'
  - '.'
  volumes:
  - name: 'shared_dir'
    path: '/workspace/shared_dir'
- name: 'gcr.io/cloud-builders/docker'
  args:
  - 'push'
  - 'gcr.io/${PROJECT_ID}/${_DOCKER_IMAGE_NAME}:${SHORT_SHA}'
- name: 'gcr.io/cloud-builders/docker'
  args:
  - 'push'
  - 'gcr.io/${PROJECT_ID}/${_DOCKER_IMAGE_NAME}:${BRANCH_NAME}'
# Deploy to the Heili cluster
- name: 'gcr.io/cloud-builders/kubectl'
  id: deploy_dev
  entrypoint: 'bash'
  args:
  - '-c'
  - |
    [[ "${BRANCH_NAME}" != "master" ]] && /builder/kubectl.bash --namespace ${_DEV_NAMESPACE} set image deployments -l ${_DEPLOYMENT_SELECTOR} ${_DEPLOYMENT_CONTAINER_NAME}=gcr.io/${PROJECT_ID}/${_DOCKER_IMAGE_NAME}:${SHORT_SHA} --record || echo "skipping dev deploy . . ."
  env:
  - 'CLOUDSDK_COMPUTE_ZONE=us-central1'
  - 'CLOUDSDK_CONTAINER_CLUSTER=heili-us-central1'
- name: 'gcr.io/cloud-builders/kubectl'
  id: deploy_prod
  entrypoint: 'bash'
  args:
  - '-c'
  - |
    [[ "${BRANCH_NAME}" == "${_PROD_BRANCH}" ]] && /builder/kubectl.bash --namespace ${_PROD_NAMESPACE} set image deployments -l ${_DEPLOYMENT_SELECTOR} ${_DEPLOYMENT_CONTAINER_NAME}=gcr.io/${PROJECT_ID}/${_DOCKER_IMAGE_NAME}:${BRANCH_NAME} --record || echo "skipping prod deploy . . ."
  env:
  - 'CLOUDSDK_COMPUTE_ZONE=us-central1'
  - 'CLOUDSDK_CONTAINER_CLUSTER=heili-us-central1'
substitutions:
    _PROD_NAMESPACE: default
    _DEV_NAMESPACE: dev
    _PROD_BRANCH: master
    _DOCKER_IMAGE_NAME: heili
    _DEPLOYMENT_CONTAINER_NAME: heili
    _DEPLOYMENT_SELECTOR: app=heili
secrets:
- kmsKeyName: projects/sonic-proxy-776/locations/global/keyRings/heili-uber-keyring/cryptoKeys/github-api-key
  secretEnv:
    GITHUB_API_TOKEN: CiQAG5swdGyqXIx/jDkpqRUercIQ3+GxshF+ydJLCzRLeZc5znsSUQAvApTtTJJaDKVjmTS4Ytz0224y+WxMinFKxTlfh5Cdpv29Rq2G7CDSGVeg9RX9DCkmpXAY7iCfMEgWusurp9KipamQrb+G8Wi2Be1FvGJqqA==
timeout: '900s'  # 15 minutes
options:
  substitution_option: 'ALLOW_LOOSE'
  machineType: 'N1_HIGHCPU_8'
images:
- 'gcr.io/${PROJECT_ID}/${_DOCKER_IMAGE_NAME}:${BRANCH_NAME}'
- 'gcr.io/${PROJECT_ID}/${_DOCKER_IMAGE_NAME}:${SHORT_SHA}'
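Inside a trigger, Cloud Build fills in BRANCH_NAME and SHORT_SHA, while the underscore-prefixed values come from the substitutions block. A hedged sketch for running the same file by hand, assuming it is saved as cloudbuild.yaml and that Cloud Build accepts the trigger-provided values on a manual `gcloud builds submit` (worth verifying for your setup). The `submit_build` helper, the `DRY_RUN` switch and the `qa` namespace in the usage line are hypothetical, and a real run still executes the deploy steps.

```shell
#!/usr/bin/env bash
# Sketch: run the build by hand, passing the values a trigger normally sets.
# Assumptions: the file is saved as cloudbuild.yaml, gcloud is authenticated,
# and Cloud Build accepts BRANCH_NAME/SHORT_SHA on a manual submit.
# NOTE: a real (non-dry) run still executes the deploy steps.

# submit_build BRANCH SHORT_SHA [DEV_NAMESPACE]
# With DRY_RUN=1 (the default) it only prints the command.
submit_build() {
  local subs="BRANCH_NAME=$1,SHORT_SHA=$2,_DEV_NAMESPACE=${3:-dev}"
  local cmd=(gcloud builds submit --config=cloudbuild.yaml "--substitutions=${subs}" .)
  if [[ "${DRY_RUN:-1}" == "1" ]]; then
    echo "${cmd[*]}"
  else
    "${cmd[@]}"
  fi
}
```

`submit_build feature-x abc1234` prints the command with the default dev namespace; `submit_build feature-x abc1234 qa` overrides _DEV_NAMESPACE for a single run without touching the file.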

Let’s go over the file from the end and see what we have there:

  • images - The Docker images Cloud Build pushes once all steps have succeeded; it also lists them as the build’s artifacts in the build results.

  • options - Custom options for the build:

    • substitution_option - Set to ALLOW_LOOSE, so the build doesn’t fail when a substitution is missing or unused.

    • machineType - The machine type to use for this build. The build is heavy, so we want a stronger machine (N1_HIGHCPU_8 has 8 vCPUs).

  • timeout - Our build takes longer than the default timeout (10 minutes), but we don’t want it to run forever either: a build that runs much longer than usual probably has a problem.

  • secrets - Lets us use encrypted data inside the build, like the GitHub API token in this example. The token is encrypted with Google Cloud KMS and decrypted during the build. Here you can read more about it.

  • substitutions - You could also call them arguments: variables with default values that can be overridden from outside this file (we will see this later). They are useful when you need the same value more than once.

  • steps - The build itself: the list of steps that make up our build:

    • - name: gcr.io/cloud-builders/curl
        secretEnv: ['GITHUB_API_TOKEN']
        entrypoint: bash
        args:
        - -c
        - curl -L -u "heili:$$GITHUB_API_TOKEN" https://github.com/heilihq/private-repo/archive/master.zip --output /shared_dir/private-repo.zip
        volumes:
        - name: 'shared_dir'
          path: '/shared_dir'

      First we download the private repository into our build environment. This step has no git command, so we download the latest archive of the master branch instead. Since it’s a private repo, we have to provide credentials. The archive is stored in “shared_dir”, a volume that is shared between the build steps that mount it.

    • - name: 'gcr.io/cloud-builders/docker'
        args:
        - 'pull'
        - 'gcr.io/${PROJECT_ID}/${_DOCKER_IMAGE_NAME}:${_PROD_BRANCH}'

      Pull the latest production image so that the next step can use it as a build cache. Since we don’t have tag-triggered builds anymore, images are organized by branch (as they always were): master is the latest production image, and each feature branch has its own image. The branch images are the ones referenced in our Kubernetes manifests, since the manifests can’t know the latest commit-ID tag.

    • - name: 'gcr.io/cloud-builders/docker'
        args:
        - 'build'
        - '-t'
        - 'gcr.io/${PROJECT_ID}/${_DOCKER_IMAGE_NAME}:${BRANCH_NAME}'
        - '-t'
        - 'gcr.io/${PROJECT_ID}/${_DOCKER_IMAGE_NAME}:${SHORT_SHA}'
        - --build-arg
        - HEILI_VERSION=${BRANCH_NAME}
        - '--cache-from'
        - 'gcr.io/${PROJECT_ID}/${_DOCKER_IMAGE_NAME}:${_PROD_BRANCH}'
        - '.'
        volumes:
        - name: 'shared_dir'
          path: '/workspace/shared_dir'

      A simple Docker build step that gives the new image two tags: the commit ID and the branch name. The branch name is more human-readable and is what we deploy for the production branch. The commit ID tells us exactly what we run, when, where and how. The Dockerfile uses the private repo downloaded in the first step. It is still stored in “shared_dir”, but this time we mount it inside the “/workspace” directory: Cloud Build runs each step with /workspace as its working directory, so that is the build context (“.”) of docker build. Here is how we use it inside the Dockerfile:

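      # Copy the archive downloaded in the first step, owned by the app user
      COPY --chown=heili:heili shared_dir/private-repo.zip /tmp/
      # Unpack it (GitHub names the top folder <repo>-<branch>; needs unzip in the image),
      # copy its dist/ folder into the app directory, then clean up
      RUN unzip /tmp/private-repo.zip -d /tmp && \
          mkdir -p /opt/heili/app/private-repo && \
          cp -a /tmp/private-repo-master/dist/ /opt/heili/app/private-repo/ && \
          rm -r /tmp/private-repo-master /tmp/private-repo.zip

    • - name: 'gcr.io/cloud-builders/kubectl'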
        id: deploy_dev
        entrypoint: 'bash'
        args:
        - '-c'
        - |
          [[ "${BRANCH_NAME}" != "master" ]] && /builder/kubectl.bash --namespace ${_DEV_NAMESPACE} set image deployments -l ${_DEPLOYMENT_SELECTOR} ${_DEPLOYMENT_CONTAINER_NAME}=gcr.io/${PROJECT_ID}/${_DOCKER_IMAGE_NAME}:${SHORT_SHA} --record || echo "skipping dev deploy . . ."
        env:
        - 'CLOUDSDK_COMPUTE_ZONE=us-central1'
        - 'CLOUDSDK_CONTAINER_CLUSTER=heili-us-central1'
      - name: 'gcr.io/cloud-builders/kubectl'
        id: deploy_prod
        entrypoint: 'bash'
        args:
        - '-c'
        - |
          [[ "${BRANCH_NAME}" == "${_PROD_BRANCH}" ]] && /builder/kubectl.bash --namespace ${_PROD_NAMESPACE} set image deployments -l ${_DEPLOYMENT_SELECTOR} ${_DEPLOYMENT_CONTAINER_NAME}=gcr.io/${PROJECT_ID}/${_DOCKER_IMAGE_NAME}:${BRANCH_NAME} --record || echo "skipping prod deploy . . ."
        env:
        - 'CLOUDSDK_COMPUTE_ZONE=us-central1'
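        - 'CLOUDSDK_CONTAINER_CLUSTER=heili-us-central1'

      The last two steps deploy the new image with kubectl. Cloud Build runs both steps on every build, so each one checks the branch first. For any branch other than master, deploy_dev points the deployments labeled app=heili in the dev namespace at the commit-ID image; for the production branch, deploy_prod points the matching deployments in the default namespace at the branch-tagged image. The CLOUDSDK_* variables tell the kubectl builder which cluster to fetch credentials for, and --record saves the command in the deployment’s rollout history.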
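One caveat with the `[[ condition ]] && kubectl ... || echo ...` form: if kubectl itself fails, the `||` branch runs too, the echo succeeds, and the step exits 0, so a failed deploy looks like a skipped one. A sketch of the same two gates with if/else, where the step’s exit status is kubectl’s own. `KUBECTL` and `PROD_BRANCH` are hypothetical variables added so the logic can be tried locally; both gates compare against the production branch (matching _PROD_BRANCH) rather than a hardcoded master.

```shell
#!/usr/bin/env bash
# Branch gates for the two deploy steps, written with if/else so a failing
# kubectl call fails the step instead of falling through to "skipping".
# Assumptions: KUBECTL defaults to the kubectl builder's wrapper script and can
# be overridden for local testing; namespaces, selector and container name
# mirror the substitutions in the build file.

KUBECTL="${KUBECTL:-/builder/kubectl.bash}"
PROD_BRANCH="${PROD_BRANCH:-master}"

# deploy_dev BRANCH IMAGE: every branch except production goes to "dev".
deploy_dev() {
  if [[ "$1" != "$PROD_BRANCH" ]]; then
    "$KUBECTL" --namespace dev set image deployments -l app=heili "heili=$2" --record
  else
    echo "skipping dev deploy . . ."
  fi
}

# deploy_prod BRANCH IMAGE: only the production branch goes to "default".
deploy_prod() {
  if [[ "$1" == "$PROD_BRANCH" ]]; then
    "$KUBECTL" --namespace default set image deployments -l app=heili "heili=$2" --record
  else
    echo "skipping prod deploy . . ."
  fi
}
```

In the build file, the if/else block (with the Cloud Build values in place of $1 and $2) would replace the one-liner under `- |`. Separately, the prod step may be worth checking: if a deployment already points at the same ...:master image string, `kubectl set image` leaves the pod template unchanged and Kubernetes starts no new rollout, whereas the SHORT_SHA tag changes on every commit.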
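About the `secrets` block: the value under secretEnv is the GitHub token encrypted with the KMS key from kmsKeyName and then base64-encoded. A minimal sketch of producing it, assuming an authenticated gcloud pointed at the project that owns the key, permission to encrypt with it, and GNU base64 (`-w 0` keeps the output on one line). The `gcloud kms encrypt` flags are real; the `encrypt_github_token` helper and the KMS_* variables are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: create the ciphertext that goes under secretEnv in the "secrets" block.
# Assumptions: gcloud is authenticated against the project that owns the key,
# the caller may encrypt with it, and base64 is GNU coreutils (for -w 0).
# Key location, ring and name are taken from the kmsKeyName in the build file.

KMS_LOCATION="global"
KMS_KEYRING="heili-uber-keyring"
KMS_KEY="github-api-key"

# Reads the plaintext token on stdin and prints one line of base64 ciphertext.
# The body runs in a subshell so "pipefail" does not leak into the caller.
encrypt_github_token() (
  set -o pipefail
  gcloud kms encrypt \
    --plaintext-file=- \
    --ciphertext-file=- \
    --location="$KMS_LOCATION" \
    --keyring="$KMS_KEYRING" \
    --key="$KMS_KEY" \
    | base64 -w 0
)
```

Run it as `printf '%s' "$GITHUB_TOKEN" | encrypt_github_token` and paste the output under GITHUB_API_TOKEN. Feeding the token on stdin keeps it off the command line, and `printf '%s'` avoids encrypting a trailing newline. For the build to decrypt it, the Cloud Build service account needs the Cloud KMS CryptoKey Decrypter role on that key.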

Cloud Build helps us a lot with fast delivery of new features and bug fixes to our production environment. This can be achieved with any CI/CD tool if it’s used correctly; usually the hardest part is picking the right tool for you, as each has advantages and disadvantages. Some of Cloud Build’s current drawbacks we work around with scripting, and others we are waiting to see fixed, as they are not crucial for us at this point. The feature we miss most in Cloud Build is filtering by branch (or something similar), and we are not alone: there are several issues requesting it, here, here and here.

David Golovan