How to deploy a container image from Gitlab to AWS Fargate – the important bits

Jumping straight into it: I assume you already have your dockerized application on GitLab and a basic .gitlab-ci.yml file.

We’re gonna want to build that image and push it to AWS ECR (Amazon Elastic Container Registry). In your GitLab CI file, insert the following:

aws-deploy:
  image: docker:latest
  stage: build
  services:
    - docker:dind
  script:
    # install the AWS CLI, then remove pip to keep the layer slim
    - apk update && apk -Uuv add python py-pip &&
        pip install awscli && apk --purge -v del py-pip &&
        rm /var/cache/apk/*
    # authenticate Docker against your ECR registry
    - $(aws ecr get-login --no-include-email --region us-east-1)
    # build and push the image, tagged "dev"
    - docker build --pull -t "$AWS_REGISTRY_IMAGE:dev" .
    - docker push "$AWS_REGISTRY_IMAGE:dev"
  only:
    - master

You’re gonna need to set a CI/CD variable called AWS_REGISTRY_IMAGE with the URI of your ECR repository (Settings > CI/CD > Variables in your GitLab project). You’ll also want AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY set there so the AWS CLI can authenticate.
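For example (the account ID and repository name here are placeholders), the value looks like:

AWS_REGISTRY_IMAGE=123456789012.dkr.ecr.us-east-1.amazonaws.com/my-app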

Great! We’re halfway there. Do a sample push and verify that your image ends up in ECR.
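You can also verify from the CLI (the repository name is a placeholder):

aws ecr list-images --repository-name my-app --region us-east-1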

Now we want to deploy to Fargate on every push. Assuming once again that you already have your ECS cluster set up, we’ll use a CodePipeline to detect pushes to ECR and deploy them to ECS.

Under AWS CodePipeline, start a new pipeline. For the Source stage, choose Amazon ECR and select the appropriate repository and image tag. Next.

OK, here’s the tricky part: to deploy to Fargate we need to produce an imagedefinitions.json artifact alongside our image. This can be generated automatically in the build step.
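For reference, this artifact is just a small JSON file; the name field must match the container name in your ECS task definition:

[
  {
    "name": "my-fargate-container-name",
    "imageUri": "MYAWSID.dkr.ecr.us-east-1.amazonaws.com/MYIMAGENAME:dev"
  }
]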

Build Provider > AWS Codebuild

Create a new project

Environment Image > Managed Image
Operating System > Ubuntu
Runtime > Standard

Down in the Buildspec section, select ‘Insert build commands’.

Then click ‘Switch to editor’ and enter the following:

version: 0.2

phases:
  install:
    runtime-versions:
      python: 3.8
  post_build:
    commands:
      # write out the imagedefinitions.json that the ECS deploy action consumes
      - printf '[{"name":"my-fargate-container-name","imageUri":"%s"}]' MYAWSID.dkr.ecr.us-east-1.amazonaws.com/MYIMAGENAME:dev > imagedefinitions.json
artifacts:
  files:
    - imagedefinitions.json

Finally, save and continue pipeline creation. In the Deploy stage, select Amazon ECS as the deploy provider, pick the correct Fargate cluster/service, and DONE.

If you did all of that right, the next time you push to your master branch it’ll automatically get built and deployed to Fargate!


Rancher 2.0 etcd disaster recovery

This doc shows how to restore to a single-node etcd cluster after a 3-, 5- or 7-node cluster has lost quorum.

Ideally with these sorts of failures you want to try your best to get the original etcd hosts back up.

This is also done at your own risk: I have no association with Rancher, nor am I a Rancher professional. It is also highly recommended to test this in a staging environment first. I will NOT be responsible for the loss of all your or your company’s data, which is exactly what will happen if this procedure fails.

With that out of the way; please read on.

This doc assumes you have:
1. rancher_cli installed on your local machine
2. a working internet connection on the surviving etcd host

1. Log in to the surviving host

rancher context switch
rancher ssh <surviving_etcd>

At this point you may want to do a docker inspect etcd to ensure that the following two directories are bind-mounted:

...
        "Mounts": [
            {
                "Type": "bind",
                "Source": "/var/lib/etcd",
                "Destination": "/var/lib/rancher/etcd",
                "Mode": "z",
                "RW": true,
                "Propagation": "rprivate"
            },
            {
                "Type": "bind",
                "Source": "/etc/kubernetes",
                "Destination": "/etc/kubernetes",
                "Mode": "z",
                "RW": true,
                "Propagation": "rprivate"
            }
        ],

If you do not see the above, stop.

2. Check the health of the cluster

docker exec -it etcd etcdctl member list
docker exec -it etcd etcdctl endpoint health

You should see that the cluster is unhealthy.

3. Take a snapshot of the cluster

This ensures that if this operation fails for any reason, you have not lost all your data. We will store our snapshot in the /etc/kubernetes dir, which is bind-mounted onto the same path on the host.

mkdir -p /etc/kubernetes/etcd-snapshots/etcd-$(date +%Y%m%d)
docker exec -it etcd etcdctl snapshot save /etc/kubernetes/etcd-snapshots/etcd-$(date +%Y%m%d)/snapshot.db
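Optionally, sanity-check the snapshot file; etcdctl snapshot status prints its hash, revision, and key count:

docker exec -it etcd etcdctl snapshot status /etc/kubernetes/etcd-snapshots/etcd-$(date +%Y%m%d)/snapshot.db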

4. Get the deploy command

Lavie (https://github.com/lavie/runlike) has a great tool which approximates the docker run command used to bring up a container. We will use it to recover our etcd configuration. Run the following:

docker run --rm -v /var/run/docker.sock:/var/run/docker.sock assaflavie/runlike etcd

The output should be a pretty long docker run string. Save it in a safe place for later.

5. Destroy/Rename the old etcd container

docker stop etcd
docker rename etcd etcd_old

6. Start the new etcd container

  1. Edit the --initial-cluster area of the command from step 4, leaving only the surviving container.
  2. Append --force-new-cluster at the end of the command

Use this new string to deploy a new container.
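As a purely hypothetical illustration (the member names and IPs below are made up; your real values come from the runlike output), the edit looks like this:

# Before (from the runlike output): all three members listed
#   --initial-cluster etcd-a=https://10.0.0.1:2380,etcd-b=https://10.0.0.2:2380,etcd-c=https://10.0.0.3:2380
# After: only the surviving member, with --force-new-cluster appended
#   --initial-cluster etcd-a=https://10.0.0.1:2380 --force-new-cluster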

7. Delete old nodes

In the Rancher UI, you should now be able to access your cluster again. Delete the node pools of the nodes that died. (This will take a while, as Rancher will redeploy etcd.)

You are now free to continue using your cluster, or to create new nodes to expand your etcd cluster.

END


Extra

In case everything went to hell, we can use the snapshot taken in step 3:

docker exec -it etcd etcdctl snapshot restore /etc/kubernetes/etcd-snapshots/etcd-$(date +%Y%m%d)/snapshot.db --data-dir=/var/lib/rancher/etcd/snapshot

docker stop etcd
# archive the broken data dir, then promote the restored one in its place
mv /var/lib/etcd/member /var/lib/etcd/member_old
mv /var/lib/etcd/snapshot/member /var/lib/etcd/member
rmdir /var/lib/etcd/snapshot
docker start etcd

The above restores the snapshot to /var/lib/rancher/etcd/snapshot inside the container, which is /var/lib/etcd/snapshot on the host thanks to the bind mount.
We then stop etcd, archive the broken etcd data (member_old), and replace it with the restored data.
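Assuming the restore went cleanly, the health checks from step 2 should now come back healthy:

docker exec -it etcd etcdctl member list
docker exec -it etcd etcdctl endpoint health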