Category: DevOps

How to get Kubernetes CPU allocation for all pods

If you use Kubernetes to deploy workloads, it’s likely you’ll hit a CPU allocation wall pretty early. To help debug and optimize your limits, the following command lists the CPU requests for every container in each pod.

Credit to abelal83 for the solution.

kubectl get po --all-namespaces -o=jsonpath="{range .items[*]}{.metadata.namespace}:{.metadata.name}{'\n'}{range .spec.containers[*]}  {.name}:{.resources.requests.cpu}{'\n'}{end}{'\n'}{end}"

In my usage, however, I’ve found the need to see not only the CPU requests but also the memory requests and the CPU/memory limits. To do so, I’ve tweaked the above into the following:

kubectl get po --all-namespaces -o=jsonpath="{range .items[*]}{.metadata.namespace}:{.metadata.name}{'\n'}{range .spec.containers[*]}  {.name}:{'limits.cpu'}:{.resources.limits.cpu}{'\n'}  {.name}:{'limits.memory'}:{.resources.limits.memory}{'\n'}  {.name}:{'requests.cpu'}:{.resources.requests.cpu}{'\n'}  {.name}:{'requests.memory'}:{.resources.requests.memory}{'\n'}{end}{'\n'}{end}"

The output of the above looks like:

(more…)
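Once you have the per-container values, you may want a total. The following is a minimal sketch, not part of the original solution: it assumes the CPU quantities have already been extracted one per line, and that they use either the millicore form (250m) or plain cores (1, 0.5). The function name sum_millicores is made up for this example.

```shell
# Sum Kubernetes CPU quantities (one per line) in millicores.
# "250m" means 250 millicores; a bare number such as "1" or "0.5" means whole cores.
sum_millicores() {
  awk '
    /m$/ { sub(/m$/, ""); total += $0; next }   # already in millicores
    NF   { total += $0 * 1000 }                  # cores to millicores
    END  { print total + 0 }
  '
}

# Sample values, as they might appear after the "requests.cpu:" label.
printf '250m\n1\n0.5\n100m\n' | sum_millicores
```

Comparing that total against the allocatable CPU of your nodes gives a rough idea of how close you are to the wall.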

Monitor your Kubernetes pod cpu and memory usage

watch -n 10 "kubectl top pod --all-namespaces | sort -r -k 3 -n"

Breakdown

watch -n 10 – Run the command every 10 seconds
kubectl top pod --all-namespaces – Get the memory and CPU usage of all pods across all namespaces
sort -r -k 3 -n – Sort in Reverse order on Key 3 (the CPU column) and treat it as a Number
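To see what the sort does without a cluster at hand, here is a small sketch that runs the same sort flags over made-up rows. The pod names and numbers are hypothetical; only the column layout (namespace, name, CPU, memory) mirrors kubectl top output.

```shell
# Hypothetical `kubectl top pod --all-namespaces` rows (header removed).
sample_top() {
  printf '%s\n' \
    'default      web-1   12m    64Mi' \
    'kube-system  dns-1   250m   30Mi' \
    'default      api-1   90m    128Mi'
}

# Same flags as above: reverse (-r), key starts at column 3 (-k 3), numeric (-n).
sample_top | sort -r -k 3 -n
```

The numeric sort reads the leading digits of the CPU column and ignores the trailing m, so the busiest pod ends up on top.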

How to deploy a container image from Gitlab to AWS Fargate – the important bits

Jumping straight into it. On GitLab, I assume you already have your dockerized application and a basic .gitlab-ci.yml file.

We’re gonna want to build that image and push it to AWS ECR (Amazon Elastic Container Registry). In your GitLab CI file, insert the following.

aws-deploy:
  image: docker:latest
  stage: build
  services:
    - docker:dind
  script:
    # Install the AWS CLI, then log Docker in to the ECR registry host
    # (the part of AWS_REGISTRY_IMAGE before the first "/")
    - apk add --no-cache aws-cli
    - aws ecr get-login-password --region us-east-1 | docker login --username AWS --password-stdin "${AWS_REGISTRY_IMAGE%%/*}"
    - docker build --pull -t "$AWS_REGISTRY_IMAGE:dev" .
    - docker push "$AWS_REGISTRY_IMAGE:dev"
  only:
    - master

You’re gonna need to set a CI/CD variable called AWS_REGISTRY_IMAGE with the URI of your ECR repository. The AWS CLI also needs credentials, for example AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY set as CI/CD variables.
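An ECR repository URI has the form registry-host/repository-name, and some commands (docker login, for one) only want the registry host. As a small sketch, shell parameter expansion can derive it from the variable. The account ID and repository name below are hypothetical placeholders.

```shell
# Hypothetical repository URI; substitute your own account ID, region and name.
AWS_REGISTRY_IMAGE="123456789012.dkr.ecr.us-east-1.amazonaws.com/my-app"

# Strip everything from the first "/" onward to get the registry host.
registry_host="${AWS_REGISTRY_IMAGE%%/*}"
echo "$registry_host"
```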

Great! We’re halfway there. Do a sample push and verify that your image ends up in ECR.

Now we want to deploy to Fargate on every push. Assuming once again that you already have your ECS cluster set up, we’ll use a CodePipeline to detect pushes and deploy them to ECS.

Under AWS CodePipeline, start a new pipeline. For the Source, choose Amazon ECR and select the appropriate repository and image tag, then click Next.

OK, here’s the tricky part: to deploy to Fargate, the deploy stage needs an imagedefinitions.json artifact. This can be generated automatically in the build step.

Build Provider > AWS Codebuild

Create a new project

Environment Image > Managed Image
Operating System > Ubuntu
Runtime > Standard

Down in the Buildspec section, select ‘Insert Build Commands’.

Then click ‘Switch to editor’ and enter the following:

version: 0.2

phases:
  install:
    runtime-versions:
      python: 3.8
  post_build:
    commands:
      - printf '[{"name":"my-fargate-container-name","imageUri":"%s"}]' MYAWSID.dkr.ecr.us-east-1.amazonaws.com/MYIMAGENAME:dev > imagedefinitions.json
artifacts:
  files:
    - imagedefinitions.json
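If you want to check the file locally before wiring it into CodeBuild, the sketch below runs the same printf with both fields parameterized. The container name and image URI are hypothetical; the name should match the container name in your ECS task definition.

```shell
# Hypothetical values; use your own container name and image URI.
container_name="my-fargate-container-name"
image_uri="123456789012.dkr.ecr.us-east-1.amazonaws.com/my-image:dev"

# Same printf as in the buildspec, with both fields parameterized.
printf '[{"name":"%s","imageUri":"%s"}]' "$container_name" "$image_uri" > imagedefinitions.json
cat imagedefinitions.json
```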

Finally, save and continue pipeline creation. In the Deploy stage, select the correct Fargate cluster and service, and you’re DONE.

If you did all of that right, the next time you push to your master branch it’ll automatically get built and deployed to Fargate!

Rancher 2.0 etcd disaster recovery

This doc shows how to restore to a single-node etcd cluster after a 3, 5 or 7 node cluster has lost quorum.

Ideally, with these sorts of failures you want to try your best to get the original etcd hosts back up first.

You follow this procedure at your own risk. I have no association with Rancher, nor am I a Rancher professional. I highly recommend testing it in a staging environment first. I will NOT be responsible for the loss of all your or your company’s data, which is exactly what will happen if this procedure fails.

With that out of the way, please read on.

This doc assumes you have:
1. the Rancher CLI installed on your local machine
2. a working internet connection on the surviving etcd host

1. Log in to the surviving host

rancher context switch
rancher ssh <surviving_etcd>

At this point you may want to run docker inspect etcd to ensure the following two directories are bind-mounted:

...
        "Mounts": [
            {
                "Type": "bind",
                "Source": "/var/lib/etcd",
                "Destination": "/var/lib/rancher/etcd",
                "Mode": "z",
                "RW": true,
                "Propagation": "rprivate"
            },
            {
                "Type": "bind",
                "Source": "/etc/kubernetes",
                "Destination": "/etc/kubernetes",
                "Mode": "z",
                "RW": true,
                "Propagation": "rprivate"
            }
        ],

If you do not see the above, stop.

2. Check the health of the cluster

docker exec -it etcd etcdctl member list
docker exec -it etcd etcdctl endpoint health

You should see that the cluster is unhealthy.

3. Take a snapshot of the cluster

This ensures that if this operation fails for any reason, you have not lost all your data. We will store our snapshot in the /etc/kubernetes dir, which is bind-mounted onto the same path on the host.

mkdir -p /etc/kubernetes/etcd-snapshots/etcd-$(date +%Y%m%d)
docker exec -it etcd etcdctl snapshot save /etc/kubernetes/etcd-snapshots/etcd-$(date +%Y%m%d)/snapshot.db
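Before moving on to the destructive steps, it is worth confirming the snapshot file actually exists and is not empty. Here is a minimal sketch of such a check; snapshot_ok is a made-up helper name, and the demonstration uses a throwaway temp file rather than a real snapshot.db.

```shell
# Return success only if the given file exists and is non-empty.
snapshot_ok() {
  [ -s "$1" ]
}

# Demonstration with a throwaway file standing in for snapshot.db.
demo_file="$(mktemp)"
printf 'not a real snapshot' > "$demo_file"
if snapshot_ok "$demo_file"; then
  echo "snapshot present"
else
  echo "snapshot missing or empty"
fi
rm -f "$demo_file"
```

On the real host you would pass the snapshot.db path from above. Note this only checks the size, not whether etcd can read the file.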

4. Get deploy command

Lavie (https://github.com/lavie/runlike) has a great tool which approximates the docker run command used to start a Docker container. We will use it to get our etcd configuration. Run the following:

docker run --rm -v /var/run/docker.sock:/var/run/docker.sock assaflavie/runlike etcd

The output should be a pretty long docker run type string. Save it in a safe place for later.

5. Destroy/Rename the old etcd container

docker stop etcd
docker rename etcd etcd_old

6. Start the new etcd container

  1. Edit the --initial-cluster area of the command from step 4, leaving only the surviving container.
  2. Append --force-new-cluster at the end of the command

Use this new string to deploy a new container.
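The two edits can also be done with sed instead of by hand. The sketch below is only an illustration: the run_cmd string is a hypothetical, heavily shortened stand-in for the real output of step 4, and the member name etcd-a is assumed. Review the resulting command carefully before running it.

```shell
# Hypothetical, shortened stand-in for the command saved in step 4.
run_cmd='docker run --name=etcd example/etcd-image /usr/local/bin/etcd --name=etcd-a --initial-cluster=etcd-a=https://10.0.0.1:2380,etcd-b=https://10.0.0.2:2380,etcd-c=https://10.0.0.3:2380 --initial-cluster-state=new'

# Member name of the surviving node (assumed here to be etcd-a).
survivor='etcd-a'

# Keep only the survivor's entry in --initial-cluster, then append the flag.
new_cmd=$(printf '%s\n' "$run_cmd" |
  sed -E "s#--initial-cluster=[^ ]*(${survivor}=[^, ]*)[^ ]*#--initial-cluster=\1#")
new_cmd="$new_cmd --force-new-cluster"

echo "$new_cmd"
```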

7. Delete old nodes

You should now be able to access your cluster again. In the Rancher UI, delete the pools of the nodes that died. (This will take a while, as Rancher will redeploy etcd.)

You are now free to continue using your cluster or create new nodes to expand your etcd cluster.

END


Extra

In case everything went to hell, we can use the snapshot taken in step 3…

docker exec -it etcd etcdctl snapshot --data-dir=/var/lib/rancher/etcd/snapshot restore /etc/kubernetes/etcd-snapshots/etcd-$(date +%Y%m%d)/snapshot.db

docker stop etcd
mv /var/lib/etcd/member /var/lib/etcd/member_old
mv /var/lib/etcd/snapshot/member /var/lib/etcd/member
rmdir /var/lib/etcd/snapshot
docker start etcd

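The first command restores the snapshot to /var/lib/rancher/etcd/snapshot inside the container, which is /var/lib/etcd/snapshot on the host. We then stop etcd, archive the messed-up etcd data (member_old) and replace it with the restored data. Note that the snapshot path uses today’s date, so adjust it if the snapshot was taken on a different day.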

How to manually generate SSL certificates for Flynn applications

For the last few years the Flynn team has been working on getting us Let’s Encrypt integration. While I feel the functionality should be here soon, in the meantime we just have to make the requests ourselves.

Step 1. Using letsencrypt, perform a manual request

I’m currently using Ubuntu 18.04, so installing it is just a matter of

sudo apt install certbot

I’m sure you can figure out how to get it installed if you’re running any other distro.

Now to make the manual request we do

sudo certbot certonly --manual --preferred-challenges dns

This will perform a DNS challenge, where we set the content of a TXT record in our zone file. In my opinion it is the easiest, but you also have the option of http (tls-sni, the other option at the time of writing, has since been retired by Let’s Encrypt).

Step 2. Add to Flynn

A. If the route does not already exist in Flynn

sudo flynn -a my-app-name route add http \
  -c /etc/letsencrypt/live/my.domain.com/fullchain.pem \
  -k /etc/letsencrypt/live/my.domain.com/privkey.pem my.domain.com

This will add a new route and apply our certificate and key. We are done.

B. If the route already exists in Flynn

We get the appropriate route id with

flynn -a my-app-name route

And we update with

sudo flynn -a my-app-name route update \
  http/my-very-long-route-id-593375844 -s http \
  -c /etc/letsencrypt/live/my.domain.com/fullchain.pem \
  -k /etc/letsencrypt/live/my.domain.com/privkey.pem

Don’t forget to change:
1. Your app name (you can find it with flynn apps)
2. The route ID
3. The path for the cert
4. The path for the key
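One way to avoid missing a substitution is to put the values in variables and print the command for review before running it. This is just a sketch: the variable names are made up for the example, and the values are the same placeholders used above.

```shell
# Hypothetical values; replace with your own app name and domain.
APP_NAME="my-app-name"
DOMAIN="my.domain.com"
CERT_DIR="/etc/letsencrypt/live/$DOMAIN"

# Print the command instead of running it, so it can be reviewed first.
route_add_cmd="flynn -a $APP_NAME route add http -c $CERT_DIR/fullchain.pem -k $CERT_DIR/privkey.pem $DOMAIN"
echo "$route_add_cmd"
```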

Done, you should now have https on your Flynn site.

Let me know if you have any questions

How to add Flynn to your GitLab CI/CD workflow

Goal: Make GitLab deploy to Flynn

I will assume you already have your .yml set up to build your project, as I will only cover the deploy section.

You also need to have an app created on your Flynn server and any resources already created. If you need help doing this, there is great documentation on the official website https://flynn.io/docs/basics to get you initially set up.

Requirements

You need either the flynn cluster add command that you were given on first install

flynn cluster add -p <tls pin> <cluster name> <controller domain> <controller key>

or the backup located in ~/.flynnrc

[[cluster]]
  Name = "default"
  Key = "347skdfh2389hskdfds"
  ControllerURL = "https://controller.my.flynn.site.com"
  TLSPin = "SLDFKSDF3E0Y3924Y23HJKLHDSFOE="
  DockerPushURL = ""

Step 1. Configure Environment Variables

It is highly recommended that you create environment variables in GitLab for the above variables (Settings > CI/CD > Variables). While you could hard-code them… please don’t.

In this tutorial I will be using the following env mapping.

FLYNN_TLS_PIN=SLDFKSDF3E0Y3924Y23HJKLHDSFOE=
FLYNN_CLUSTER_NAME=default
FLYNN_CONTROLLER_DOMAIN=my.flynn.site.com
FLYNN_CONTROLLER_KEY=347skdfh2389hskdfds

Note that FLYNN_CONTROLLER_DOMAIN has the https://controller. part removed compared to the ControllerURL in the ~/.flynnrc file.
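If you are starting from the ~/.flynnrc backup, the domain can be derived from ControllerURL with shell parameter expansion. A small sketch, using the sample value from above and assuming the URL starts with https://controller. as it does there:

```shell
# ControllerURL as it appears in ~/.flynnrc (sample value from above).
controller_url="https://controller.my.flynn.site.com"

# Remove the leading "https://controller." prefix.
FLYNN_CONTROLLER_DOMAIN="${controller_url#https://controller.}"
echo "$FLYNN_CONTROLLER_DOMAIN"
```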

Step 2. Update gitlab-ci

In your .gitlab-ci.yml file, create a deploy job with the following

...
staging:
  stage: deploy
  script:
  - L=/usr/local/bin/flynn && curl -sSL -A "`uname -sp`" https://dl.flynn.io/cli | zcat >$L && chmod +x $L
  - flynn cluster add -p $FLYNN_TLS_PIN $FLYNN_CLUSTER_NAME $FLYNN_CONTROLLER_DOMAIN $FLYNN_CONTROLLER_KEY
  - flynn -a app-name-staging remote add
  # GitLab CI checks out a detached HEAD, so push HEAD to Flynn's master branch
  - git push flynn HEAD:master
  only:
  - master
...

Replace app-name-staging with the name of your app. You can find it with flynn apps.

Step 3. Commit and Push

At this point we are essentially done. All commits henceforth will be pushed to Flynn.

Issues? Let me know in the comments below.