One major win is that we have always maintained our own deployment config format in a centralized repo. That has made it really lightweight to build tooling that deploys to either #Mesos or #Kubernetes during the migration.
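To illustrate why a platform-neutral config keeps the tooling small, here is a minimal sketch in Go. Everything in it is an assumption made for illustration: the `AppConfig` fields, the `Target` interface, and the rendered strings are hypothetical placeholders, not our actual config format or real Singularity or Kubernetes API payloads.

```go
package main

import "fmt"

// AppConfig is a hypothetical platform-neutral deployment config.
// The field names are illustrative, not the real format.
type AppConfig struct {
	Name      string
	Image     string
	Instances int
}

// Target turns the neutral config into a platform-specific deploy request.
type Target interface {
	Render(cfg AppConfig) string
}

// MesosTarget and KubernetesTarget are placeholder implementations.
type MesosTarget struct{}
type KubernetesTarget struct{}

// Render returns a placeholder string standing in for a Singularity request.
func (MesosTarget) Render(cfg AppConfig) string {
	return fmt.Sprintf("mesos: id=%s image=%s instances=%d",
		cfg.Name, cfg.Image, cfg.Instances)
}

// Render returns a placeholder string standing in for a Deployment manifest.
func (KubernetesTarget) Render(cfg AppConfig) string {
	return fmt.Sprintf("kubernetes: name=%s image=%s replicas=%d",
		cfg.Name, cfg.Image, cfg.Instances)
}
```

The point of the sketch is that the same `AppConfig` value can be handed to either target, so moving an app between platforms means picking a different `Target` and leaving the config alone.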

We also have service discovery that spans both clusters, with instances of the same app available from either cluster. We can roll back and forth between Mesos and Kubernetes without any changes to configs. That really lowers the risk of the migration.

There are about 125 services to move. Most of them will be a simple redeploy. 

We’ve been moving all of our infrastructure from the rock solid #Mesos (with Singularity) platform others and I built up over years, to #Kubernetes. It’s going well and all our stuff works.

But you sort of forget all of the operational improvements and optimizations you have made until you move to a new platform. There are lots of little things to resolve: rough edges and bad ergonomics that we will need to address. Shout out to HubSpot’s Singularity scheduler (RIP) on Mesos for being so great to operate. It set a high bar.

We wanted a service that would optionally tail logs from #Kubernetes for the apps we deploy and report them over UDP syslog, in the existing JSON log format that we already use on #Mesos.

  • It should make the log scrape/relay decision based on Annotations on the Pods.
  • It should rate limit by *pod* and not by host/node so that we don't overrun our log provider (e.g. when someone forgets to turn off debug logging) or starve other apps on the same node from being able to send their logs.
  • It should report rate limiting to our metrics system so we can track which pods are getting limited.

There was nothing that we could find that was able to do all of that. So I spent the last two days writing it in #Golang and we're doing initial deployment of that as a DaemonSet. Seems to work nicely 🎉

I've been adding a basic #Kubernetes API discovery mode to my long-lived Sidecar service discovery system. This will enable us to span the Sidecar cluster across Kubernetes and #Mesos, and allow us to migrate services one at a time.