Karl Matthias
Short-form technical ramblings. I'm VP of Architecture at Community.com and co-author of "Docker: Up and Running" from O'Reilly Media.

Whatever anyone says about the performance impact, running Go's pprof in your production app is a huge source of wins. Over time I've made many CPU and memory performance fixes based on real profiling data from production, and those gains have far outweighed the overhead of continuous profiling.

one thing ktistec related that i haven't had the time for is working on build and deployment tools. there are a bunch of outstanding requests—and a few PRs—for docker builds, packaged deployments for various hosting environments, etc.

if you're interested in contributing, let me know. you only have to agree to maintain them—i won't be able to.


We're finally moving off of AmazonMQ with its high cost and terrible performance, onto self-hosted #RabbitMQ on Kubernetes.

1. MUCH faster, with support for quorum queues
2. Less than 1/5 the cost
3. Can run latest RabbitMQ
4. Better I/O and network performance
5. Can tune it however we need to for our use case

Goodbye expensive, slow, terrible AmazonMQ! Major props to team members Dan Pilch and João Britto for making it happen!

I'm bookmarking this "Container permission denied: How to diagnose this error" article, since I'm sure I'll run into this again: https://www.redhat.com/sysadmin/container-permission-denied-errors. I really wish Linux reliably reported somewhere when an EPERM happens (see the author's linked FriendlyEPERM feature proposal from a decade ago).

And if you wonder why I care about this, see: https://blog.gregor.com/designing-for-failure-88be805de1ac

There is obviously some kind of security contest going on right now to open PRs that fix security issues, because I'm getting very unhelpful, clearly automated PRs opened on some of my projects.

Well, JFrog apparently shut down their free tier. It was pretty crummy, treated by them as a demo rather than a real free tier. But I had some images hosted there and have now moved them. The lack of notice, and the way they responded to my earlier feedback about the service, mean that I won't be using them for anything any time soon.

Watching the spread of Rust is interesting. I see a pretty strong case for it replacing C in the places where C has historically been the best fit, but from personal experience I don't find Rust great for a lot of the higher-level applications where people seem to be using it. I vastly prefer a GC and a runtime for higher-level stuff: simpler to write and run, and less jousting with the compiler.

:wq to Bram Moolenaar, the Dutch creator of Vim: https://groups.google.com/g/vim_announce/c/tWahca9zkt4

Thank you for all your #opensource contributions!

Some good discussion followed my last post. Several people said that, in their experience, they have not seen the issues I described. My experience tells me that it's a real issue.

It turns out that Google also published a paper about it when they added the CFS implementation that supports CPU throttling.
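To make the throttling mechanism concrete, here is a rough sketch of the arithmetic, assuming the kernel's default 100ms CFS period (which Kubernetes also uses unless configured otherwise). A CPU limit becomes a quota of CPU time per period, shared by all of the container's threads. The 500m limit and 4 busy threads below are hypothetical numbers chosen for illustration.

```go
package main

import "fmt"

// defaultCFSPeriodUs is the default 100ms CFS enforcement period, in µs.
const defaultCFSPeriodUs int64 = 100_000

// quotaUs converts a CPU limit in millicores into a CFS quota: the CPU time
// the cgroup may consume in each period, summed across all its threads.
func quotaUs(limitMilli, periodUs int64) int64 {
	return limitMilli * periodUs / 1000
}

// cpuMax renders the quota in the cgroup v2 cpu.max format ("quota period").
func cpuMax(limitMilli, periodUs int64) string {
	return fmt.Sprintf("%d %d", quotaUs(limitMilli, periodUs), periodUs)
}

// throttledUs estimates how long a cgroup is throttled per period when
// `threads` threads are all runnable for the whole period: together they
// exhaust the quota after quota/threads µs of wall time, then wait out the
// rest of the period.
func throttledUs(limitMilli, periodUs, threads int64) int64 {
	if threads <= 0 {
		return 0
	}
	runUs := quotaUs(limitMilli, periodUs) / threads
	if runUs >= periodUs {
		return 0
	}
	return periodUs - runUs
}

func main() {
	// Hypothetical example: a 500m limit and 4 busy threads.
	fmt.Println("cpu.max:", cpuMax(500, defaultCFSPeriodUs))
	fmt.Println("throttled µs per period:", throttledUs(500, defaultCFSPeriodUs, 4))
}
```

With those numbers the 50ms quota is used up after 12.5ms of wall time, and the container then waits 87.5ms for the next period. That is why throttling shows up as latency spikes rather than as a gradual slowdown.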

I was recently reading this post telling everyone (strenuously) to turn off CPU limits in #Kubernetes. I could not disagree more, for most production environments.

There are a few cases where I do think it makes sense: when all of your teams

  • Are extremely aware of the performance characteristics of all of their services.
  • Have benchmarks that will show a CPU performance regression before it ships.
  • Have monitoring and alerting for when their services regularly run over their requested CPU, and have processes in place to act on it.

I doubt that most people are in that situation. Maybe at a large company where each team has only 1-2 services to manage.

The problem for 95% of everyone else is that allowing services to use "free CPU" means you don't really have any forcing function when a change hogs a bunch of CPU. You also end up with Heisenbugs that only happen when a certain set of services happens to be co-deployed on the same nodes and a certain situation occurs.

27 years in the industry, many of them in ops, tell me that the money saved by heavily over-subscribing CPUs is not worth the developer and ops time required to debug and support these problems. And most organizations don't have the built-in maturity for that trade-off to make sense. CPU is expensive, but dev time is much more so.

So, "for the love of God" as they say in the post, please use both requests and limits unless you can check off all those points above.