Karl Matthias
Short-form technical ramblings. I'm VP of Architecture at Community.com and co-author of "Docker: Up and Running" from O'Reilly Media.

I'm making note of this "Container permission denied: How to diagnose this error" article, as I'm sure I'll run into this more, and I'm really wishing that Linux reliably reported somewhere when EPERM happens (see the author's linked FriendlyEPERM feature proposal from a decade ago): https://www.redhat.com/sysadmin/container-permission-denied-errors

And if you wonder why I care about this, see: https://blog.gregor.com/designing-for-failure-88be805de1ac

There is obviously some kind of security contest going on right now to open PRs to fix security issues, because I'm getting very unhelpful, clearly automated, PRs opened on some of my projects.

Well, JFrog apparently shut down their free tier. It was pretty crummy, treated by them as a demo and not a real free tier. But I had some images hosted there and have now moved them. The lack of notice and the way they responded to my earlier feedback about the service means that I will not be using them for anything any time soon.

Watching the spread of Rust is interesting. I see a pretty strong case for it to replace C in the places where C has historically been the best fit, but from personal experience I don't find Rust to be great for a lot of the higher level applications where people seem to be using it. I vastly prefer GC and a runtime for higher level stuff. Simpler to write and run, and less jousting with the compiler.

:wq to Bram Moolenaar, the Dutch creator of Vim: https://groups.google.com/g/vim_announce/c/tWahca9zkt4

Thank you for all your #opensource contributions!

Some good discussion followed my last post. Several people said that, in their experience, they have not seen the issues I described. My experience tells me that it's a real issue.

It turns out that Google also released a paper about it when they added the CFS implementation to support CPU throttling.

I was recently reading this post telling everyone (strenuously) to turn off CPU limits in #Kubernetes. I could not disagree more, for most production environments.

There are a few caveats where I do think it makes sense. If all of your teams:

  • Are extremely aware of the performance characteristics of all of their services.
  • Have benchmarks that will show a CPU performance degradation before it ships.
  • Have monitoring and alerting for when their services are regularly running over their requested CPU, and have the processes in place to take action on that.

I doubt that most people are in that situation. Maybe in a large company where a team has 1-2 services to manage. 

The problem for the other 95% is that allowing services to use "free CPU" means you don't really have any forcing function when a change hogs a bunch of CPU. You also end up with Heisenbugs that only happen when a certain set of services happen to be co-deployed on the same nodes and a certain situation occurs.

27 years in the industry, many of them in ops, tell me that the money saved from widely over-subscribing CPUs is not worth the developer and ops time required to debug and support these things. And most organizations don't have the maturity for that trade-off to make sense. CPU is expensive. But dev time is much more so.

So, "for the love of God" as they say in the post, please use both requests and limits unless you can check off all those points above.
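
To make the "forcing function" point concrete, here is a rough sketch of how a CPU limit turns into a CFS quota and why a CPU-hungry change becomes visible as throttling. It assumes the default 100 ms CFS period and uses made-up per-period demand numbers; the function names are hypothetical, and the model is simplified (real CFS carries unfinished work into the next period, which this ignores).

```python
from typing import Optional, Sequence, Tuple

# Assumption: the node uses the default CFS period of 100 ms.
CFS_PERIOD_US = 100_000


def cfs_quota_us(limit_millicores: int, period_us: int = CFS_PERIOD_US) -> int:
    """CPU time (in microseconds) a container may consume per CFS period."""
    return limit_millicores * period_us // 1000


def simulate(
    demand_us: Sequence[int], limit_millicores: Optional[int] = None
) -> Tuple[int, int]:
    """Return (cpu_used_us, throttled_periods) for per-period CPU demand.

    Simplified model: demand above the quota is dropped rather than
    carried over to the next period.
    """
    if limit_millicores is None:
        # No limit: the service takes whatever "free CPU" it wants and
        # nothing is ever reported as throttled.
        return sum(demand_us), 0
    quota = cfs_quota_us(limit_millicores)
    used = sum(min(d, quota) for d in demand_us)
    throttled = sum(1 for d in demand_us if d > quota)
    return used, throttled


# Hypothetical demand (microseconds of CPU wanted in three periods).
demand = [30_000, 80_000, 120_000]
with_limit = simulate(demand, limit_millicores=500)
without_limit = simulate(demand)
print(with_limit, without_limit)
```

With a 500m limit the quota is 50 ms per period, so two of the three periods show up as throttled, which is a signal you can alert on. Without a limit the same change just quietly eats CPU that neighbors on the node may have needed.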

Needed to write some tooling to deal with a big pile of YAML. I still reach for #Ruby when stuff like this comes along. This is one of the things that my other regular languages (Go, Elixir, C, Crystal) don't do as easily.

I feel pretty strongly that these new Go built-ins in 1.21 should not have been built in. Seems actually surprising to me that it is coming from the Go team, as well.

I still have mixed feelings about Protobuf. Picking it to back our event bus and event store was still the right choice and I think it's still the thing I would pick again. But I still don't love it. The equivalence of zero value and null is a PITA that has caused a number of bugs. Working around this requires loading a lot of Google schemas and making everything structs. I know this and we do this. But it doesn't stop occasional bugs. And, it just feels like a big kludge to not have that be inherent in the encoding.
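
A minimal sketch of the zero-value problem, for anyone who hasn't hit it. This is plain Python that only mimics the behaviour, not the real Protobuf library: the field names and helper functions are hypothetical, and it assumes the workaround in question is wrapping scalars in a nested message (in the style of Google's well-known wrapper types).

```python
from typing import Optional

# Hypothetical message with two scalar fields and their default values.
DEFAULTS = {"retry_count": 0, "note": ""}


def encode(msg: dict) -> dict:
    """Mimic proto3 scalars: fields equal to their default are not written."""
    return {k: v for k, v in msg.items() if v != DEFAULTS[k]}


def decode(wire: dict) -> dict:
    """Missing fields come back as defaults, so 'unset' and 'zero' look alike."""
    return {k: wire.get(k, default) for k, default in DEFAULTS.items()}


def encode_wrapped(retry_count: Optional[int]) -> dict:
    """Wrap the scalar in a nested message so its presence is recorded."""
    if retry_count is None:
        return {}
    return {"retry_count": {"value": retry_count}}


def decode_wrapped(wire: dict) -> Optional[int]:
    """A missing wrapper means 'not set'; a present one carries the value."""
    wrapper = wire.get("retry_count")
    return None if wrapper is None else wrapper.get("value", 0)


explicit_zero = decode(encode({"retry_count": 0, "note": "x"}))
never_set = decode(encode({"note": "x"}))
print(explicit_zero == never_set)             # the two cases are indistinguishable
print(decode_wrapped(encode_wrapped(0)))      # wrapper keeps an explicit zero
print(decode_wrapped(encode_wrapped(None)))   # and keeps "not set" distinct
```

The wrapper version works, but every such field becomes a nested struct that callers have to unpack, which is the kludge being complained about.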