Kubernetes Sizing: T-Shirt Method
If you've chatted with me for long enough, you'll know that I'm crazy about containers and I absolutely adore my Kubernetes clusters. There's so much to love about them, except for the costs. It's very easy to let costs spiral out of control when using clusters. I’ve been down that road before and now I take a practical approach to correctly sizing any workloads that I deploy to my clusters.
My secret weapon is the t-shirt method 😜. I have to admit, I didn't come up with this idea on my own. I actually borrowed it from a DevOps/Systems Engineer who used to work at Airbnb.
The t-shirt method is quite simple. It involves treating your workloads like clothes. Just like you wouldn't want to wear a shirt that is two or three times too large, your workloads shouldn't be over-provisioned either.
To get started, I monitor each application and determine its actual resource consumption, similar to trying on a shirt. I never take the size on the tag at face value, since different applications may behave differently under load. I look for resource usage peaks and plan my nodes around those peaks.
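To make the "find the peaks" step concrete, here is a minimal Python sketch. It assumes you have already exported usage samples from whatever monitoring tool you use. The `Sample` type, the field names, and the numbers are all made up for illustration.

```python
from dataclasses import dataclass


@dataclass
class Sample:
    """One usage measurement for a pod (hypothetical shape)."""
    cpu_millicores: int
    memory_mib: int


def peak_usage(samples: list[Sample]) -> Sample:
    """Return the highest CPU and highest memory seen across all samples.

    The two peaks may come from different moments in time; sizing for both
    is the conservative choice.
    """
    if not samples:
        raise ValueError("need at least one sample")
    return Sample(
        cpu_millicores=max(s.cpu_millicores for s in samples),
        memory_mib=max(s.memory_mib for s in samples),
    )


# Made-up measurements taken while the application was under load.
samples = [Sample(120, 300), Sample(450, 310), Sample(200, 512)]
peak = peak_usage(samples)
print(peak)
```

In practice you may prefer a high percentile over the absolute maximum if your data has rare outliers, but that is a judgment call for your workload.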
Once I've found the peaks, the real work begins. I then determine how many pods can run on one node under load. I prefer to deploy the same type of application to the same node pool, which makes my life easier. To do this, I use node taints, with matching tolerations on the pods, so that only certain applications can be scheduled on a particular node.
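The pods-per-node calculation can be sketched in a few lines of Python. This is a rough estimate under stated assumptions, not an exact model of the scheduler: the 10% reserved for system components is a placeholder (real allocatable capacity varies by provider and node type), and the node and pod sizes are hypothetical.

```python
def pods_per_node(
    node_cpu_m: int,
    node_mem_mib: int,
    pod_peak_cpu_m: int,
    pod_peak_mem_mib: int,
    reserved_percent: int = 10,  # assumed headroom for system components
) -> int:
    """Estimate how many pods fit on one node when each pod is at its peak."""
    if pod_peak_cpu_m <= 0 or pod_peak_mem_mib <= 0:
        raise ValueError("pod peaks must be positive")
    usable_cpu = node_cpu_m * (100 - reserved_percent) // 100
    usable_mem = node_mem_mib * (100 - reserved_percent) // 100
    # The scarcer resource decides how many pods fit.
    return min(usable_cpu // pod_peak_cpu_m, usable_mem // pod_peak_mem_mib)


# Hypothetical node: 4 vCPU (4000m) and 16 GiB (16384 MiB).
# Hypothetical pod peak: 450m CPU and 512 MiB memory.
count = pods_per_node(4000, 16384, 450, 512)
cpu_utilization = count * 450 / 4000
print(count, f"{cpu_utilization:.0%}")
```

With these numbers CPU is the limiting resource, which is a hint that a node type with less memory per core might be a better fit for this workload.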
Here are some very helpful links on how to add taints and tolerations, and on how to use node affinity. Either approach will work. 👇🏼
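As a rough sketch of what this looks like on the pod side, the Python below builds the `tolerations` and `nodeSelector` fields of a pod spec and prints them as JSON. The taint key `dedicated` and the node label `pool` are names I made up for illustration. It assumes the nodes were already tainted and labeled to match, for example with `kubectl taint nodes <node-name> dedicated=medium:NoSchedule` and `kubectl label nodes <node-name> pool=medium`.

```python
import json


def dedicated_pod_spec_fragment(pool: str) -> dict:
    """Build the pod-spec fields that target a tainted, labeled node pool.

    The toleration lets the pod onto nodes carrying the taint. The
    nodeSelector keeps the pod off every other node, since a toleration
    alone does not attract a pod to the tainted nodes.
    """
    return {
        "tolerations": [
            {
                "key": "dedicated",  # hypothetical taint key
                "operator": "Equal",
                "value": pool,
                "effect": "NoSchedule",
            }
        ],
        "nodeSelector": {"pool": pool},  # hypothetical node label
    }


fragment = dedicated_pod_spec_fragment("medium")
print(json.dumps(fragment, indent=2))
```

The printed fragment goes under `spec.template.spec` in a Deployment. Node affinity can replace the `nodeSelector` if you need more expressive matching rules.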
By following this technique, you can pack multiple workloads into one node, increasing utilization to 85-90%. The biggest waste in managing clusters at scale is having large nodes with low utilization rates (I've seen utilization rates as low as 5%).
I've done my best to illustrate the t-shirt method below. The diagram has four nodes. If we look at Node A, we can see that it does not follow the t-shirt method, as it has many applications of different sizes. If we try to add another large deployment (the green box), it won't fit, causing Kubernetes to add another node. This decreases utilization rates, as now we have two nodes with lower utilization than desired.
On the other hand, if we look at Nodes B, C, and D, we can see that they are correctly sized for the applications they are running. If we need to add another deployment, it will fit without triggering a scale-up, increasing utilization rates.
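The same comparison can be played out with a toy model in Python. The unit sizes, node capacities, and workload mixes below are invented for illustration and are not read off the diagram. The model also tracks a single resource, while real scheduling looks at CPU and memory requests separately.

```python
# Hypothetical t-shirt sizes, in abstract capacity units.
SIZES = {"small": 1, "medium": 2, "large": 4}


def free_capacity(capacity: int, workloads: list[str]) -> int:
    """Capacity units left on a node after its current workloads."""
    return capacity - sum(SIZES[w] for w in workloads)


def fits(capacity: int, workloads: list[str], new: str) -> bool:
    """True if the new workload fits without adding a node."""
    return SIZES[new] <= free_capacity(capacity, workloads)


# Mixed node: 6 of 8 units used, only 2 units free.
mixed_node = ["small", "medium", "small", "medium"]
# Node sized in multiples of "large": 8 of 12 units used, one slot free.
large_node = ["large", "large"]

print(fits(8, mixed_node, "large"))   # a new node would be needed
print(fits(12, large_node, "large"))  # fits in the remaining slot
```

In the mixed case the large deployment forces a second node that starts out half empty, while the dedicated node ends up fully used.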
Legend: Purple Box - Small; Blue Box - Medium; Green Box - Large
When managing infrastructure for my team, I always keep the t-shirt method in mind. However, I don't let it become a bottleneck. There are use cases where the exact resources required to run an application are unknown. In general, my recommendation is to build multiple node pools that increase utilization rates, taking advantage of taints, labels, and node affinity to segregate workloads into different node pools.
I hope you think about the t-shirt method when needing to decrease Kubernetes costs! 🤑