Careless capacity management in the cloud can get you into trouble because storage can grow almost invisibly over time. If you wait until the cloud is almost at capacity, two things can happen: 1) hardware lead times can cause you to hit capacity limits before the new hardware arrives, and 2) a sudden spike in instance usage can push you past capacity sooner than you originally planned. (And #2 can be compounded by #1 if you're not careful.)
After gathering a year's worth of data from multiple regions of a very large cloud, I was able to see growth patterns over time. These patterns let me determine upper utilization thresholds at which to trigger the hardware ordering process. Couple this with a solid understanding of your hardware vendor's lead times, and you should be able to keep the cloud from hitting maximum capacity before new hardware is racked, stacked, and assimilated. Working with standard hardware configurations also makes this process faster and easier. You must also keep in mind what “size” of hardware to buy so you can scale out rapidly without a single hardware node failure causing problems.
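As a rough illustration of the idea, the order-trigger threshold can be derived from the observed growth rate and the vendor lead time. This is a hypothetical sketch, not the talk's actual model; the function name, the linear-growth assumption, and all numbers are illustrative:

```python
def order_trigger_utilization(growth_pct_per_week: float,
                              lead_time_weeks: float,
                              safety_margin_pct: float = 10.0) -> float:
    """Return the utilization % at which to start procurement so that
    new capacity lands before the cloud fills up (assumes linear growth)."""
    # Capacity consumed while waiting for the hardware to arrive.
    consumed_during_lead = growth_pct_per_week * lead_time_weeks
    # Trigger early enough to absorb the lead time plus a buffer
    # for sudden usage spikes.
    return 100.0 - consumed_during_lead - safety_margin_pct

# Example: 2% growth per week, 8-week vendor lead time, 10% spike buffer
print(order_trigger_utilization(2.0, 8.0))  # 74.0
```

With those illustrative numbers, you'd kick off the ordering process at roughly 74% utilization; faster growth or longer lead times push the trigger lower.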
During this session, I’ll discuss:
- Best practices for choosing a standard hardware configuration that is rapidly scalable and will not cause severe ‘boot storm’ issues if a failure happens
- How to plan initial server counts for a new cloud or region and set a predictable growth trend for usage in that cloud
- Recognizing when to trigger the hardware procurement process so that it aligns with projected growth and known vendor lead times
After attending this session, you’ll be able to employ capacity management best practices to ensure maximum service uptime, minimize service disruptions, and smooth the transition through capacity expansions.