While the cloud offers organizations many benefits, such as greater flexibility, it also carries potential liabilities that can result in cost overruns. Though freed from physical technology limitations, organizations that migrate to the cloud without a well-defined plan often find themselves consuming more resources than intended and unintentionally contributing to overruns. In this blog, we’ll discuss some of the fundamental root causes of cloud cost overruns to help organizations better understand the pitfalls. In our next blog, we’ll provide more insight into how organizations can avoid these common, yet costly, pitfalls by leveraging the management capabilities essential to a corporate technology expense management program.
Understanding the Procurement Paradigm Shift Enabled by Cloud Services
Organizations that are considering cloud delivery platforms often find themselves in one or more of the following four decision paradigms:
- Cloud First – All new projects leverage cloud services unless there are compelling reasons not to.
- Lean Towards Cloud – Considers the best fit for a business application, but applications and infrastructure should become cloud services.
- Lean Against Cloud – Considers the best fit for a business application, but applications and infrastructure should remain on premise.
- Cloud Last – Capabilities that can only be delivered via cloud services should be deployed in cloud.
Unlike traditional technology investments that were once governed and managed by a centralized IT department (in conjunction with Finance and the respective Business Unit), organizations are finding that the procurement of on-demand cloud services is now being facilitated in a decentralized, holacratic manner. In this sense, authority and decision-making are distributed throughout a holarchy of self-organizing teams spread across the business units.
A further complication is that business units are commonly more aggressive in embracing cloud services (i.e. cloud first), whereas the IT and Finance departments retain a more conservative approach (i.e. cloud last). As a result, IT and Finance often find themselves reacting to unplanned problems that result from a lack of coordinated efforts in deploying cloud services. The diagram below compares the traditional structured methods of procuring premise-based technology with the much higher frequency of decentralized on-demand cloud service activations and changes.
Common Cloud Cost Overrun Root Cause Issues related to Procurement and Deactivations
As cloud services become mainstream, successful IT organizations are quickly transforming into more strategic roles – corporate technology fiduciary agents. Business units that fail to establish a strategic collaboration with the technology experts for procuring and managing cloud services often find themselves mitigating unplanned costs. Some of the top root causes of cloud cost overruns can be traced back to:
- Forecasting higher infrastructure utilization than achieved – This is often the result of baselining current utilization requirements for compute, storage and network services incorrectly, or not at all, and then erring on the side of caution when configuring cloud services.
- Budgeting for ideal capacity without enough allowance for uncertainty – This is the exact opposite of over-provisioning and can also be attributed to insufficient analysis of actual historical usage across the appropriate production timelines (e.g. monthly, quarterly, annually).
- Not understanding all the variables and inter-dependencies of services that contribute to costs – As discussed previously, Cloud Service Providers (CSPs) publish service catalogs for discrete cloud services. End users are responsible for architecting and orchestrating the appropriate configurations to support their specific business requirements.
- Under-anticipating post-production ongoing development and test environment needs – IT infrastructure experts understand the need to establish and maintain separate development environments from test and production environments.
- Not accounting for all tangential costs beyond compute and storage (e.g. data transfer, load balancing, IP addresses and application services) – IT architecture SMEs understand how to properly provision and right-size all the supporting infrastructure elements across development, test, production and failover environments.
- Resources not being de-provisioned when no longer needed or idled when not in continuous use – Cloud services are easy and fairly quick to activate and change. However, cloud services are procured in a discrete manner, so de-provisioning must also cover all the inter-related services. Ongoing analysis of actual usage trends is key to temporarily suspending services outside business hours or when the development and/or test environments are not in use.
- Selecting higher cost platform services instead of open source on lower-cost IaaS – IT experts understand the functional differences between brand name cloud services for business applications, database environments and storage versus open source and how to optimize the appropriate compute and storage configurations. They can help business units select the appropriate environment based on the actual business requirements.
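The first two root causes above both come down to sizing from measured usage rather than from a guess. The sketch below is one minimal way to do that, not a prescribed method: it takes a baseline from a high percentile of historical usage, adds an allowance for uncertainty, and compares the result with the forecast. The function names, the 95th percentile, the 20% headroom and the sample figures are all illustrative assumptions.

```python
import math
from statistics import mean

def percentile(samples, pct):
    """Nearest-rank percentile of a non-empty list (pct between 0 and 100)."""
    ordered = sorted(samples)
    rank = max(1, math.ceil(pct / 100 * len(ordered)))
    return ordered[rank - 1]

def capacity_plan(hourly_vcpus_used, forecast_vcpus, headroom=0.20, pct=95):
    """Compare a forecast against a baseline taken from measured usage."""
    baseline = percentile(hourly_vcpus_used, pct)
    # Allowance for uncertainty on top of the measured baseline.
    recommended = math.ceil(baseline * (1 + headroom))
    return {
        "average": mean(hourly_vcpus_used),
        "baseline": baseline,
        "recommended": recommended,
        "forecast": forecast_vcpus,
        # Positive: forecast is over-provisioned. Negative: too little allowance.
        "over_provisioned_by": forecast_vcpus - recommended,
    }

# Hypothetical samples (vCPUs in use per hour). Real data would come from
# monitoring exports covering a full monthly, quarterly or annual cycle.
usage = [4, 5, 6, 5, 7, 12, 14, 13, 9, 6, 5, 4]
plan = capacity_plan(usage, forecast_vcpus=32)
print(plan)
```

A positive `over_provisioned_by` points to a forecast set higher than measured utilization supports, while a negative value points to a budget without enough allowance for uncertainty.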
Common Cloud Cost Overrun Root Cause Issues related to Production Operations
Cloud services are consumption-based. Therefore, consumption needs to be continually monitored and measured so that organizations can take corrective action in a timely manner. Some of the most common mistakes organizations make when they fail to implement appropriate governance and near-time active management oversight are summarized below.
- Running non-production instances 24×7 (“always on”)
- Over-provisioning instances – Sizing to Batch requirements rather than leveraging Spot capacity (i.e. purchasing available marketplace capacity on demand) or Auto Scaling
- Idle Instances that are not integrated with Load Balancers or Auto Scaling Groups
- Selecting higher cost instance types
- Retaining higher cost legacy instance types
- Excessive data egress activities
- Inter-server integrations across Regions rather than within a specific data center
- Failure to clean up old storage (e.g. unattached storage, old snapshots of storage, lack of or non-enforcement of a data retention policy)
- Selection of expensive storage options (i.e. Local versus Block Storage and/or solid-state drive (SSD) versus the traditional spinning hard disk drive (HDD))
- Replication of large non-shared data files
- Improper management of discount programs (e.g. “reserved instance” programs)
- Failure to benchmark costs of services relative to different Regions (e.g. Data Centers) and across CSPs
- Continual service offering changes being published by Cloud Service Providers
- Inability to effectively manage the complexity of the continual price changes in CSP Service Catalog Offerings
- Continual changes in how the enterprise uses cloud services across the myriad of CSPs (multi-cloud)
- Lack of visibility and capabilities to benchmark lower pricing options
- Inability to effectively process Cloud Service Bills which contain millions of usage transactions
- Decentralized, “bottom up” adoption with no collaborative management from a corporate Cloud Center of Excellence (CCoE) SME team
- Inability to take cost reduction actions without appropriate understanding of specific business applications
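Two of the items above, always-on non-production instances and storage that is never cleaned up, are simple enough to check with a short script. The sketch below is illustrative only: the inventory records, field names, hourly rate, 50-hour working week and 90-day retention limit are all assumptions, and a real check would read the inventory from the CSP's billing or inventory exports.

```python
from datetime import date

HOURS_PER_WEEK = 24 * 7

def off_hours_savings(hourly_rate, hours_needed_per_week=50):
    """Weekly saving from stopping an always-on instance outside working hours."""
    idle_hours = HOURS_PER_WEEK - hours_needed_per_week
    return round(hourly_rate * idle_hours, 2)

def stale_storage(volumes, snapshots, today, max_snapshot_age_days=90):
    """Return IDs of unattached volumes and of snapshots past the retention limit."""
    unattached = [v["id"] for v in volumes if v["attached_to"] is None]
    expired = [s["id"] for s in snapshots
               if (today - s["created"]).days > max_snapshot_age_days]
    return unattached, expired

# Hypothetical inventory records, invented for illustration.
volumes = [
    {"id": "vol-a", "attached_to": "dev-web-1"},
    {"id": "vol-b", "attached_to": None},
]
snapshots = [
    {"id": "snap-1", "created": date(2023, 1, 10)},
    {"id": "snap-2", "created": date(2023, 5, 20)},
]

weekly_saving = off_hours_savings(hourly_rate=0.10)
unattached, expired = stale_storage(volumes, snapshots, today=date(2023, 6, 1))
print(weekly_saving, unattached, expired)
```

As the last item in the list notes, output like this is a list of candidates for review by the application owners rather than something to delete automatically.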
In the next blog of our cloud expense management series, we will provide insight into how the consumers of cloud services across business units derive value from an effective cloud expense management program. Missed any in the series? Catch up below.
Cloud Services Require New Approaches To Expense Management
Connecting the Dots Between Telecom and Cloud Services Expense Management
Becoming a Strategic Enabler for Cloud Requires IT Leaders To Orchestrate Holistic Near-Time Insight
Managing the Cloud – The Next Evolution in IT Business Service Management