COST CHALLENGES WITH CLOUD ADOPTION 

‘The proportion of IT spending shifting to cloud will accelerate post COVID-19 crisis, making up 14.2% of the total global enterprise IT spending market in 2024, up from 9.1% in 2020’ – Gartner

The adoption of cloud services among medium and large enterprises has grown exponentially in the last decade, and cloud continues to be one of the fastest-growing cost heads in IT departments. Switching to cloud services addresses many of the shortcomings of the traditional on-premise model, such as the ability to ramp up and scale down volumes as business requirements vary. According to a Gartner study, worldwide public cloud end-user spend was $260 Billion in 2020 and is projected to grow by 18% to $305 Billion by the end of 2021¹. However, cloud adoption comes with its own challenges: as adoption increases, an ever-larger share of corporate money is pushed into the cloud, and most businesses have yet to figure out an efficient way to track consumption and control cloud costs.

Traditional IT storage and computing spend used to be a capital expenditure with well-defined process controls and budget allocation. Although these large purchase orders for equipment, space, software, etc. might not have been the most optimal investments, they provided enterprises with visibility and control through upfront budgeting.

With the migration to the cloud, this cost transformed into an operating expenditure, with expenses billed at the end of an accounting period. The autoscaling capabilities of the cloud, the pay-as-you-go pricing model, and a limited understanding of pricing methods make it difficult to track cloud usage in real time. According to Forbes, 30-35% of cloud spend is wasted² due to reasons such as subscription duplication, uncontrolled expansion, underutilized instances, and other inefficiencies stemming from a lack of monitoring. A typical large enterprise spends more than $6 Million on cloud in a year²; this means that more than $2 Million is wasted due to the absence of visibility into and monitoring of spend.

There are multiple additional challenges that lead to inflated cloud spend. Common cloud management obstacles that need to be addressed include:

– Lack of transparency: No defined process to break down and allocate overall cloud costs to individual projects and functions.

– Complex billing process: Invoices are highly complex and extensive due to the plethora of cloud configurations created by various departments. A single invoice can span thousands of pages.

– Inefficient practices: Absence of a defined process to spot unused and unattached resources set up temporarily during the model development cycle.

– Limited training: Engineers are not trained or incentivized to optimize servers based on compute, graphics, storage capacity, and throughput analysis.

– Demand control: Many enterprises fail to take advantage of cheaper rates through upfront instance reservation based on disciplined demand forecasting, leaving them reliant on costlier on-demand capacity. As of 2020, only about half (~53%) of AWS users reserve instances, and the proportion is even smaller for Azure and Google Cloud¹. By underinvesting in demand forecasting capabilities, companies are giving up a significant source of savings.

– Tiered pricing: Business leaders are not cognizant of the tiered pricing available in certain cloud offerings, such as Amazon S3, that can provide significant volume-based discounts (see the illustrative calculation after this list).
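To make the tiered-pricing lever concrete, the short Python sketch below computes a monthly storage bill under a three-tier rate card in the style of Amazon S3. The breakpoints and per-GB rates are illustrative assumptions for the example, not published prices.

```python
# Illustrative tiered storage pricing. The tier caps (in GB) and per-GB rates
# below are assumptions for the example; actual rates vary by region and change
# over time.
TIERS = [
    (51_200,       0.023),  # roughly the first 50 TB per month
    (512_000,      0.022),  # next 450 TB, i.e. up to ~500 TB
    (float("inf"), 0.021),  # everything above that
]

def monthly_storage_cost(total_gb: float) -> float:
    """Apply each pricing tier to the slice of usage that falls within it."""
    cost, prev_cap = 0.0, 0.0
    for cap, rate in TIERS:
        if total_gb <= prev_cap:
            break
        billable = min(total_gb, cap) - prev_cap
        cost += billable * rate
        prev_cap = cap
    return cost

if __name__ == "__main__":
    for gb in (10_000, 100_000, 700_000):
        print(f"{gb:>9,} GB -> ${monthly_storage_cost(gb):,.2f}/month")
```

At high volumes, the blended per-GB rate falls noticeably below the headline first-tier rate, which is the discount many buyers overlook.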

As businesses look to onboard more users to the cloud, a lack of usage guardrails and resource visibility can exponentially increase cloud costs, offsetting potential savings from increased efficiency. As cloud expenditure becomes a major part of overall IT expense, a well-thought-out cloud cost optimization strategy becomes critical.

FRAMEWORK TO MANAGE COSTS

Contrary to the popular belief that cloud adoption by itself lowers costs and increases efficiency, the key to unlocking savings lies in implementing controls. Unplanned and uncontrolled usage of cloud resources often leads to a pile-up of unused instances that add no value but still need to be paid for. In the medium term, these practices lead to a negative ROI from migrating to the cloud. For end-to-end management of cloud costs, Kepler Cannon has a proprietary framework (Exhibit 1) that has been implemented across several enterprises. Cloud costs are always a direct or an indirect function of quantity and price, the two fundamental levers of cost optimization.

Exhibit 1: Cloud Cost Optimization Framework

– Quantity: The first step in managing costs is to reduce the quantity consumed. With on-premise infrastructure, the usage or ‘quantity’ costs are usually limited to the electricity consumed by the hardware, so leaving systems running overnight and similar inefficiencies are common and carry little consequence. With cloud infrastructure, however, vendors track and charge for every second of usage, so it becomes important to avoid such wastage and cut consumption down to necessary levels.

Quantity management can be carried out by enhancing visibility into usage and spend, setting up management guidelines, optimizing usage by removing inefficiencies and using predictive analytics to forecast demand for the next billing cycle.

– Price: In parallel with managing quantity, enterprises can also look to optimize pricing. As intuitive as it sounds, this step, which can deliver significant savings, is often ignored. Price management can be carried out at either the vendor level or the individual product level. At an aggregate level, this practice includes selecting the right vendor based on needs, exploiting preferred status, and negotiating volume-based and sustained-use discounts. Further savings can be attained at the product level through optimal pricing schemes and attention to tiered pricing methods, among others.

MANAGING QUANTITY

Several initiatives can be designed to optimize the volume of cloud resource consumption. A few schemes that have worked exceedingly well across multiple institutions are described in this section.

Exhibit 2: Cloud Cost Management Initiatives


Transparency

The first step towards cost optimization is to gain visibility into the key cost drivers and their attribution to different services and resources. The problem of visibility into cloud spending arises because of its vast scale of use and aggregate, enterprise-level billing. Moreover, businesses using more than one cloud service provider or hybrid cloud models face even greater difficulties in obtaining a transparent spending view due to the lack of consolidated usage reports.

1. Tag Analytics:

Businesses should keep track of overall costs by tagging resources with project IDs, cost centers, etc., enabling a breakdown of cloud costs across LoBs and business functions (analytics, marketing, sales, services, etc.). Cost allocation reporting can be further enhanced with additional dimensions such as instance type (on-demand, spot, reserved), payment tenure, etc. Resource tagging also helps in reporting cloud usage at granular time intervals, such as daily or hourly periods, and in performing heatmap analysis.

Post-allocation and cost-efficiency analyses are critical for understanding the value of every dollar spent on cloud services and for assigning cloud costs to customer types and business initiatives to augment profitability and impact analysis.
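As a minimal sketch of tag-based attribution, the snippet below pulls one month of spend from the AWS Cost Explorer API grouped by a cost allocation tag. It assumes boto3 credentials are configured and that resources carry a "project-id" tag that has been activated for cost allocation; the tag key and dates are hypothetical.

```python
# Break down one month of spend by a cost-allocation tag via AWS Cost Explorer.
# Assumes configured AWS credentials and an activated "project-id" tag (an
# illustrative tag key).
import boto3

ce = boto3.client("ce")

response = ce.get_cost_and_usage(
    TimePeriod={"Start": "2021-01-01", "End": "2021-02-01"},
    Granularity="MONTHLY",
    Metrics=["UnblendedCost"],
    GroupBy=[{"Type": "TAG", "Key": "project-id"}],
)

for period in response["ResultsByTime"]:
    for group in period["Groups"]:
        tag_value = group["Keys"][0]  # e.g. "project-id$churn-model"
        amount = float(group["Metrics"]["UnblendedCost"]["Amount"])
        print(f"{tag_value:<40} ${amount:,.2f}")
```

The same query, re-run with daily granularity or additional group-by dimensions, feeds the heatmap and LoB-level views described above.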

2. Executive Dashboards:

Dashboards showing expenditure across different services and use cases, and how it changes over time, can provide insights to leadership, help uncover recurring patterns and spikes, and help predict future demand. Transparency also helps highlight cost anomalies (e.g., spikes on weekends or holidays) and seasonality, and supports root cause analysis. Cloud vendors have also launched visualization tools, such as AWS Cost Explorer, usage reports, and Azure and GCP cost management, to enhance cost transparency.
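A dashboard's anomaly flags can be driven by a simple statistical rule. The sketch below, using made-up daily spend figures, flags any day whose cost deviates from the trailing mean by more than three standard deviations; real implementations would layer in seasonality adjustments.

```python
# Illustrative spike detection on daily spend figures; the data is hypothetical.
from statistics import mean, stdev

daily_spend = [1180, 1225, 1190, 1240, 1210, 1980, 1205, 1230]  # USD per day

def flag_spikes(costs, window=5, threshold=3.0):
    """Flag days that deviate from the trailing-window mean by > threshold sigmas."""
    flags = []
    for i in range(window, len(costs)):
        baseline = costs[i - window:i]
        mu, sigma = mean(baseline), stdev(baseline)
        if sigma and abs(costs[i] - mu) > threshold * sigma:
            flags.append((i, costs[i]))
    return flags

print(flag_spikes(daily_spend))  # -> [(5, 1980)] for the sample above
```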

3. Live Monitoring:

As an advanced attribution capability, businesses can design real-time visualization tools that let administrative staff monitor utilization, track cloud resource inventories, and send out automated alerts when resource counts spike past pre-defined thresholds. This provides a real-time comparison of accrued costs against allocated budgets and escalation to the concerned authority in case of anomalies.
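The guardrail logic itself can be simple. The sketch below compares month-to-date accrued cost per cost center against its allocated budget and raises an alert once a pre-defined utilization threshold is crossed; the budgets, accrued figures, and notify() hook are placeholders for an enterprise's own monitoring plumbing.

```python
# Budget guardrail sketch: alert when month-to-date spend crosses a threshold.
ALERT_THRESHOLD = 0.80  # alert once 80% of the monthly budget is consumed

budgets = {"analytics": 50_000, "marketing": 20_000}   # USD / month (illustrative)
accrued = {"analytics": 43_500, "marketing": 9_800}    # month-to-date (illustrative)

def notify(cost_center: str, spent: float, budget: float) -> None:
    # In practice this would page the owning team or post to a chat channel.
    print(f"ALERT: {cost_center} at {spent / budget:.0%} of budget "
          f"(${spent:,.0f} of ${budget:,.0f})")

for cc, budget in budgets.items():
    spent = accrued.get(cc, 0.0)
    if spent >= ALERT_THRESHOLD * budget:
        notify(cc, spent, budget)
```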

Management

After gaining sufficient visibility into the spend, the next step for businesses is to set up a cloud resource management process encompassing robust budgeting, control measures, and guardrails for users of cloud resources.

4. Training & Incentivization:

The root of most inefficiencies in cloud usage lies at the user level itself. Therefore, one of the most effective ways to cut costs is to train the engineering staff in efficient usage. Raising awareness among employees of inefficiencies in cloud usage and their financial impact on the firm, and socializing best practices on proper resource utilization, identification of idle instances, and server optimization, will provide long-standing benefits for any enterprise. Moreover, businesses can incentivize employees by designing recognition schemes for teams that sustain optimal resource usage over time and by coaching individuals through mistakes and inefficient practices.

5. Process Standardization:

Based on historical data, administrators can set up resource usage standards, an ownership hierarchy, and an escalation process at the start of new projects. For ongoing projects, monitoring of resource usage can create a standard frame of reference for benchmarking utilization. Businesses can also set up a monthly monitoring process to investigate running inefficiencies and underutilized resources, and to evaluate different initiatives against pre-defined metrics such as cost, speed, and quality.

EMBRACING DOMAIN CENTRICITY

The implementation of a fully domain-centric framework can seem daunting. Given the need for an exhaustive design, which requires considerable analysis time and human capital, it is useful to divide the rollout into three phases: Short-term remediation, mid-term façade construction, and long-term evolution to the target state (Exhibit 3).

1. Remediation

Before making any structural changes, an “as-is” mapping should be undertaken to capture data and feeds for critical business processes (e.g., distribution, finance, procure-to-pay). The current state of data flows can then be mapped to business processes and subsequently analyzed to identify critical operational issues (e.g., circular references, dependencies, feed types). This process paints a picture of how data interacts internally and externally; laying out this view is an important first step toward preparing the organization for full domain centricity.
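As an illustration of this "as-is" mapping step, the sketch below represents data feeds as a directed graph of producer-to-consumer links and detects circular references with a depth-first search; the feed names and the cycle are hypothetical.

```python
# Represent data feeds as a directed graph and detect circular references.
feeds = {
    "procure-to-pay": ["finance"],
    "finance": ["distribution", "reporting"],
    "distribution": ["procure-to-pay"],   # introduces a cycle for illustration
    "reporting": [],
}

def find_cycle(graph):
    """Return one circular reference as a list of feed names, or None."""
    visiting, visited = set(), set()

    def dfs(node, path):
        visiting.add(node)
        for nxt in graph.get(node, []):
            if nxt in visiting:  # back edge -> circular reference
                return path + [node, nxt]
            if nxt not in visited and (cycle := dfs(nxt, path + [node])):
                return cycle
        visiting.discard(node)
        visited.add(node)
        return None

    for node in graph:
        if node not in visited and (cycle := dfs(node, [])):
            return cycle
    return None

print(find_cycle(feeds))
# -> ['procure-to-pay', 'finance', 'distribution', 'procure-to-pay']
```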

A clearly articulated and exhaustive data governance framework is the foundation of an ideal target state, but such a framework is more a means to an end than the goal itself. An ideal target state from a data governance perspective is one that has:

– Decommissioned all non-essential data feeds and data objects

– Migrated data from on-prem to public, cloud-native databases

– Upheld single source of truth (SSOT) for all internal data feeds

– Leveraged management tools for data cataloging (e.g., Alation)

Exhibit 3: Transformation Toward Domain Centricity

2. Façade scaling

In the medium term, the move toward domain centricity can begin by way of incremental encapsulation. New consumers or producers that arrive with, for instance, the addition of new applications provide an opportunity to scale up façades: rather than processing data through traditional, less sophisticated means like table-to-table replicas, organizations may pivot to APIs to promote efficient and sustainable data processing. Following a system change, a façade (i.e., an API endpoint) must, at a minimum, be introduced, even if data persistence cannot be abstracted from the main producer or consumer (e.g., a custom legacy design for the main application that ingests the data). For API tools, two central paths may be pursued: using tools like MuleSoft to build APIs (i.e., engineering / development), or tools like Mashery to manage APIs (e.g., logging, monitoring, alerts, IAM).
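To show what a minimal façade can look like, the sketch below exposes a single API endpoint in front of a legacy lookup. Flask, the route path, and the fetch_customer() helper are illustrative assumptions; the point is that consumers bind to the endpoint, while the persistence behind it can later be swapped for a domain-owned service without breaking callers.

```python
# Minimal façade sketch: an API endpoint that hides the legacy persistence.
from typing import Optional
from flask import Flask, jsonify, abort

app = Flask(__name__)

def fetch_customer(customer_id: str) -> Optional[dict]:
    # Placeholder for the legacy lookup (today a direct table read, tomorrow a
    # domain-owned service call); hard-coded here for illustration.
    legacy_table = {"C-001": {"id": "C-001", "segment": "enterprise"}}
    return legacy_table.get(customer_id)

@app.route("/customers/<customer_id>")
def get_customer(customer_id: str):
    record = fetch_customer(customer_id)
    if record is None:
        abort(404)
    return jsonify(record)

if __name__ == "__main__":
    app.run(port=8080)
```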

3. Target state actualizing

Finally, in the long-term journey toward the target state, next-generation data architecture patterns, principles, and technologies can foster the implementation of transparent data services that do not rely on specific databases. In the same vein, transparent interactions between simple primary entities and complex composite entities can be facilitated with a microservice-based architecture. In short, domain-centric data models reach full maturity as they become interoperable and secure.

THE SHIFT TOWARD DOMAIN CENTRICITY

Most of today’s data platforms are monolithic, highly coupled, and operated in silos. In this whitepaper, we have introduced the building blocks of a ubiquitous, distributed, and domain-centric data architecture, with a focus on data products that are geared toward domains. These products are operated by integrated, cross-functional teams comprising data engineers and product owners, and they use a common infrastructure platform that facilitates the hosting, preparation, and publishing of data assets. A domain-centric approach requires a distributed architecture and data model based on open standards and governance to ensure harmonized operations and self-service (Exhibit 4).

In the proposed model, database technologies will be purpose-specific and available to domains in a service-like fashion to address common data use cases: highly structured, relational workloads (e.g., PostgreSQL), semi-structured setups (e.g., MongoDB), and specialized data structures such as HyperLogLog (e.g., Redis).
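As a small example of such a purpose-specific service, the sketch below approximates unique-visitor counts with a Redis HyperLogLog via the redis-py client; the host, key, and visitor IDs are illustrative and assume a locally reachable Redis instance.

```python
# Approximate distinct counts with a Redis HyperLogLog (redis-py client).
import redis

r = redis.Redis(host="localhost", port=6379)

# Record visitor IDs against a daily key; adding the same element twice has no effect.
r.pfadd("visitors:2021-06-01", "user-17", "user-42", "user-17", "user-99")

# PFCOUNT returns an approximate cardinality using only a few KB of memory,
# regardless of how many distinct elements were added.
print(r.pfcount("visitors:2021-06-01"))  # -> 3 (approximate)
```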

Just as data services are abstracted away from users, infrastructure is abstracted from data services. Interoperability is guaranteed via open-source standards, enabling faster technology adoption. This approach paves the way for new technology, such as high-performance compute for analytical workloads (e.g., Nvidia Ampere) and non-volatile storage for transactional workloads (e.g., Intel Optane).

Exhibit 4: Domain-Centric Architecture

IN CLOSING…

Developing a clearly defined vision for a domain-centric data architecture is necessary for enterprises to minimize complexity. However, while the technology is ready, enterprise-scale adoption of domain-centric design has yet to materialize. A primary shift will require a focus on domain-driven data products, as the underlying infrastructure and tooling are merely implementation details. The same applies to any central data platforms facilitating reporting and visualization tasks: they are simply another data consumer.

Empowering visionary data architects who are comfortable departing from current-state centralized, monolithic design principles will therefore be key. The paradigm shift described here will also require a new set of governing tenets focused on:

– Serving rather than ingesting data

– Discovering and consuming vs. extracting and replicating

– Publishing and subscribing to events via an enterprise message bus (see the sketch after this list)

– Creating a uniform, homogeneous, and distributed ecosystem vs. multiple centralized platforms
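As a sketch of the publish-and-subscribe tenet, the snippet below uses Kafka as the enterprise message bus via the kafka-python client. Kafka itself, the topic name, and the payload are assumptions for illustration; any bus with topic semantics would serve.

```python
# Publish/subscribe sketch on Kafka (kafka-python); topic and payload are illustrative.
import json
from kafka import KafkaProducer, KafkaConsumer

TOPIC = "customer-domain.events"

# Producer side: a domain publishes an event instead of pushing extracts.
producer = KafkaProducer(
    bootstrap_servers="localhost:9092",
    value_serializer=lambda v: json.dumps(v).encode("utf-8"),
)
producer.send(TOPIC, {"event": "customer_updated", "id": "C-001"})
producer.flush()

# Consumer side: interested domains subscribe rather than replicate tables.
consumer = KafkaConsumer(
    TOPIC,
    bootstrap_servers="localhost:9092",
    value_deserializer=lambda v: json.loads(v.decode("utf-8")),
    auto_offset_reset="earliest",
)
for message in consumer:
    print(message.value)  # {'event': 'customer_updated', 'id': 'C-001'}
    break
```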

The need for better data architecture is real and pressing, and the tools and technology are ready. It is up to technology leaders to acknowledge that maintaining the status quo will result in a repetition of past failures, despite the use of cloud-based tooling and platforms.

References:

  1. “IDC’s Global DataSphere Forecast Shows Continued Steady Growth in the Creation and Consumption of Data.” IDC, May 2020.
  2. “Information Governance Strategies For Lifecycle Control: Structured Data Archiving.” Iron Mountain.
  3. Ghosh, Paramita (Guha). “Data Governance in the Cloud.” DATAVERSITY, 31 Aug. 2018.
  4. "The Total Economic Impact™ of Snowflake Data-Warehouse-as-a-Service." A Forrester Total Economic Impact Study, June 2018.
