In today’s rapidly evolving and complex cloud computing environment, who is responsible in the case of a service outage?
Cloud computing introduces a paradigm shift in providing and supporting software services. With this shift come new questions of responsibility and accountability, which become particularly visible during a service outage.
Traditionally, applications have been offered by vertically integrated service providers and enterprises. Hardware and software were bundled together and could be purchased from a single supplier, who was accountable if a service was impaired.
The cloud computing environment, however, breaks up this integrated approach. Because software is decoupled from the underlying hardware resources, the cloud service provider can pool computing resources and offer them to multiple cloud consumers, which enables greater efficiency, convenience and economy. Who, then, is responsible when a cloud-based service suffers an outage? It could be the infrastructure provider, the cloud consumer who purchases an infrastructure service, the software provider, one of the network service providers carrying IP traffic to the end user’s device, or even the end user’s own equipment.
Accountability has clearly undergone some shifts in the cloud, and standards bodies are working to establish outage measurement rules that address the new paradigm. But the rules are not yet in place. For now, some existing models for distributing service accountability can be adapted to this new environment.
But to do so, all parties operating in the cloud need to be aware of, and aligned on, the basic principles involved. Service level agreements (SLAs) can then be established that clarify accountability, as well as the methods for measurement.
To contribute to the clarification process, this article provides a brief overview of cloud business models and roles, along with suggested models for identifying responsibilities and measuring service availability.
The cloud’s key difference
The United States National Institute of Standards and Technology (NIST) formally defines cloud computing as a model for providing ubiquitous, on-demand access to shared, configurable computing resources, such as networks, servers, storage, applications and services. The NIST model offers major advantages. Cloud service providers can manage pooled computing resources. The resources are elastic: they can be expanded and reduced on demand, or automatically in response to changes in traffic patterns. Convenient access to resources can be offered from any IP device, at any time. And measured-service capabilities allow simple, usage-based pricing.
To enable these benefits, cloud computing uses virtualization technology to separate application software from the underlying hardware, as shown in Figure 1. This break from traditional service delivery is responsible for some shifts in accountability.
Cloud service models
NIST defines the following three service models, which rely on the cloud’s shared computing resources. As shown in Figure 2, these models logically sit above the IP networking infrastructure used to link end users with applications hosted in the cloud.
- Software as a Service (SaaS) — allows consumers to use the provider’s applications, such as email and customer relationship management (CRM) applications, which run on cloud infrastructure.
- Platform as a Service (PaaS) — offers middleware and operating systems that facilitate application deployment.
- Infrastructure as a Service (IaaS) — provides virtualization hypervisor and hardware, such as compute, memory, storage and networking resources.
Cloud computing opens up interfaces between applications, platforms, infrastructure and network layers. As a result, the layers can be offered by different players, who may be expanding beyond their usual business activities. As Figure 2 shows, in the cloud:
- Suppliers develop the equipment and software used for IP networking and all cloud business models. They also provide integration services.
- IP network providers own and operate the networks and equipment used to deliver a service to end users.
- Cloud service providers own and operate the computing solutions, systems and equipment used to deliver a service to end users.
- Cloud consumers offer specific applications to end users. They pay service providers for whatever cloud resources they consume in this process.
- End users use software applications hosted in the cloud, relying on their own equipment and an IP network for access.
Responsibility for impairments
Service impairments can result from vulnerabilities in software, hardware, power, environment, application payload, IP networking, operational policies, or service, application and user data, as well as from natural disasters and human error. The telecommunications industry has traditionally identified three categories for outage accountability. As described in Figure 3, they include:
- Product-attributable service outages associated with hardware or software
- Customer- or service provider-attributable outages
- External- or force majeure-attributable outages, such as a natural disaster or a malicious act
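The three categories can be used to split recorded downtime by accountable party. The following is a minimal sketch, assuming a hypothetical list of outage records that each carry a duration in minutes and one of the three categories. The record layout, the names and the sample figures are invented for illustration and do not come from any standard.

```python
from collections import defaultdict
from dataclasses import dataclass
from enum import Enum


class Attribution(Enum):
    """The three traditional outage accountability categories."""
    PRODUCT = "product"      # hardware or software
    CUSTOMER = "customer"    # customer or service provider
    EXTERNAL = "external"    # force majeure, e.g. natural disaster


@dataclass
class Outage:
    # Hypothetical record layout, for illustration only.
    description: str
    minutes: float
    attribution: Attribution


def downtime_by_attribution(outages):
    """Sum outage minutes per accountability category."""
    totals = defaultdict(float)
    for outage in outages:
        totals[outage.attribution] += outage.minutes
    return dict(totals)


def availability(downtime_minutes, period_minutes):
    """Fraction of the period during which the service was up."""
    return 1.0 - downtime_minutes / period_minutes


# Illustrative sample data, not real measurements.
outages = [
    Outage("software fault", 12.0, Attribution.PRODUCT),
    Outage("misconfigured load balancer", 30.0, Attribution.CUSTOMER),
    Outage("regional power failure", 90.0, Attribution.EXTERNAL),
    Outage("failed disk controller", 8.0, Attribution.PRODUCT),
]
totals = downtime_by_attribution(outages)
period = 30 * 24 * 60  # a 30-day month, in minutes
product_availability = availability(totals[Attribution.PRODUCT], period)
```

Keeping the category on every outage record means availability can be reported either in total or per accountable party, depending on what an agreement requires.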
The nature of the cloud now makes accountability more complex. For example, it is often split between the cloud consumer and the service provider, and many more service providers can be involved in the service delivery. The following list offers a starting point for considering accountability on an element-by-element basis, with additional factors discussed in the Service measurements section that follows.
- Cloud consumer – Cloud consumers are responsible for properly provisioning, configuring and operating their application.
- Virtual appliance suppliers – The suppliers that produce software-only applications are responsible for ensuring that their application software is stable and reliable.
- Infrastructure suppliers – The suppliers of compute, storage, networking and other hardware and platform software operated by the cloud service provider are responsible for ensuring that their equipment is robust and reliable.
- Cloud service provider – Cloud service providers are responsible for the robust and reliable operation of cloud computing infrastructure and facilities, and for serving the needs of cloud consumers.
- Network service provider – Network service providers are responsible for ensuring that the network runs robustly and is always available.
- End user – Some service impairments can be attributed to the operation, configuration or failure of an end user’s device — or other user equipment.
Of course, specific outage responsibilities vary according to the cloud service model and the contract terms agreed to by the cloud consumer and cloud service provider. In addition, a single organization can be responsible for more than one of the accountability categories. A simplified summary is provided in Figure 4.
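One way to make the element-by-element list concrete is a simple lookup table. The sketch below paraphrases the responsibilities listed above and pairs each role with an organization. The organization names and the assignment of roles are hypothetical, chosen only to show a single organization holding more than one role.

```python
# Responsibilities per role, paraphrased from the list above.
RESPONSIBILITIES = {
    "cloud consumer": "provisioning, configuring and operating the application",
    "virtual appliance supplier": "stable and reliable application software",
    "infrastructure supplier": "robust and reliable hardware and platform software",
    "cloud service provider": "reliable operation of cloud infrastructure and facilities",
    "network service provider": "a robust and available network",
    "end user": "operation and configuration of the end user's own device",
}

# Hypothetical assignment of roles to organizations, for illustration only.
# "ExampleCloud" holds two roles, as a single organization may.
ROLE_HOLDERS = {
    "cloud consumer": "ExampleRetail",
    "virtual appliance supplier": "ExampleSoft",
    "infrastructure supplier": "ExampleHardware",
    "cloud service provider": "ExampleCloud",
    "network service provider": "ExampleCloud",
    "end user": "subscriber",
}


def roles_held_by(organization):
    """Return every role a given organization holds, sorted by name."""
    return sorted(
        role for role, holder in ROLE_HOLDERS.items() if holder == organization
    )


def accountable_party(role):
    """Return (organization, responsibility) for an impaired element's role."""
    return ROLE_HOLDERS[role], RESPONSIBILITIES[role]
```

In practice the mapping would follow the service model and the contract terms, so it should be read as a template rather than a fixed allocation.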
Service measurements
For critical enterprise applications, measuring performance is crucial. But the cloud environment is widely dispersed, and each end user may be served by a different combination of resources, including different IP networks and end-user devices. Consequently, one key service measurement challenge is choosing where to collect data in the service delivery path.
Figure 5 shows four natural points for measuring a cloud-hosted application’s performance, whether focusing on availability, reliability, latency or other aspects of service quality. Data from these measurement points can help determine accountability.
Measurement Point 1 (MP 1) examines how each key component in the data center affects service availability. To eliminate all impairments not associated with the application, this measurement is taken with minimal IP routing, switching and facility infrastructure between the measurement point and the server hosting the application. Separate MP 1 ratings can be calculated for routers, security appliances, load balancers and other infrastructure configurations. MP 1 does not consider georedundancy.
Measurement Point 2 (MP 2) considers how service availability is affected by the data center environment. That is, it measures the performance of individual application instances, along with the hosting data center. But it does not consider georedundancy.
Measurement Point 3 (MP 3) determines service availability across multiple data centers, an arrangement that mitigates the impairment of individual application instances, IP equipment and facilities, and data center infrastructure. MP 3 therefore incorporates the service availability benefit of georedundant application instances at multiple cloud data centers.
Measurement Point 4 (MP 4) adds the IP equipment and facilities that enable communications between the user and the cloud data center, in order to evaluate end-to-end service availability. This measurement depends strongly on the character of the end user’s access network.
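As a rough illustration of how the four measurement points relate, the sketch below derives an availability figure for each point from component availabilities. It assumes that failures are independent, that elements in the service path combine in series (a product of availabilities), and that georedundant sites combine in parallel with perfect failover. Real deployments rarely satisfy these assumptions exactly, and all numeric values and function names are hypothetical.

```python
from math import prod


def series(*availabilities):
    """Availability of elements that must all be up (independent failures)."""
    return prod(availabilities)


def georedundant(site_availability, sites):
    """Availability when any one of `sites` identical, independent sites suffices.

    Assumes instantaneous, perfect failover, which is optimistic.
    """
    return 1.0 - (1.0 - site_availability) ** sites


def annual_downtime_minutes(availability):
    """Expected downtime per 365-day year at the given availability."""
    return (1.0 - availability) * 365 * 24 * 60


# Hypothetical component availabilities, for illustration only.
application_instance = 0.9995  # application on its hosting server
data_center = 0.9999           # power, cooling, local IP infrastructure
access_network = 0.999         # IP path between the end user and the cloud

mp1 = application_instance         # application in isolation
mp2 = series(mp1, data_center)     # plus the hosting data center
mp3 = georedundant(mp2, sites=2)   # across two georedundant data centers
mp4 = series(mp3, access_network)  # end to end, including the access network

for name, value in [("MP 1", mp1), ("MP 2", mp2), ("MP 3", mp3), ("MP 4", mp4)]:
    print(f"{name}: {value:.6f} ({annual_downtime_minutes(value):.1f} min/year)")
```

With these sample figures, MP 3 is higher than MP 2 because of georedundancy, while MP 4 is dominated by the access network, which is consistent with the dependence on the end user's access network described above.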
Establishing clear agreements on accountability
Accountability in the cloud has not yet been clearly defined. At least one standards organization is working to create standards that will establish exactly what each player in the cloud is responsible for. But until these industry guidelines are formally adopted, SLAs or other agreements need to tell all parties clearly who is responsible for preventing and remedying outages, and how those outages are identified and measured.