Many organizations are moving towards a DevOps way of working and transforming their application teams into DevOps teams. In this article, we will discuss how you can organize the cloud products and platforms team to support and empower the application teams in their DevOps transformation journey. We will also discuss the different aspects of the Cloud Operating Model (Architecture, Organization, Way of Working, Automation, Security and Compliance, and Service Management) that enable the cloud organization to serve the application teams better while transforming itself into a DevOps way of working.
DevOps has revolutionized how we build and deliver software products by bringing Development and Operations into a single, frictionless value stream. Today, many organizations are moving towards a fully DevOps way of working after realizing how DevOps enables businesses to respond faster and thrive amid the ever-increasing uncertainty and complexity of digital business models. In an era in which innovation is a competitive advantage, DevOps helps to reduce the lead time for delivering software, ensures secure and reliable releases, and enables experimentation, hypothesis-driven development, A/B testing, and so on.
While most DevOps transformations focus on the application teams that build business functionality, the cloud services and platform side of the IT organization receives much less attention. In many cases, this leaves the cloud organization behind, struggling to keep up with the speed of the many DevOps teams. The cloud organization then becomes a bottleneck in the value stream, hindering the agility of the application teams in delivering their business functionality.
In this article, we will discuss how you can streamline the work between the application teams and the cloud platforms organization, as well as how to remove friction and dependencies. In other words, we will focus on the cloud organization side of the DevOps transformation. We argue that the cloud organization should also use a DevOps way of working to avoid becoming the bottleneck in the end-to-end value stream. We will therefore also discuss what the Cloud Operating Model should look like in a modern DevOps setting.
What is the mission of the cloud organization?
The mission of the cloud organization is to empower the application teams to build and operate their applications with minimal dependencies, by providing secure, compliant, standardized and flexible cloud platforms and services through self-service and Infrastructure as Code (IaC).
Figure 1. The Cloud Tribe offers central platforms and services to the different application teams.
The cloud tribe (Figure 1) offers central platforms and services that the application teams can use to support their application development processes and become more productive. The catalogue of platforms and services can include, for example, different environments (Dev, Test, Prod), Continuous Delivery pipelines, monitoring and dashboards, testing tools and services, standard operating system images, containers, etc.
By providing these platforms and tooling centrally as shared services, the cloud tribe allows the application teams to focus their effort on building functionality that adds value for their (end) customers and the business, rather than on building the infrastructure needed to enable that functionality. This speeds up the delivery of features to the business.
What does the Cloud Operating Model look like?
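An operating model (Figure 2) represents how an organization actually runs itself and delivers value to its customers. In our case, the Cloud Operating Model consists of the following aspects: architecture, organization, way of working, automation, security and compliance, and IT service management.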
Figure 2. The Cloud Operating Model.
Organization
The cloud team is organized in multidisciplinary teams around each product or service. This helps to minimize dependencies and hand-offs across the value stream, and creates ownership and shared goals among everyone in the team.
As there are several cloud services, platforms and products, the cloud organization can be organized following a Spotify-like model. This means that the cloud organization forms a tribe consisting of several smaller teams (aka ‘squads’). Each squad is responsible for delivering an end-to-end cloud service or product; examples include a Continuous Delivery squad, an Infrastructure Services squad, a Connectivity squad and a Monitoring squad. To share knowledge between the squads, chapters are formed horizontally, bringing together people with similar capabilities to share expertise.
The cloud tribe has an overarching mission, vision and strategy, which are broken down into objectives and key results to be achieved by the different squads. Each squad is an autonomous product team with multidisciplinary, cross-functional roles, and is responsible for running its cloud products and services end to end.
Organizing a centrally managed cloud tribe with a portfolio of cloud services and products gives the application teams the flexibility and autonomy to use the technology stack they find most suitable for achieving their objectives. In addition, it enables the application teams to serve their own needs through self-service, so they can build, deploy and operate their applications with limited dependencies.
As each squad in the cloud tribe is a product team, the product owner role is essential for defining the product's vision and strategy and for managing its priorities. The product owner also works with the development team to prioritize the backlog and coordinates with the other cloud products and services.
Another important role is the Automation Engineer, preferably someone with a software engineering background, as the cloud squad uses Infrastructure as Code to deploy and manage its products. Additionally, each squad has its own scrum master to facilitate the agile process used by the squad. The remaining roles depend on the product the squad is responsible for; the data squad, for example, would include a big data engineer.
At the tribe level, it is important to have a chief product owner who steers and implements the overarching vision and strategy of the tribe as a whole. We also suggest adding an agile coach to the cloud tribe, who mainly focuses on coaching the squads on agile processes and helping them resolve the impediments they face.
Figure 3. The Cloud Tribe Organization.
Way of working
The cloud tribe uses the DevOps way of working to manage the development of its own products and services. This includes enabling a fast flow of value from development to operations by making work visible (Kanban), reducing batch sizes, building quality in at the source, limiting work in progress, pulling work rather than pushing it, and creating fast, short feedback loops. In addition, making room for learning and continuous improvement and encouraging collaboration and trust help to build a DevOps culture. This can be achieved, for example, by conducting blameless postmortems and sharing the learnings with the rest of the organization.
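To make "limiting work in progress" and "pull rather than push" more concrete, the following Python sketch models a Kanban board on which a column only accepts work while it is below its WIP limit, and work moves only when the downstream column pulls it. The column names, limits and class names are hypothetical and chosen for illustration; in practice a squad would use its board tool rather than code like this.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Column:
    name: str
    wip_limit: Optional[int]  # None means no limit (e.g. backlog, done)
    items: List[str] = field(default_factory=list)

    def has_capacity(self) -> bool:
        return self.wip_limit is None or len(self.items) < self.wip_limit


class KanbanBoard:
    """A board on which work moves only when the next column pulls it."""

    def __init__(self, columns: List[Column]) -> None:
        self.columns = columns

    def add(self, item: str) -> None:
        # New work always enters the first column (the backlog).
        self.columns[0].items.append(item)

    def pull(self, column_name: str) -> Optional[str]:
        """Let `column_name` pull the oldest item from the column before it.

        Returns the pulled item, or None if the column is at its WIP limit
        or there is nothing upstream to pull.
        """
        idx = [c.name for c in self.columns].index(column_name)
        if idx == 0:
            raise ValueError("the first column has nothing upstream to pull from")
        target, source = self.columns[idx], self.columns[idx - 1]
        if not target.has_capacity() or not source.items:
            return None
        item = source.items.pop(0)
        target.items.append(item)
        return item


if __name__ == "__main__":
    board = KanbanBoard([
        Column("backlog", None),
        Column("in progress", 2),
        Column("done", None),
    ])
    for ticket in ["VM base image", "pipeline template", "monitoring dashboard"]:
        board.add(ticket)
    print(board.pull("in progress"))  # VM base image
    print(board.pull("in progress"))  # pipeline template
    print(board.pull("in progress"))  # None: WIP limit of 2 reached
```

The third item stays in the backlog until "done" pulls something out of "in progress", which is exactly the behavior a WIP limit is meant to force: finish work before starting new work.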
Automation
The infrastructure and cloud services offered by the cloud tribe must be built, tested and deployed in a fully automated way from source code, scripts and libraries stored in version control repositories. This is called Infrastructure as Code (IaC). With IaC, the complete infrastructure specification, including templates and configurations, is kept in machine-readable form in version control. This makes it possible to recreate the state of production entirely from the code in version control rather than through manual configuration, which in turn allows the disaster recovery process to be automated. Additionally, the application teams can use the platform and infrastructure templates via APIs to further automate their deployments: they can self-service the creation of environments, application deployments, and other services and platforms such as database instances. Automation reduces errors and eliminates repetitive work, while increasing speed and enabling the continuous deployment of code.
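The core idea behind declarative IaC tools is that the desired state lives in version control and the tool computes the steps needed to make the actual environment match it. The Python sketch below illustrates that idea only; it is not the interface of Terraform, Bicep, CloudFormation or any other real tool, and the resource names and attributes are hypothetical.

```python
from typing import Dict, List, Tuple

# Resource name -> attributes, e.g. {"vm-web": {"size": "small"}}.
State = Dict[str, Dict[str, object]]


def plan(desired: State, actual: State) -> List[Tuple[str, str]]:
    """Compute the (action, resource) steps that turn `actual` into `desired`."""
    steps: List[Tuple[str, str]] = []
    for name in sorted(desired):
        if name not in actual:
            steps.append(("create", name))
        elif desired[name] != actual[name]:
            steps.append(("update", name))
    for name in sorted(actual):
        if name not in desired:
            steps.append(("delete", name))  # exists but is not declared in code
    return steps


def apply(desired: State, actual: State) -> State:
    """Execute a plan. Here 'executing' only rewrites an in-memory copy of the state."""
    new_state = {name: dict(attrs) for name, attrs in actual.items()}
    for action, name in plan(desired, actual):
        if action == "delete":
            del new_state[name]
        else:
            new_state[name] = dict(desired[name])
    return new_state


if __name__ == "__main__":
    # Desired state, as it would be read from version control.
    desired = {
        "vm-web": {"size": "small", "image": "base-2024"},
        "db-orders": {"tier": "standard"},
    }
    # Actual state, as it would be reported by the cloud provider.
    actual = {
        "vm-web": {"size": "small", "image": "base-2023"},
        "vm-manual": {"size": "large"},
    }
    print(plan(desired, actual))
    # [('create', 'db-orders'), ('update', 'vm-web'), ('delete', 'vm-manual')]

    # Disaster recovery: rebuilding from an empty environment yields the declared state.
    print(apply(desired, {}) == desired)  # True
```

The last line reflects the point made above: because the full specification is in version control, an empty environment can be brought back to the production state from code alone.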
Security and compliance
Security is at the heart of cloud services and products. The cloud squads use DevOps security practices, integrating security and compliance objectives into all stages of the development and operations processes rather than performing security and compliance activities only at the end of a project. In this way, the cloud squads ensure that their cloud services and products are secure and compliant. Additionally, because the squads use IaC and Continuous Integration/Continuous Delivery (CI/CD) pipelines to deploy cloud products and services, they have access to the complete history of infrastructure changes through the version control logs, which facilitates the auditing and compliance processes.
In addition, the cloud tribe embeds security and compliance requirements in the cloud products and services it offers to the application teams. For example, it provides secure base images for VMs and containers that comply with the enterprise security baselines, integrates security and compliance requirements into the CI/CD pipelines it offers to the application teams, and enforces security policies as code, e.g. using Azure or AWS policies to enforce certain settings on the products the cloud tribe offers. The cloud tribe thereby ensures that security is standardized across all application teams (see also [Spre20]).
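The following Python sketch illustrates the policy-as-code idea: security rules are written as code, versioned alongside the infrastructure, and evaluated automatically against every resource definition. It deliberately does not mimic Azure Policy or AWS policy syntax (those are JSON documents evaluated by the cloud provider itself); the rule names, resource types and attributes are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

Resource = Dict[str, object]


@dataclass(frozen=True)
class Policy:
    name: str
    applies_to: str                    # resource type the rule targets
    check: Callable[[Resource], bool]  # returns True when the resource is compliant


# Hypothetical enterprise baseline, for illustration only.
BASELINE: List[Policy] = [
    Policy("storage-encryption-required", "storage",
           lambda r: r.get("encryption_enabled") is True),
    Policy("no-public-ip-on-vms", "vm",
           lambda r: not r.get("public_ip", False)),
    Policy("approved-base-image", "vm",
           lambda r: str(r.get("image", "")).startswith("corp-hardened-")),
]


def evaluate(resources: List[Resource], policies: List[Policy]) -> List[str]:
    """Return one 'resource: policy' entry per violation, in resource order."""
    violations: List[str] = []
    for resource in resources:
        for policy in policies:
            if resource.get("type") == policy.applies_to and not policy.check(resource):
                violations.append(f"{resource['name']}: {policy.name}")
    return violations


if __name__ == "__main__":
    resources = [
        {"name": "logs", "type": "storage", "encryption_enabled": True},
        {"name": "web-01", "type": "vm", "public_ip": True, "image": "corp-hardened-ubuntu"},
        {"name": "batch-01", "type": "vm", "image": "ubuntu-latest"},
    ]
    for violation in evaluate(resources, BASELINE):
        print(violation)
```

In a CI/CD pipeline offered by the cloud tribe, a non-empty violation list would typically fail the deployment step, so every application team gets the same security checks without writing them itself.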
IT Service Management
The cloud tribe uses pragmatic, lightweight IT Service Management (ITSM) processes, such as change management, incident management, configuration management and release management. This is accomplished by leveraging process automation, IaC, CI/CD pipelines, and cloud monitoring and alerting tools, which together enable a fully automated and auditable approach to the ITSM processes.
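One way lightweight change management can be automated is sketched below in Python, under stated assumptions: a deployment that went through the pipeline, passed its tests and has no policy violations is recorded as a pre-approved "standard" change, anything else is routed to review as a "normal" change, and every decision is appended to an audit log. The field names and classification rules are hypothetical and would have to match an organization's own change policy.

```python
from dataclasses import asdict, dataclass
from datetime import datetime, timezone
from typing import Dict, List


@dataclass(frozen=True)
class DeploymentRequest:
    service: str
    commit: str              # version-control revision being deployed
    from_pipeline: bool      # deployed by the CI/CD pipeline, not by hand
    tests_passed: bool
    policy_violations: int


@dataclass(frozen=True)
class ChangeRecord:
    service: str
    commit: str
    change_type: str         # "standard" (pre-approved) or "normal" (needs review)
    recorded_at: str


def classify(req: DeploymentRequest) -> str:
    """Pre-approve only changes that passed all automated controls."""
    if req.from_pipeline and req.tests_passed and req.policy_violations == 0:
        return "standard"
    return "normal"


def record_change(req: DeploymentRequest, audit_log: List[Dict[str, str]]) -> ChangeRecord:
    """Create a change record and append it to an append-only audit log."""
    record = ChangeRecord(
        service=req.service,
        commit=req.commit,
        change_type=classify(req),
        recorded_at=datetime.now(timezone.utc).isoformat(),
    )
    audit_log.append(asdict(record))
    return record


if __name__ == "__main__":
    log: List[Dict[str, str]] = []
    record_change(DeploymentRequest("monitoring", "a1b2c3d", True, True, 0), log)
    record_change(DeploymentRequest("gateway", "e4f5a6b", False, True, 0), log)
    for entry in log:
        print(entry["service"], entry["commit"], entry["change_type"])
```

Because each record carries the commit identifier, the audit trail links back to the exact infrastructure code that was deployed.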
The cloud tribe interfaces with the outside world through a self-service platform for the application teams, rather than requiring them to raise tickets to provision a certain infrastructure product. In other words, the cloud tribe designs its infrastructure as a set of API-driven services and provides the application teams with a self-service catalog (VMs, gateways, routing tables, firewalls, storage, messaging, databases, etc.).
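The request-validation side of such a self-service catalog can be sketched in a few lines of Python: an application team asks for a catalog item, the platform checks the request against the catalog, fills in the standard defaults, and returns a provisioning request that could be handed to the IaC pipeline instead of a ticket queue. The catalog entries, parameters and function names are hypothetical; in practice this logic would sit behind an HTTP API.

```python
from typing import Dict, List

# Hypothetical catalog: item -> supported parameters and their default values.
CATALOG: Dict[str, Dict[str, object]] = {
    "vm": {"size": "small", "image": "corp-hardened-ubuntu", "environment": "dev"},
    "database": {"engine": "postgres", "tier": "standard", "environment": "dev"},
    "storage": {"redundancy": "zone", "environment": "dev"},
}
ALLOWED_ENVIRONMENTS = {"dev", "test", "prod"}


class CatalogError(ValueError):
    """Raised when a self-service request does not match the catalog."""


def build_request(team: str, item: str, params: Dict[str, object]) -> Dict[str, object]:
    """Validate a self-service request and merge it with the catalog defaults."""
    if item not in CATALOG:
        raise CatalogError(f"unknown catalog item: {item}")
    defaults = CATALOG[item]
    unknown: List[str] = sorted(set(params) - set(defaults))
    if unknown:
        raise CatalogError(f"unsupported parameters for {item}: {unknown}")
    merged = {**defaults, **params}  # team choices override the defaults
    if merged["environment"] not in ALLOWED_ENVIRONMENTS:
        raise CatalogError(f"unknown environment: {merged['environment']}")
    return {"team": team, "item": item, "parameters": merged}


if __name__ == "__main__":
    print(build_request("payments", "database", {"environment": "test"}))
    try:
        build_request("payments", "vm", {"gpu": True})
    except CatalogError as err:
        print("rejected:", err)
```

Restricting requests to known items and parameters is what keeps self-service compatible with standardization: teams choose freely within the catalog, and anything outside it is rejected immediately rather than after a manual review.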
The cloud tribe aims to offer a lightweight standardization of its portfolio of products and services. This can include standardized, secured OS base images for VMs or containers, as well as standard build and deployment pipelines that each team can customize to fit its own needs. In addition, the cloud tribe offers standardized monitoring and telemetry tooling for all application teams. The cloud tribe aims to standardize as much as possible to help reduce maintenance costs, mitigate security risks and enable collaboration between teams.
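The balance between a standard pipeline and team-specific customization can be sketched as follows, assuming a simple data model: teams may change the settings of unlocked stages and add their own stages before deployment, while locked stages (here, the security scan) keep the settings defined by the cloud tribe. The stage names, the `locked` flag and the override format are hypothetical and are not the syntax of any particular CI/CD product.

```python
from copy import deepcopy
from typing import Any, Dict, List

Stage = Dict[str, Any]

# Hypothetical standard pipeline offered by the cloud tribe.
STANDARD_PIPELINE: List[Stage] = [
    {"name": "build", "locked": False, "settings": {"agent": "corp-build-agent"}},
    {"name": "unit-tests", "locked": False, "settings": {}},
    {"name": "security-scan", "locked": True, "settings": {"fail_on": "high"}},
    {"name": "deploy", "locked": False, "settings": {"strategy": "rolling"}},
]


def customize(
    base: List[Stage],
    overrides: Dict[str, Dict[str, Any]],
    extra_stages: List[Stage],
) -> List[Stage]:
    """Derive a team-specific pipeline from the standard one.

    Teams may change settings of unlocked stages and add stages before
    'deploy'. There is deliberately no way to remove a standard stage.
    """
    pipeline = deepcopy(base)  # never mutate the shared template
    by_name = {stage["name"]: stage for stage in pipeline}
    for name, settings in overrides.items():
        if name not in by_name:
            raise ValueError(f"unknown stage: {name}")
        if by_name[name]["locked"]:
            raise PermissionError(f"stage '{name}' is locked by the cloud tribe")
        by_name[name]["settings"].update(settings)
    deploy_index = next(i for i, s in enumerate(pipeline) if s["name"] == "deploy")
    for stage in extra_stages:
        if stage["name"] in by_name:
            raise ValueError(f"stage already exists: {stage['name']}")
        pipeline.insert(deploy_index, {"locked": False, "settings": {}, **stage})
        deploy_index += 1
    return pipeline


if __name__ == "__main__":
    team_pipeline = customize(
        STANDARD_PIPELINE,
        overrides={"deploy": {"strategy": "blue-green"}},
        extra_stages=[{"name": "integration-tests"}],
    )
    print([stage["name"] for stage in team_pipeline])
    # ['build', 'unit-tests', 'security-scan', 'integration-tests', 'deploy']
```

Copying the template rather than editing it in place means that improvements the cloud tribe makes to the standard pipeline reach every team the next time its pipeline is derived.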
Conclusion
When moving to the cloud, the success of the cloud transformation depends mainly on the maturity of the cloud organization and its ability to set up the right operating model, as described above. Starting a big change across the organization can be challenging, so the best way to begin such a transformation is to start small. We advise picking a pilot team, organizing it around certain cloud products, and then offering these products to the application teams using the new way of working. In this way, the change can be cascaded, and the wins and learnings can be replicated across the organization to inspire the rest of the teams.
Each organization is unique and has to tailor its operating model to fit its ambition, needs and maturity. While implementations may differ, the DevOps practices and principles remain the same. In the end, it is a learning journey, so make sure it is enjoyable for everyone involved in the transformation.
References
[Kim16] Kim, G., Debois, P., Willis, J. & Humble, J. (2016). The DevOps Handbook: How to Create World-Class Agility, Reliability, and Security in Technology Organizations. Portland: IT Revolution Press.
[Spre20] Sprengers, M. & van der Houven, W. (2020). Security principles for DevOps and cloud. Compact 2020/1. Retrieved from: https://www.compact.nl/articles/security-principles-for-devops-and-cloud/