DevOps: A Survival Guide for Infrastructure Teams
Learn how infrastructure teams can start taking part in DevOps efforts within their organization.
A TSA's take beyond tools and buzzwords
DevOps means something different to different people. It is one of those words that gets pulled in many directions by everyone trying to sell something — sometimes to such an extreme that it becomes meaningless because it has been stretched to encompass everything.
To avoid the buzzword trap and start addressing how infrastructure teams can be part of DevOps efforts in their organizations, let me explain my take beyond tools and buzzwords.
We can think of DevOps as the evolution of the revolution that started when our own processes became the main obstacle affecting our ability to deliver results and respond quickly to the changes in the market.
What is DevOps?
In this Tec17 podcast, hear WWT's Jason Guibert and Paul Richards discuss what DevOps is, its many benefits, how to avoid some common pitfalls and how WWT works with customers to implement DevOps processes into their infrastructure.
It's a culture
In its purest form, DevOps is an outcome-driven culture focused on the agility to deliver fast results while retaining quality.
But where is technology in this definition, you ask? Where are the developers? The operations team? The infrastructure team?
WWT doesn't include specific tools or individual groups in our definition because DevOps is a culture. That is, it's made up of the characteristics and knowledge acquired by a group of people over generations, through individual and group endeavor.
So what’s with all the other definitions of DevOps floating around? Maybe you've heard that DevOps means “developers and operations groups working together” or “developers with operations accountability,” or that it means “infrastructure as code,” “using automation for repetitive tasks,” or “abstraction of the infrastructure.” You’ve probably even heard that DevOps means “using Kanban or Scrum methodologies” or “using microservices architectures.”
Are those definitions of DevOps wrong? No, each one is correct for a different group.
Think about culture from a societal perspective, where environments, experiences and needs can differ by region. Consider New Yorkers and Texans. When New Yorkers commute to work, they either walk, take a taxi or ride the subway and say, “I’ll be there in 30 minutes.” Meanwhile, Texans likely drive their own trucks and measure their commute in hours. Both groups, though part of the same American culture, have adapted the common process of commuting to meet their own realities.
The same happens with DevOps inside an organization. The specific definition of DevOps and the corresponding priorities may differ based on the realities and needs of each group.
When a traditional organization adopts DevOps agile methodologies, two main groups — development and infrastructure teams, who normally don’t work together — must collaborate to achieve an outcome. While adopting DevOps methodologies will drive different efforts for each group, they’re working toward a shared goal in the end.
An agile foundation
The key methodologies that drive the agile foundation of a DevOps culture include open and direct communication, collaboration among and within teams, and automation.
As part of an organization's evolution toward a DevOps culture, a new group eventually emerges as the translation bridge or liaison between developer and infrastructure teams. This group consists of individuals from each team who understand enough of the other side to be able to map requirements into common services platforms.
To illustrate this point, consider the questions that would arise in a scenario where the development group asks for Docker or microservices support. What does this ask mean for the storage team? How can storage admins support it? What does that mean for the networking team? How can network admins support it?
Now consider the questions that arise in a reverse scenario where the IT team has to enforce compliance across the organization. How is data-at-rest encryption enforced in Docker? How are backups managed in microservices architectures? How are the organization’s investments in infrastructure relevant?
These kinds of questions don’t have a straight answer and require a fair understanding of both viewpoints to be able to identify a balanced solution.
Over time, as DevOps expertise builds in the organization, members of both groups will converge on a common language. That shared language allows for easier communication, a faster response from the infrastructure team in enabling new services, and faster adoption of those services by the development teams.
But this doesn't happen overnight. So where do we start?
Infrastructure teams: Enablers of service platforms
As organizations adopt DevOps visions, the role of the infrastructure team is to become a service provider for the internal teams. While the infrastructure team must deliver the service, it must also stay out of the way of the service's consumers. In other words, the infrastructure team must be invisible to development teams.
There is no magic wand to make this happen. There is no single step or process — it's a journey.
As you can see in this diagram, the DevOps journey starts with the adoption of agile IT processes and methodologies. Once agile IT is combined with collaboration, you can progress into more DevOps-oriented phases.
Let’s take a look at several scenarios to see how infrastructure teams can work towards achieving DevOps adoption.
Scenario 1: My org doesn't have a DevOps strategy, but we want to prepare
Under this scenario, you should start with the adoption of agile IT. While the agile methodologies and principles that form the Agile Manifesto used by developers do not have a direct translation into IT, they do provide infrastructure teams with a good idea of what to expect.
For example, if we rephrase the first two principles of the Agile Manifesto, we get:
- Our highest priority is to satisfy the customer through early and continuous delivery of valuable services.
- Welcome changing requirements — agile processes harness change for the customer's competitive advantage.
That is what is expected from the infrastructure team. How do we get there? It helps to think of the developers as your customers.
- Automation: Start by automating repetitive tasks. When defining automation tasks, follow best practices for configuration and hardening of the specific task. For example, if you're in storage, automate the provisioning of storage and the definition of access rules. If you're in networking, automate the provisioning of VLANs, ports or BGP sessions following best practices.
- Infrastructure as Code: Start working towards achieving Infrastructure as Code (IaC). Orchestrate the automation tasks into workflows that deliver consumable resources (compute, storage, network) with consistent and predictable results. Note: this is not only about virtual environments; it includes both physical and virtual resources.
- Software Defined: Adopt software-defined everything (SDx). A software-defined data center provides your organization with the ability to be agile and to adapt to the ever-changing requirements.
- Abstraction: Enable abstraction of the infrastructure. Enable APIs, especially RESTful APIs. Think of APIs as the hooks or avenues for providing on-demand consumable resources. Platforms and tools higher in the stack consume APIs. If you have traditional enterprise equipment, this is likely already supported. If not, contact your hardware or software provider, as most OEMs now provide APIs for integrations and extensibility. WARNING: Don’t forget to secure access to your APIs!
- Self Service: Enable the ability to consume infrastructure resources through self-service portals or service catalogs. This goes back to “getting out of the way.”
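As a minimal sketch of the first two items, the Python below models automation tasks as idempotent functions and chains them into one workflow that delivers a consumable resource. Everything in it is hypothetical: `Inventory` is an in-memory stand-in for real network and storage systems, and `ensure_vlan`, `ensure_volume` and `provision_app_environment` are names invented for illustration, not calls from any vendor API.

```python
from dataclasses import dataclass, field


@dataclass
class Inventory:
    """Hypothetical in-memory stand-in for real network and storage systems."""
    vlans: dict = field(default_factory=dict)    # VLAN ID -> name
    volumes: dict = field(default_factory=dict)  # volume name -> size in GiB


def ensure_vlan(inv: Inventory, vlan_id: int, name: str) -> bool:
    """Create or rename a VLAN only when needed; return True if state changed."""
    if not 1 <= vlan_id <= 4094:  # 802.1Q reserves IDs 0 and 4095
        raise ValueError(f"VLAN ID {vlan_id} is outside 1-4094")
    if inv.vlans.get(vlan_id) == name:
        return False  # already in the desired state, nothing to do
    inv.vlans[vlan_id] = name
    return True


def ensure_volume(inv: Inventory, name: str, size_gib: int) -> bool:
    """Create or resize a volume only when needed; return True if state changed."""
    if size_gib <= 0:
        raise ValueError("volume size must be positive")
    if inv.volumes.get(name) == size_gib:
        return False
    inv.volumes[name] = size_gib
    return True


def provision_app_environment(inv, app, vlan_id, size_gib):
    """Workflow that chains the tasks into one consumable resource.

    Returns the list of changes made, so a second identical run returns [].
    """
    changes = []
    if ensure_vlan(inv, vlan_id, f"{app}-net"):
        changes.append("vlan")
    if ensure_volume(inv, f"{app}-data", size_gib):
        changes.append("volume")
    return changes


inventory = Inventory()
print(provision_app_environment(inventory, "billing", 120, 50))  # first run makes changes
print(provision_app_environment(inventory, "billing", 120, 50))  # second run changes nothing
```

Writing each task so that it checks the current state before acting is one way to get the "consistent and predictable results" described above: the workflow can be rerun safely, and validation such as the VLAN range check is applied on every request.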
Scenario 2: Developers went rogue using modern techniques to simplify their work without telling infrastructure teams
This might be one of the more complex scenarios, where the infrastructure team finds itself playing catch-up with the widespread use of technologies that “don’t need the infrastructure team.”
This scenario can start when a development team lacks a viable solution in-house and finds itself restrained from going to a cloud provider. The developers think the infrastructure team is too slow (even though many times this is the result of the lack of automation).
Whatever the reason, development teams need something they can control. They can’t wait for the infrastructure teams to install the correct version of the libraries they need, or to provision a VM for them. How do they work around this? They ask for a couple of large VMs (multiple vCPUs with a lot of storage and memory) and then they stop asking for additional resources. After some time, they come back and ask for another large VM and then disappear again. If you're seeing this, you are probably living this scenario.
When this happens, infrastructure teams start wondering how the dev teams are creating all these new apps and services. The development teams tell them everything is running in-house in the organization's own environment, but the infrastructure teams don’t see requests for the provisioning of the VMs or resources. That’s when we learn they're using those VMs to run containers or microservice architecture platforms.
What are the risks with this approach?
To understand the risks, we need to understand some basic concepts around containers and container orchestration platforms. There are many container options, but I’m going to limit the description to Docker containers.
A Docker container is an object composed of multiple layers. All but the topmost layer are read-only, immutable layers. This top layer is where the developer’s specific code lives.
Note: a user view can be found here.
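To make the layering idea concrete, here is a deliberately simplified model that follows the description above. It uses Python's standard `collections.ChainMap` to stack read-only layers under one writable top layer. The file paths and layer names are invented for illustration, and this is a sketch of the concept only, not how Docker's storage drivers are implemented.

```python
from collections import ChainMap
from types import MappingProxyType

# Hypothetical image layers; MappingProxyType makes each one read-only.
base_os = MappingProxyType({"/bin/sh": "shell", "/lib/libssl.so": "libssl-old"})
runtime = MappingProxyType({"/usr/bin/python": "python"})

# ChainMap searches maps left to right and sends every write to the first map,
# so the writable top layer goes first and the base layer last.
top_layer = {}
container_fs = ChainMap(top_layer, runtime, base_os)

container_fs["/app/main.py"] = "developer code"  # lands in the top layer
container_fs["/bin/sh"] = "patched shell"        # shadows the base copy

print(container_fs["/lib/libssl.so"])  # served from the read-only base layer
print(base_os["/bin/sh"])              # the base layer itself is unchanged
```

The point to notice is that a write never alters a lower layer. Whatever shipped in a base layer stays there until the image is rebuilt from an updated base.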
Back to the risks. If the infrastructure team runs a vulnerability scan at the VM level on a VM hosting dozens of running containers, the scan may not uncover vulnerabilities that exist at the container level.
Let’s say developers deployed a container some time ago and haven’t modified the application for a while, so the container has not been updated. Now, if one of those layers happens to have a vulnerability (e.g., libssl), a scan of the VM will not necessarily uncover it. And even if it does, what’s the process to patch it?
Are developers responsible for proactive patching of the containers after delivering an app? What about when a new vulnerability is discovered? Should the security team keep track of and patch only the physical servers and VMs? Who is responsible for tracking and patching those layers and rebuilding the containers?
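Whoever ends up owning the process, the tracking step can be sketched as a lookup across every layer of every container rather than across the VM alone. The example below is a minimal illustration under stated assumptions: the inventory, the container names and the version strings ("old", "new") are made-up placeholders, and `find_affected` is a hypothetical helper, not part of any scanning product.

```python
def find_affected(containers, vulnerable):
    """Return {container name: sorted (package, version) hits across all layers}."""
    report = {}
    for name, layers in containers.items():
        hits = {
            (pkg, version)
            for layer in layers
            for pkg, version in layer.items()
            if version in vulnerable.get(pkg, set())
        }
        if hits:
            report[name] = sorted(hits)
    return report


# Made-up inventory: each entry is a list of layers, and each layer maps
# package name -> version. Versions are placeholders, not real releases.
host_vm = [{"libssl": "new"}]
containers = {
    "billing-api": [{"libssl": "old"}, {"billing-app": "2.3"}],
    "web-frontend": [{"libssl": "new"}, {"web-app": "1.0"}],
}
vulnerable = {"libssl": {"old"}}

print(find_affected({"host-vm": host_vm}, vulnerable))  # VM-level view finds nothing
print(find_affected(containers, vulnerable))            # per-layer view flags billing-api
```

In this toy data the VM itself is patched, so a VM-level check comes back clean while one container still carries the old library in a lower layer.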
These are just some of the risks for containers. Sometimes you’ll find that development teams have done their own deployment of a microservices architecture framework, like Kubernetes, on top of those large VMs they requested a year ago. Perhaps they've created microservices-oriented applications that now run in those VMs. A great feature of these frameworks is the ability to protect a microservice by spinning up replicas of its containers automatically or on demand, and to protect those replicas from a VM or host failure.
What are the risks with these frameworks? Besides having the same sort of risks and challenges as containers, in a microservices framework the “application” is broken into multiple microservices distributed among those VMs. If they are running in VMs, you’re probably doing backups of those. Guess what? These frameworks, especially when using them from the upstream projects, are not designed to be backed up. Any attempt to restore them will probably fail.
These highly distributed solutions use the concept of ephemeral and persistent storage. They assume the user protects the important data in persistent storage and everything else is assumed ephemeral. Now, if the developers used simple VMs without integration with the infrastructure, there is no persistent storage to read and recover the data from.
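The difference can be illustrated with a small model. In the sketch below, `Replica` and `reschedule` are hypothetical names, and a plain dictionary stands in for infrastructure-backed persistent storage. It assumes that a replaced replica keeps its persistent store and loses everything else.

```python
class Replica:
    """Hypothetical container replica with ephemeral and optional persistent storage."""

    def __init__(self, persistent=None):
        self.ephemeral = {}           # lost whenever the replica is replaced
        self.persistent = persistent  # store that outlives the replica, or None

    def save(self, key, value):
        """Write to persistent storage when available, else to ephemeral storage."""
        target = self.persistent if self.persistent is not None else self.ephemeral
        target[key] = value

    def load(self, key):
        """Read from persistent storage first, then fall back to ephemeral storage."""
        if self.persistent is not None and key in self.persistent:
            return self.persistent[key]
        return self.ephemeral.get(key)


def reschedule(replica):
    """Model a host failure: a fresh replica starts with the same persistent store."""
    return Replica(persistent=replica.persistent)


integrated = Replica(persistent={})  # backed by infrastructure-provided storage
integrated.save("order-42", "paid")
print(reschedule(integrated).load("order-42"))  # data survives the failure

standalone = Replica()  # plain VM, no persistent storage integration
standalone.save("order-42", "paid")
print(reschedule(standalone).load("order-42"))  # data is gone
```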
How can infrastructure teams take control and support this environment?
If the development team is our customer, we need to uncover the experience they expect from us. This is not an easy task. In many cases, developers may expect an experience with many similarities to a cloud experience. This is often summarized as “give me the resources but stay out of the way.”
Let’s run with that expectation. How can infrastructure teams provide the right resources but stay out of the way? Here are a few steps to consider:
- Create service catalogs and expose all infrastructure resources as self-service consumable items.
- Identify organizational policies that impact tools and frameworks used by developers.
- Identify the enterprise products supporting those frameworks:
  - Map the organization’s policies to features in these products:
    - Does the framework support data-at-rest encryption? Is it needed by the organization?
    - Does the framework support a way to track vulnerabilities and remediate container images?
    - Does the framework integrate with my existing physical and virtual infrastructure?
    - Does the framework support multi-data center and multicloud deployments?
    - Which components do we protect or back up in these frameworks?
- Identify and define integration points between the infrastructure and the microservices frameworks.
- Enable infrastructure automation and orchestration. Make it easier for it to be consumed by the microservices frameworks.
- Set up a new set of requirements for any new infrastructure to provide APIs and capabilities that can easily integrate with the microservices frameworks.
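The policy-mapping step in the list above can be sketched as a simple gap analysis. The capability names and the two products below are invented placeholders, and `policy_gaps` is a hypothetical helper. A real evaluation would draw both sides from your organization's policies and each vendor's documentation.

```python
def policy_gaps(required, products):
    """Return {product: sorted capabilities the organization requires but the product lacks}."""
    return {name: sorted(required - features) for name, features in products.items()}


# Capabilities the organization's policies call for (placeholder names).
required = {
    "data-at-rest-encryption",
    "image-vulnerability-tracking",
    "backup-integration",
}

# Feature sets of two made-up candidate products.
products = {
    "framework-a": {"data-at-rest-encryption", "image-vulnerability-tracking", "multicloud"},
    "framework-b": {"data-at-rest-encryption", "image-vulnerability-tracking", "backup-integration"},
}

for name, gaps in policy_gaps(required, products).items():
    print(name, "gaps:", gaps)
```

An empty gap list means the product covers every listed policy. Anything left over becomes either an integration point for the infrastructure team to fill or a requirement for the next purchase.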
Scenario 3: The organization has a DevOps strategy but went to the cloud
This scenario is a combination of the previous two. From the perspective of infrastructure teams, there is no DevOps, so they can start preparing infrastructure for it. At the same time, with cloud-based solutions already widespread throughout the organization, they're late to the game.
Where do we start?
- Track the tools and frameworks. If developers are using the upstream versions, or don't have tools for governance and lifecycle management, this scenario is a version of scenario #2 above. The enterprise products supporting microservices frameworks can run in the cloud and on-premises. These frameworks tend to have a strong hybrid cloud capability. Consider preparing your infrastructure to work in a hybrid cloud configuration with the microservices frameworks. A well-designed implementation of a mature microservices framework allows for the easy transition of workloads between on-premises and public cloud.
- Create a cloud experience for your organization. Infrastructure teams need to operate as a service brokerage for the organization. It's not about buying multiple clusters of an OEM solution to achieve redundancy and calling it "cloud ready." Delivering the cloud experience should include designing for service availability, even during failure of components. Think about storage service in a cloud environment. There are disk failures and even node failures happening behind the scenes, but the service is still there. We must provide that experience in our infrastructure. This is part of what we can achieve by using software-defined storage (SDS), software-defined networking (SDN), or even going with software-defined data center (SDDC) solutions.
- Adopt automation and orchestration for everything in the data center. We must provide the services while staying out of the way.
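The storage example in the list above, where the service stays up while disks and nodes fail behind the scenes, can be put into rough numbers. The sketch below assumes that replicas fail independently and that the service is available as long as at least one replica is up. Real systems rarely satisfy both assumptions fully, so treat the result as an upper bound rather than a prediction. The function name is hypothetical.

```python
def service_availability(component_availability, replicas):
    """Availability of a service that stays up while any one replica is up.

    Assumes independent failures: the service is down only when every
    replica is down at once, which has probability (1 - a) ** n.
    """
    if not 0.0 <= component_availability <= 1.0:
        raise ValueError("availability must be between 0 and 1")
    if replicas < 1:
        raise ValueError("at least one replica is required")
    return 1.0 - (1.0 - component_availability) ** replicas


for n in (1, 2, 3):
    print(n, "replica(s):", round(service_availability(0.99, n), 6))
```

Under these assumptions, components that are each up 99 percent of the time reach roughly 99.9999 percent with three replicas. That is the arithmetic behind designing for service availability rather than component reliability.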
A bumpy ride
If you find yourself overwhelmed by a torrent of tools, frameworks and platforms, remember that there is no such thing as “buying DevOps.” There are many tools that are used to support agile methodologies and techniques used in DevOps cultures. But this is not about Docker, Jenkins, Kubernetes, OpenStack or any other buzzwords. DevOps is an outcome-driven culture. Don't focus on “what to buy,” but rather on enabling and delivering the experience to your internal customers: the developers and applications teams. Enjoy the journey.