Software, in the words of Marc Andreessen, is in the process of “eating the world.” It touches every industry and powers more processes than ever before, and the improvements we have made in how we develop and deploy it are a big reason why. At a steady rate, new technologies and approaches for building more robust and scalable systems emerge, gain adoption, and are discarded as better strategies take their place. As Gordon Moore predicted in 1965, computers have continued to roughly double in computing power every couple of years; and since the Second World War, when Alan Turing’s bombe was used to break the codes of Germany’s Enigma machine, the makers of software have sought better ways to build systems.
We adopt new computing paradigms, such as virtualization and the cloud. We learn important lessons about what works, what doesn’t, and what doesn’t work as well as expected. One step at a time, one project at a time, the state of the art advances. Looking at the miracles of the modern world, it’s clear that technology advances, and along with it, the techniques we use to build it.
DevOps is a set of practices and tools for building better software. It combines development and operations teams into a single whole (dev + ops = DevOps) and encourages them to work together. The unified team is then expected to own its application: from planning and coding, to testing, deployment, operations, and emergency response. When something goes wrong, the developer who wrote the feature will often work alongside the QA engineer and systems administrator to troubleshoot and bring the service back online. Amazon, one of the early pioneers of DevOps practices and procedures, summarizes “What is DevOps” succinctly:
DevOps is the combination of tools, practices, and philosophies that increases an organization’s ability to deliver applications and services at high velocity … This allows for products to evolve and improve at a faster pace than organizations using traditional software development and management processes. Improved speed enables better customer service and enhances the ability to compete more effectively in the market.
It is, in effect, the convergence of everything: philosophy, practice, and architecture; and its proponents claim that it is revolutionary, capable of changing everything. But is it really? What are the benefits of DevOps? Are they worth the investment they require and the disruption they cause? What are the parts and pieces? How do they work together, and how can you start to take advantage of DevOps practices? In this article, we examine these questions and attempt to see past the hype to some of the practical benefits of DevOps.
Benefits of DevOps
Perhaps a good place to start in this whirlwind tour is a discussion of the benefits. The DevOps movement started to coalesce sometime between 2003 and 2008 (depending on who you ask) at large companies like Amazon, Google, Microsoft, and Netflix. At the time, there had been a number of high-profile software project failures (remember Windows Vista, four years late and billions over budget?), and there was a widespread belief that traditional software development methods were not working.
Through many conversations (both amongst themselves and with management), a set of best practices started to emerge that could improve the process by which software was built. Developers implemented some of those ideas in tooling, and operations enthusiastically adopted those tools to make their jobs easier. Over time, the practices and the use of associated tools percolated up the management chains of many companies, where they were adopted as formal policy. From the bottom up, then, many companies found that they were already practicing DevOps, because that was how their technical staff wanted to work.
Companies that include DevOps in the life cycles of their applications can expect a myriad of benefits. It’s worth noting that, despite the hype, DevOps has been around for quite some time; there are now some fifteen years of data describing its benefits. Among them:
- Sixty-three percent of organizations polled that use DevOps have seen more frequent software releases.
- In 2013, DevOps teams were able to spend thirty-three percent more time on infrastructure improvements, having saved time in other areas.
- Teams using DevOps even work fewer hours outside of normal business hours.
Other major benefits of DevOps come from the ability to deliver applications, patches, and releases at a faster rate. Many companies that adopt practices such as continuous integration and continuous delivery (CI/CD) have been able to move from monthly or bi-monthly releases to daily product releases, and to push fixes to customers within a few hours. Puppet Labs (maker of a popular automation tool) reported in its 2016 “State of DevOps Report” that:
Teams that practice DevOps deploy thirty times more frequently, have sixty times fewer failures and recover 160 times faster.
Additionally, they are more efficient and effective:
High performing organizations spend twenty-two percent less time on unplanned work and rework. They are able to spend twenty-nine percent more time on new work, such as new features and code.
How to Adopt a DevOps Model
Transitioning to DevOps requires that organizations commit to a mindset and institute changes in both culture and practice. Changes in how teams are organized will likely be required; there may not even be separate development and operations teams. To that effect, what are the goals of a DevOps organization? What are you trying to accomplish?
Goals of a DevOps Organization
Moving quickly gets innovations into the hands of customers sooner and lets organizations adapt better to changing markets. DevOps uses design approaches such as microservices and practices such as continuous integration and delivery to increase development velocity.
Frequent delivery of new releases is a healthy metric of a software organization. The faster that new features, enhancements, and bug fixes can be released, the better you can respond to customers’ needs.
One of the most difficult challenges in the delivery of software can be the operation of applications at scale. Large systems often require clustered hardware/software resources, real-time monitoring/alerting, and a commitment to cloud-friendly architectures.
Practices such as continuous integration and continuous delivery contribute to the reliability of an application the way that infrastructure as code contributes to scale. By providing comprehensive test suites and ensuring that they are invoked on each commit, it’s possible to know that commits are both functional and safe.
Rigorous testing and continuous integration, the adoption of cloud platforms and infrastructure-as-code practices, and multi-disciplinary teams that work closely together combine to greatly improve application security. They allow DevOps organizations to move quickly while retaining control, preserving compliance, and collecting data that can demonstrate improvement. Many platforms go even further by automating compliance policies and enforcing a granular permissions model, using practices sometimes called “policy as code,” which capture what best practice might be and provide a template for auditing what has been deployed into production.
Room for Growth
Despite the momentous changes it has already made in the industry, there is evidence that DevOps will continue to be a disruptive force. Though it traces its origins to 2008, in 2017 only seventeen percent of tech developers reported company-wide adoption of DevOps practices. More telling, however, is that seventy percent of developers indicated that their organizations were considering adopting some or many DevOps practices.
DevOps Principles and Practices
How do you put DevOps into practice? What procedures and tools can help you implement DevOps processes?
In many ways, the guiding principle of DevOps might be described as “Go Smaller.” One fundamental example of this principle in practice is to perform frequent but small updates. Deploying incremental updates to an application is much easier than deploying less frequent, massive upgrades. Getting code into the hands of customers quickly helps teams find and patch bugs while the code is still fresh in the minds of the developers who wrote it.
A second example is to break large applications into smaller pieces called microservices. Splitting an application into many small, independently operated pieces, each with a very tightly scoped purpose, reduces the coordination overhead of large, tightly coupled applications.
Nothing comes for free, though. Decomposing applications and increasing the pace of release increases operational complexity. To actually release code takes time. Someone has to commit, review, test, build, and deploy. Likewise, splitting services into smaller pieces introduces network and environment complexity that may not apply with more traditional applications.
The good news is that the additional overhead and challenges that DevOps introduces can be addressed with tooling. For this reason, another core principle of DevOps is “automate everything.” Many jobs within traditional operations require a great deal of manual labor and consume significant amounts of time. Examples include:
- testing of the software: unit, functional, and integration
- staging new changes and building new artifacts
- promoting the staged build to live production use and performing the actions required to upgrade components (such as running database migrations)
All of these tasks can be automated through practices such as Continuous Integration / Continuous Deployment (CI/CD). After an initial investment to build CI/CD pipelines, the entire process of testing and upgrade can proceed automatically. This means that every time a developer commits a new source code change, a verified and tested update can be pushed to production with zero downtime.
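To make the idea concrete, here is a minimal sketch (in Python, with made-up stage names) of the behavior described above: pipeline stages run in order, and a failure at any stage stops the change from reaching production.

```python
# Minimal sketch of a CI/CD pipeline runner (hypothetical stage names).
# Each stage is a function returning True on success; the pipeline stops
# at the first failure, so broken code never reaches the deploy stage.

def run_pipeline(stages):
    """Run stages in order; return the list of stages that executed."""
    executed = []
    for name, step in stages:
        executed.append(name)
        if not step():
            print(f"Pipeline failed at stage: {name}")
            return executed
    print("Pipeline succeeded; change deployed.")
    return executed

# Stand-in stages -- a real pipeline would invoke a test runner, a build
# tool, and a deployment system here.
stages = [
    ("test",   lambda: True),   # unit/functional/integration tests pass
    ("build",  lambda: True),   # artifact builds cleanly
    ("deploy", lambda: True),   # rollout to production
]

run_pipeline(stages)
```

Real CI servers such as Jenkins implement exactly this contract at scale: a commit triggers the pipeline, and only a fully green run produces a deployable artifact.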
Other practices, like Infrastructure as code and configuration management, can help to scale computing resources as demand spikes. Further, the use of monitoring and logging when combined with automated actions can help systems react to outages of part of the system without needing to take everything offline.
Taken as a whole, DevOps practices help organizations deliver better, more reliable code to their customers more often. In the remainder of this section, we will provide an overview of key practices and some of the tools used to implement them.
Microservices
A microservice architecture is an architectural model based on splitting large, monolithic programs into small sets of services that communicate with one another. Each service is a standalone process that talks to other processes through an application programming interface (API).
Microservices are built independently around different business capabilities, meaning each service has a single purpose. They hasten the development of an application significantly, as each service is designed, programmed, and tested individually. In contrast, a monolith supports an entire application in one service, and such tight coupling can create bottlenecks that are hard to work around. As an example, consider a service that includes both a database interface and a search interface in the same application. If the database becomes a bottleneck, the only way to increase the performance of that one piece is to deploy a second instance containing both the search and database interfaces, despite there being no issue with the search code. In a microservice architecture, where the database and search interfaces are separate, the database component can be scaled without additional instances of search.
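The scaling argument above can be illustrated with a small sketch; the per-instance capacity and the load figures are invented purely for illustration.

```python
# Sketch contrasting scaling of a monolith vs. microservices
# (illustrative numbers only). Each instance handles CAPACITY
# requests/sec; in the monolith, every instance carries both
# components, so the busiest component drives the count for everything.
import math

CAPACITY = 100  # requests/sec one instance can serve (assumed)

load = {"database": 950, "search": 120}  # requests/sec per component

# Monolith: one instance type containing both components, so the
# instance count is sized for the hottest component.
monolith_instances = math.ceil(max(load.values()) / CAPACITY)

# Microservices: each component scales on its own load.
micro_instances = {svc: math.ceil(rps / CAPACITY) for svc, rps in load.items()}

print("monolith:", monolith_instances)    # every copy also duplicates search
print("microservices:", micro_instances)  # search stays small
```

With these numbers the monolith needs ten full copies (each carrying unused search capacity), while the decomposed system runs ten database instances and only two search instances.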
This idea can be taken one step further through the use of Functions as a Service (FaaS). FaaS systems allow for very small pieces of functionality to be exposed for consumption via a network. Complex applications mix and match the functionality of lower-level pieces in order to deliver complex functionality to users.
Monitoring and Logging
Monitoring is the real-time observation of an application’s well-being. DevOps encourages organizations to integrate monitoring into every part of an application’s operations. This provides insight into performance, in addition to notification about failures (and why they may have occurred).
Operational data is valuable. It can be used to run analytics, drive decisions about updates, show how users engage with parts of the system, and help determine which features should be developed or which patches created. When using a microservice architecture, monitoring (and event aggregation) becomes critical, as you have multiple processes running in different environments to keep track of. When working with systems that span individual servers, understanding the big picture requires tracking and aggregating data from all of them.
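A minimal sketch of that aggregation idea: events from several hypothetical services are merged into one view, and an alert fires when a service’s error rate crosses a threshold (the service names, events, and 5% threshold are all invented).

```python
# Aggregate request events from multiple services into per-service
# error rates, then flag services whose rate exceeds a threshold.
from collections import defaultdict

events = [
    {"service": "search",   "status": 200},
    {"service": "search",   "status": 500},
    {"service": "checkout", "status": 200},
    {"service": "checkout", "status": 200},
    {"service": "checkout", "status": 200},
]

def error_rates(events):
    """Fraction of 5xx responses per service."""
    totals, errors = defaultdict(int), defaultdict(int)
    for e in events:
        totals[e["service"]] += 1
        if e["status"] >= 500:
            errors[e["service"]] += 1
    return {svc: errors[svc] / totals[svc] for svc in totals}

def alerts(rates, threshold=0.05):
    """Services whose error rate is above the alerting threshold."""
    return sorted(svc for svc, rate in rates.items() if rate > threshold)

rates = error_rates(events)
print(rates)
print(alerts(rates))
```

Production systems do the same thing with streaming event pipelines and time windows rather than a static list, but the shape of the computation is the same.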
Infrastructure as Code
Infrastructure as Code is the practice of using the same tools and methods leveraged for software development in the management of infrastructure. At a practical level, this means that application configuration will be kept in version control, analyzed, and tested using continuous integration pipelines. Developers and administrators are then able to leverage the configuration and application interfaces to interact with infrastructure programmatically.
This allows for resources to be deployed, updated, and managed without needing to manually configure them. When streamlined, it is possible to interact with infrastructure as though it were application source code. This allows for servers to be deployed quickly using standard templates, updated consistently, and duplicated efficiently. Infrastructure as code goes a long way toward solving the challenges of running applications at large scale.
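A toy sketch of the desired-state idea behind these tools (the resource names and plan format are invented; real tools are far more sophisticated): the tool diffs a declared configuration against what actually exists and emits the actions needed to converge.

```python
# Sketch of desired-state reconciliation, the core loop of
# infrastructure-as-code tools: compare declared resources against
# actual resources and produce a plan of create/update/destroy actions.

def plan(desired, actual):
    """Return the sorted list of actions needed to match `desired`."""
    actions = []
    for name, spec in desired.items():
        if name not in actual:
            actions.append(("create", name))
        elif actual[name] != spec:
            actions.append(("update", name))
    for name in actual:
        if name not in desired:
            actions.append(("destroy", name))
    return sorted(actions)

# Hypothetical resources: what we declared vs. what is running.
desired = {"web": {"size": "large"}, "db": {"size": "xlarge"}}
actual  = {"web": {"size": "small"}, "cache": {"size": "small"}}

print(plan(desired, actual))
```

Because the plan is derived from the declaration rather than typed by hand, running it twice is safe: once the environment matches, the plan is empty.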
Continuous Integration and Continuous Delivery
Continuous integration takes test automation to the next level. It integrates suites of tests into “pipelines” that execute when developers merge and commit their code to a central repository. Such pipelines allow the state of a software project to be monitored continuously, with the goal of finding and addressing bugs more quickly so that software can be validated and released more frequently. Continuous integration leverages automation applications like Jenkins or Travis CI to detect new source code changes and immediately run tests on the new code.
Continuous delivery couples the testing to pipelines that can automatically build and prepare the software for deployment to a staging or production environment. When continuous delivery functions as intended, developers and operations have software builds that have gone through a rigorous testing process and are ready for deployment. Using such systems, companies such as Etsy have reported being able to update their production environments fifty times per day (or more). With a strong CI/CD pipeline, you replace infrequent, high-stakes releases with frequent, confident changes, knowing that each release has undergone an extensive quality testing cycle. CI’s goal is to provide confidence in the functionality of the program through testing, while CD handles the deployment of the application into production.
Communication and Collaboration
By converging development and operations, aggressively adopting automation, and leveraging infrastructure platforms, continuous integration/delivery, and monitoring, it becomes essential that all members of a team communicate. DevOps tooling brings messaging together from multiple streams into common platforms such as chat, project tracking systems, and wikis. Using such tools, and setting expectations about how groups should communicate, helps developers, operations, and other teams (such as marketing and sales) align more closely and reach organizational goals.
How Does DevOps Help Organizations?
Many organizations (businesses, governments, and non-profits) have applied DevOps approaches with success. The case studies below highlight some of the specific challenges that companies like Amazon, Netflix, and Etsy have been able to overcome using DevOps.
Amazon
One of the initial challenges that pushed Amazon toward the adoption of DevOps was determining how much server capacity might be needed at any given time, without wasting the excess. During most of the year, Amazon left as much as 40% of its total server capacity unused, with far too much compute sitting idle on its physical servers.
To consolidate server workloads, Amazon decided to shift from physical to virtual servers, and in the process pioneered many infrastructure-as-code practices. It then built sophisticated systems that allow configuration descriptions and complex software stacks to be deployed automatically on top of the resulting environments. These systems have made Amazon much more efficient: on average, Amazon reports deploying new versions of its software services every 11.7 seconds.
Once these systems were working well internally, Amazon chose to expose many of them for use outside the company. Collectively, Amazon Web Services (AWS) has become the leading option for other companies wishing to use the “public cloud.”
Amazon Web Services
With a robust service-oriented strategy have come strong profits. AWS is Amazon’s most profitable product. Not only does Amazon use it to efficiently sell its own merchandise and that of its partners; it also hosts the operations of tens of thousands of other organizations. Some examples of AWS DevOps services include:
- AWS Developer Tools: a suite that helps you develop and deploy code using CI/CD principles. It provides systems for storing and versioning source code, automatically creating builds, invoking tests, and deploying application artifacts to AWS infrastructure. Related services include AWS CodePipeline, AWS CodeBuild, AWS CodeDeploy, and AWS CodeStar.
- An example use of AWS CodeDeploy is creating blue/green deployments that minimize downtime during updates. A blue/green deployment works by staging a new version of the application alongside the old version, testing the new version before traffic is routed to it, and remapping routes upon successful completion of the tests. Strategies like blue/green deployment allow updates with near-zero downtime.
- AWS cloud storage: services that provide affordable and scalable storage, most notably Amazon S3. Among the most popular AWS offerings, S3 is used by many companies as a reliable way to warehouse huge amounts of data. Alongside data storage, AWS provides distributed computing systems such as Amazon EMR, which can process that data using machine learning, distributed analytics, and large-scale ETL.
- Among the companies using Amazon’s analytics services is GE Healthcare, which uses deep learning on AWS to detect critical conditions faster. GE created a platform (GE Health Cloud) on top of Amazon’s virtualization service (Elastic Compute Cloud, or EC2), capable of processing more than a petabyte of medical imaging data stored in Amazon Simple Storage Service (S3). The system’s distributed nature and high throughput allow GE to run simulations and queries in near real time.
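The blue/green pattern described in the CodeDeploy example above can be sketched in a few lines of Python; the router, version names, and health checks here are hypothetical stand-ins for real load-balancer operations.

```python
# Sketch of a blue/green cutover: stage "green" next to live "blue",
# health-check it, and only flip the router if the check passes.

class Router:
    def __init__(self, live):
        self.live = live  # version currently receiving traffic

    def cutover(self, candidate, healthy):
        """Route traffic to `candidate` only if its health check passes."""
        if healthy(candidate):
            self.live = candidate
            return True
        return False  # the old version keeps serving; no downtime

# Hypothetical health-check results for staged builds.
health = {"blue-v1": True, "green-v2": True, "green-v3": False}

router = Router(live="blue-v1")
router.cutover("green-v2", health.get)  # passes: traffic moves to green
print(router.live)
router.cutover("green-v3", health.get)  # fails: live version stays put
print(router.live)
```

The key property is that an unhealthy build is never exposed to users: the flip happens only after validation, and rolling back is just flipping the router again.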
Netflix
Netflix started its DevOps journey for the reason many do: catastrophe. Badly hurt by a major outage of its physical database infrastructure in 2008, which halted service to customers for three days, Netflix decided to adopt cloud computing techniques and re-architect its applications as robust systems of microservices. In taking this approach, Netflix ensured that every service was redundant and could withstand failures of any of the others.
Taking resiliency to the extreme, Netflix has pioneered an approach called “chaos engineering,” in which failures are actively introduced into specific components to ensure that the system as a whole remains operational. This has led to a thriving DevOps-based culture built on AWS, test automation, and continuous deployment. Netflix prides itself on being able to test, package, and deploy much of its software stack within minutes, without disrupting the streaming services that must run all day, every day, for tens of millions of subscribers.
AWS and Spinnaker
Netflix makes extensive use of Infrastructure as Code, cloud computing capability, and continuous integration/deployment.
- Netflix learned the hard way that vertical scaling of compute resources was not a sustainable way to host a database. When its database became corrupted in 2008, it was forced to rebuild the entire system from the ground up, and it learned that adding computing power to a single machine is a very limited way to grow a system. From this, Netflix aggressively adopted a horizontal scaling model, in which the system gains elasticity by adding additional instances on new infrastructure, physical or virtual, rather than adding more RAM or CPU to a single machine.
- Netflix uses AWS for nearly all of its computing and storage needs, including databases, analytics, video encoding, recommender engines, and hundreds of other functions that together require over 100,000 server instances from AWS. Netflix logs all of its operational data for monitoring and analytics using Amazon Kinesis Streams, a service that centralizes application data into a pipeline that other monitoring applications can consume.
- When Netflix was initially developing its streaming platform, a massive issue arose: how could it maintain the platform without taking servers down for maintenance? To solve this problem, Netflix and a group of developers started a project called Spinnaker. An open-source tool implementing continuous deployment, Spinnaker has become one of the most popular platforms for automating the roll-out and update of software builds.
- Using Spinnaker, Netflix is able to roll out updates to its entire system within 16 minutes of a new code commit.
Etsy
Before adopting DevOps, Etsy struggled with slow, large deployments to its monolithic application, further aggravated by a lack of collaboration and trust between isolated teams.
Prior to DevOps, Etsy’s deployment rate was about twice a week. After adopting DevOps, Etsy has been able to deploy new services more than sixty times a day. Concurrent with this transition was huge growth of the platform, as it was better able to meet customer needs without downtime.
Jenkins and Kale
While a competitor to Amazon in many ways, Etsy is among many large companies that use AWS for their operations. Like Netflix, Etsy leverages many AWS services to provide a stable and robust platform.
- As noted above, Etsy is able to deploy code between fifty and one hundred times per day. It does this through CI/CD pipelines run on Jenkins, capable of executing more than 14,000 test suites each day.
- To track the many different services required to operate the platform, Etsy built Kale, a monitoring system used to detect anomalous patterns in infrastructure operations data. Kale monitors every deployment Etsy performs, making sure the application is stable and healthy before it is made available to users.
Tools to Get You Started
Implementing DevOps with Open Source
The Open Source ecosystem provides a comprehensive suite of tools that can be used to implement all aspects of DevOps. Important tools include:
- Git: a distributed version control system that has become the most popular tool for tracking and managing source code. Additional platforms, sometimes called code forges, can be used to facilitate collaboration between developers working on software. The most popular hosted forges include GitHub and GitLab.
- Ansible: an infrastructure as code (IaC) tool that automates the provisioning and deployment of infrastructure. Automation is configured as a readable description of state, written in YAML files called playbooks. The tool reads the description of state and generates the set of instructions needed to create a matching environment.
- Docker: an end-to-end platform for building, sharing and running container-based applications. Used in environments as diverse as a developer’s desktop to the cloud, Docker allows for the same build artifact to be leveraged at every stage of an application’s lifecycle.
- Kubernetes: an open-source orchestration system for automating deployment, scaling, and management of containerized applications. Increasingly, Kubernetes (and the further systems that build on top of it) has become the core component of DevOps infrastructure. Kubernetes works closely with container technologies such as Docker. While Docker handles the execution and packaging, Kubernetes automates the processes of deploying Docker-based software across broad clusters of systems.
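As an illustration of the playbook format described in the Ansible entry above, a minimal playbook might look like the following. This is a hedged sketch: the host group name, package choice, and task list are hypothetical, though the module names (`ansible.builtin.package`, `ansible.builtin.service`) are standard Ansible modules.

```yaml
# Hypothetical playbook: ensure nginx is installed and running on
# every host in the "web" inventory group.
- hosts: web
  become: true
  tasks:
    - name: Install nginx
      ansible.builtin.package:
        name: nginx
        state: present

    - name: Ensure nginx is running and enabled at boot
      ansible.builtin.service:
        name: nginx
        state: started
        enabled: true
```

Because the playbook describes desired state rather than commands, running it repeatedly is safe: hosts that already match the description are left untouched.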