What is DataOps? Collaborative, cross-functional analytics
DataOps (data operations) is an emerging discipline that brings together DevOps teams with data engineer and data scientist roles to provide the tools, processes and organizational structures to support the data-focused enterprise.
What is DataOps?
DataOps (data operations) is an agile, process-oriented methodology for developing and delivering analytics. It brings together DevOps teams with data engineers and data scientists to provide the tools, processes and organizational structures to support the data-focused enterprise. Michele Goetz, vice president and principal analyst at Forrester, defines DataOps as “the ability to enable solutions, develop data products, and activate data for business value across all technology tiers from infrastructure to experience.”
According to Dataversity, the goal of DataOps is to streamline the design, development, and maintenance of applications based on data and data analytics. It seeks to improve the way data is managed and products are created, and to coordinate these improvements with the goals of the business.
DataOps vs. DevOps
DevOps is a software development methodology that brings continuous delivery to the systems development lifecycle by combining development teams and operations teams into a single unit responsible for a product or service. DataOps builds on that concept by adding data specialists — data analysts, data developers, data engineers, and/or data scientists — to focus on the collaborative development of data flows and the continuous use of data across the organization.
“You’ve got the modern trend for development of DevOps, but more and more people are injecting some sort of data science capability into development, into systems, so you need someone on the DevOps team who has a data frame of mind,” says Ted Dunning, CTO for MapR at HPE and co-author of Machine Learning Logistics: Model Management in the Real World.
Like DevOps, DataOps takes its cues from the agile methodology. The approach values continuous delivery of analytic insights with the primary goal of satisfying the customer.
According to the DataOps Manifesto, DataOps teams value analytics that work, measuring the performance of data analytics by the insights they deliver. DataOps teams also embrace change and seek to constantly understand evolving customer needs. They self-organize around goals and seek to reduce “heroism” in favor of sustainable and scalable teams and processes.
DataOps teams also seek to orchestrate data, tools, code, and environments from beginning to end, with the aim of providing reproducible results. DataOps teams tend to view analytic pipelines as analogous to lean manufacturing lines and regularly reflect on feedback provided by customers, team members, and operational statistics.
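As a rough illustration of the orchestration and reproducibility ideas above, the following minimal Python sketch runs a two-stage pipeline, records simple operational statistics for each stage, and fingerprints the result so that two runs on the same input can be compared. The stage names (clean, aggregate), the sample records, and the fingerprint helper are hypothetical and are not taken from the DataOps Manifesto or any particular tool.

```python
import hashlib
import json
import time


def clean(rows):
    """Drop records with a missing amount (hypothetical cleaning rule)."""
    return [r for r in rows if r.get("amount") is not None]


def aggregate(rows):
    """Total the amount per region, sorted so output order is stable."""
    totals = {}
    for r in rows:
        totals[r["region"]] = totals.get(r["region"], 0) + r["amount"]
    return [{"region": k, "total": v} for k, v in sorted(totals.items())]


def fingerprint(rows):
    """Hash a canonical JSON form so identical results give identical digests."""
    payload = json.dumps(rows, sort_keys=True).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


def run_pipeline(rows, stages):
    """Run stages in order, collecting simple operational statistics."""
    stats = []
    for stage in stages:
        started = time.perf_counter()
        rows_in = len(rows)
        rows = stage(rows)
        stats.append({
            "stage": stage.__name__,
            "rows_in": rows_in,
            "rows_out": len(rows),
            "seconds": time.perf_counter() - started,
        })
    return rows, stats


# Illustrative input only; a real pipeline would read from a data source.
raw = [
    {"region": "east", "amount": 10},
    {"region": "west", "amount": None},
    {"region": "east", "amount": 5},
    {"region": "west", "amount": 7},
]

result_a, stats_a = run_pipeline(raw, [clean, aggregate])
result_b, _ = run_pipeline(raw, [clean, aggregate])

# Same input and same code should produce the same fingerprint.
assert fingerprint(result_a) == fingerprint(result_b)
```

The per-stage row counts and timings stand in for the operational statistics a team might review, and the fingerprint comparison is one simple way to check that a rerun reproduced the earlier result.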
Where DataOps fits
Enterprises today are increasingly injecting machine learning into a vast array of products and services, and DataOps is an approach geared toward supporting the end-to-end needs of machine learning.
“For example, this style makes it more feasible for data scientists to have the support of software engineering to provide what is needed when models are handed over to operations during deployment,” Dunning and co-author Ellen Friedman, principal technologist at HPE, write.
“The DataOps approach is not limited to machine learning,” they add. “This style of organization is useful for any data-oriented work, making it easier to take advantage of the benefits offered by building a global data fabric.”
They also note that DataOps fits well with microservices architectures.
DataOps in practice
To make the most of DataOps, enterprises must evolve their data management strategies to deal with data at scale and in response to real-world events as they happen, according to Dunning and Friedman.
“Traditionally siloed roles can prove too rigid and slow to be a good fit in big data organizations undergoing digital transformation,” they write. “That’s where a DataOps style of work can help.”
Because DataOps builds on DevOps, cross-functional teams are essential. These teams cut across “skill guilds” such as operations, software engineering, architecture and planning, product management, data analysis, data development, and data engineering. They should be managed in ways that increase collaboration and communication among developers, operations professionals, and data experts.
Data scientists may also be included as key members of DataOps teams, according to Dunning. “I think the most important thing to do here is to not stick with the more traditional Ivory Tower organization where data scientists live apart from dev teams,” he says. “The most important step you can take is to actually embed data scientists in a DevOps team. When they live in the same room, eat the same meals, hear the same complaints, they will naturally gain alignment.”
But Dunning also notes that data scientists may not need to be permanently embedded in a DataOps team.
“Typically, there’s a data scientist embedded in the team for a time,” Dunning says. “Their capabilities and sensibilities begin to rub off. Someone on the team then takes on the role of data engineer and kind of a low-budget data scientist. The actual data scientist embedded in the team then moves along. It’s a fluid situation.”
How to build a DataOps team
Most DevOps-based enterprises already have the nucleus of a DataOps team on hand, Friedman says. Once they have identified projects that need data-intensive development, they need only add someone with data training to the team. That person may even be a data engineer rather than a full-on data scientist.
Teams are often built of individuals with overlapping skillsets, and individuals may take on multiple roles within a DataOps team, depending on their expertise.
“In large-scale projects, a particular DataOps role may be filled by more than one person, but it’s also common that some people will cover more than one role,” Dunning and Friedman write in their book. “Operations and software engineering skills may overlap; team members with software engineering experience also may be qualified as data engineers. Often, data scientists have data engineering skills. It’s rare, however, to see overlap between data science and operations.”
According to Forrester’s Goetz, some of the key areas of expertise on DataOps teams include:
- Data to process orchestration
- Data policy deployment
- Data and model integration
- Data security and privacy controls
Regardless of makeup, DataOps teams must share a common goal: the data-driven needs of the services they support.
“With engineering teams, good engineers, what you need to do is you need to set goals well,” Dunning says. “Once there’s a common goal, solving a problem, then the team organizes itself very often toward solving that problem. The difficulty comes when different people see different aspects of the problem. Ops people are going to be worried about reliability, that you get an answer within a certain time. The data science person tends to be focused on the accuracy of the answer. You’ve already got a bit of a divergence. But if they’re trying to solve the same problem and they’re willing to compromise on how it’s solved, I think it’s a pretty easy social structure to build up.”
According to Goetz, DataOps team members include:
- Data specialists, who support the data landscape and development best practices
- Data engineers, who provide ad hoc and system support to BI, analytics, and business applications
- Principal data engineers, who are developers working on product and customer-facing deliverables