Key Features of a Comprehensive MLOps Platform

2022-08-10 10:15:41
关注


Illustration: © IoT For All





Artificial Intelligence (AI) and Machine Learning (ML) present a boon for businesses as technologies with the potential to help organizations make better predictions, create innovative services for customers, and deliver faster business outcomes. Finance teams, operations, customer success, and marketing departments stand to benefit. That said, the overarching reason that organizations are facing challenges and delays in bringing ML models into production stems from the fact that models are different from traditional software, and most organizations don’t yet have frameworks and processes for dealing with these differences. Let’s take a look at how a MLOps platform can help with efficiency and collaboration.


'Any MLOps platform should take a human-centered approach - meaning it is designed to provide users with the critical information they need' -Navin BudhirajaClick To Tweet


What is MLOps?

MLOps is a practice for a subset of the ML model life cycle to help teams deploy, manage, and maintain machine learning models and drive consistency and efficiency across an organization. Similar to DevOps—a set of practices that integrate software development with IT operations—MLOps adds automation to streamline the orchestration of steps in the workflow that begins once a model is ready to go into production.

A machine learning model’s life cycle spans many steps, and typically they are all managed by different people across discrete systems that need to be connected. These systems are used for data collection, data processing, feature engineering, data labeling, model building, training, optimizing, deploying, risk monitoring, and retraining. And in each organization, different people and teams may own one or more steps.

In an ideal environment, ML models are solving company problems and driving better decision analyses. However, only a fraction of these models enters production. Even then, it typically takes months for a successful model to become active, according to Gartner. That’s because the process of deploying a machine learning model into production is often disjointed. Siloed teams of data engineers, data scientists, IT ops professionals, auditors, business domain experts, and ML engineering teams operate in a patchwork arrangement that bogs down the process.

Downfalls of MLOps

Part of the problem is that MLOps is still an emerging discipline, and different people perform the tasks that span MLOps in each organization. In some organizations, data scientists are involved in nearly every step of a model’s life cycle; in others, there may be discrete teams for each phase or teams that own one or more areas. For organizations to realize the full value of ML, models need to be put into production quickly and at scale. There needs to be a guide for how to handle MLOps that makes sense for an enterprise’s goals and the structure of its team. Because of this, MLOps platforms are taking on an increasingly critical function in expediting the ML efforts of organizations.

Platforms have the potential to deliver a blueprinted strategy to create repeatable and streamlined processes, regardless of whether the industry is manufacturing or financial services—or any other industry. End-to-end platforms can save an innumerable amount of time because of how many models can be deployed and monitored simultaneously while operating at the speed businesses need. The best MLOps platforms provide solutions for all ML stakeholders so they can not only deploy and manage models at scale but also foster efficiency through collaboration and communication across the different people using the platform at different stages. Let’s take a look at the four main aspects of a successful MLOps platform.

A Successful MLOps Platform

#1: A Collaborative Experience for all Stakeholders

Given that the key stakeholders—from data teams to engineers to risk auditors—tend to function in silos in many organizations, simplifying the process to enable any user to perform a specific role leads to better outcomes for efficiently performing tasks. Platforms that enable collaboration across an organization provide teams the ability to quickly operationalize models, regardless of the tools data scientists used to create those models. There is no longer a need to restrict any other user, such as machine learning engineers or IT teams. Each platform user should be able to use the tools they already have and leverage their expertise with those tools. Having a single, collaborative interface that can intuitively guide a user through the steps that abstract the complexity of the process is a beneficial component of MLOps.

#2: A User-First, Modular Architecture

Given that many organizations may be handling MLOps differently, platforms that meet them where they are offer immediate value. A platform with a modular architecture provides organizations the necessary flexibility to get up and run quickly by enabling each person to use the platform functionality that they need when they need it rather than forcing them to operate in a linear fashion.

For example, an organization may have data scientists with a preferred set of tools but who lack the ability to easily deploy or monitor models in production. An MLOps platform designed with openness and users in mind will offer easy plug-and-play components so each user can make decisions on the best cloud, database, repositories, and other components to use without having to make sweeping changes. Every company will implement the process of operationalizing models a bit differently, and modular architecture enables MLOps teams to leverage their entire suite of tools and seamlessly bring specific components of the platform into their ML workflows.

#3: An Emphasis on Optimization

As models become larger and increasingly more complex, one of the challenges organizations often run into is a dramatic increase in hardware or computing needs. Machine learning is, by definition, data-intensive and will cost organizations a lot of money without careful consideration of the infrastructure in place. Models that take a long time to head to production coupled with rising costs for hardware are a recipe for tension with executives and leadership weighing ROI in an organization.

MLOps platforms that can optimize models and present model performance and cost-saving data in a format that helps users make decisions based on the factors most important to them can alleviate some of the challenges organizations face as they ramp up ML modeling and production. As more companies deploy more ML models to more devices that can be on any cloud, edge devices, or on-prem, the ability to optimize models will become increasingly important.

#4: An Ability to Continuously Monitor Models in Production

It’s important for an MLOps platform to accelerate the process of bringing models to production. But once there, the real work begins, and platforms need to enable teams to continuously monitor risks, such as model performance and unstructured data, and quickly take action to mitigate operational and reputational risk.

ML models are not static. They are trained and tested in environments that are controlled, but when models are deployed into production they are making predictions based on real-world data which can be quite different for a variety of reasons. For example, a model’s performance or accuracy in predictions can change. Models also experience different types of drift, such as data drift, for when there is a significant change in buying patterns. This happened during COVID, for example, and led to former distribution patterns no longer being accurate.

Simplifying the Process

To help teams continuously monitor models in production, MLOps platforms should simplify the ability to:

1. Set alerts based on custom thresholds.
2. Provide quick at-a-glance access to key data points showing which models are failing.
3. Rapidly identify the root cause and take action.

Leveraging an integrated platform allows for the creation of a customized risk monitoring plan before and after deployment. A comprehensive approach to mitigating risk includes evaluating uncertainties within the data to guide AI/ML teams along the right path.

Platforms Must Put Humans First

We are still in the early stages of figuring out how to best use ML in enterprises. Any MLOps platform should take a human-centered approach—meaning it is designed to provide users with the critical information they need, an intuitive way to complete the tasks they need to complete, and the ability to collaborate and communicate with other stakeholders and colleagues. Platforms that put human workers first help build trust between people and ML. This garnered level of trust between person and machine helps ease the mind of the worker and allows the technology to perform many of the statistical tasks to aid these workers. The intentional design of such platforms will continue to focus on augmenting and amplifying human intelligence and delivering new opportunities to promote collaboration with advancing AI and ML initiatives.


  • Artificial Intelligence
  • Automation
  • Data Analytics
  • Data Scientist
  • Machine Learning


参考译文
综合MLOps平台的主要特点

插图:© IoT For All --> 人工智能(AI)和机器学习(ML)为各类企业带来了巨大的发展机遇,它们具有帮助组织做出更准确的预测、为客户提供创新服务并加快业务成果实现的潜力。财务团队、运营部门、客户成功团队和市场营销部门都可从中受益。话虽如此,组织在将机器学习模型投入生产时面临挑战和延迟的根本原因在于:模型与传统软件不同,而大多数组织尚未建立起处理这些差异的框架和流程。让我们来看看MLOps平台如何在效率与协作方面提供帮助。"'任何MLOps平台都应以人为本——意味着它被设计成能够向用户提供他们所需的关键信息。"——Navin Budhiraja 点击推文  什么是MLOps?  MLOps是机器学习模型生命周期中一个子集的实践方法,旨在帮助团队部署、管理并维护机器学习模型,从而在整个组织中实现一致性和效率。与DevOps相似——后者是一套将软件开发与IT运维整合在一起的实践方法——MLOps通过自动化来简化从模型准备就绪后开始的流程步骤的编排。一个机器学习模型的生命周期包括许多步骤,通常这些步骤由不同人在离散系统中分别处理,而这些系统需要相互连接。这些系统用于数据收集、数据处理、特征工程、数据标注、模型构建、训练、优化、部署、风险监控以及重新训练。而在每个组织中,不同的人员和团队可能会负责其中的一个或多个步骤。理想情况下,机器学习模型正在解决公司问题并推动更优的决策分析。然而,只有少数模型进入了生产阶段。即便如此,根据Gartner的数据,一个成功的模型通常还需要数月时间才能开始运行。这是因为将机器学习模型部署到生产中的过程往往脱节。数据工程师、数据科学家、IT运维人员、审计员、业务领域专家和ML工程团队等团队在各自独立的“孤岛”中工作,这种零散的安排会严重拖慢整个流程。  MLOps的短板之一是,MLOps作为一门学科仍处于发展初期,不同组织中的不同人员在执行与MLOps相关的工作。在某些组织中,数据科学家可能会参与模型生命周期中的几乎每一个步骤;而在另一些组织中,可能会为每个阶段设立独立的团队,或者由一个团队负责一个或多个领域。为了充分发挥机器学习的全部价值,组织需要能够快速且大规模地将模型投入生产。因此,组织需要一套适合其企业目标和团队结构的MLOps操作指南。正因为如此,MLOps平台在加快企业机器学习进程方面正发挥着越来越关键的作用。这些平台有潜力提供一种蓝图式策略,帮助企业构建可重复、可优化的流程,无论其行业是制造业还是金融服务,抑或其他任何行业。端到端的平台可以节省大量时间,因为它们能够同时部署和监控多个模型,并以企业所需的高速度运行。最佳的MLOps平台为所有机器学习利益相关者提供了解决方案,使他们不仅能大规模部署和管理模型,还能通过协作和沟通促进效率提升。让我们来看看一个成功的MLOps平台的四个主要方面。  一个成功的MLOps平台 #1:所有利益相关者的协作体验  由于从数据团队到工程师再到风险审计人员等关键利益相关者在许多组织中往往各自为政,简化流程以使任何用户都可以高效执行特定角色,将有助于提升任务执行的成果。那些能够跨组织协作的平台,能够让团队快速地将模型投入运行,无论数据科学家使用了什么工具来构建这些模型。不再需要限制其他用户,例如机器学习工程师或IT团队。每个平台用户都应可以使用他们已有的工具,并利用其在这类工具上的专业知识。拥有一个单一、协作式且能够直观引导用户完成抽象步骤的用户界面,是MLOps的重要组成部分。  #2:以用户为中心的模块化架构  鉴于许多组织可能以不同的方式处理MLOps,那些能够契合组织当前状态的平台可以立即带来价值。采用模块化架构的平台,可以为组织提供必要的灵活性,使其快速启动并运行,因为它允许每个人在需要的时候使用平台提供的功能,而不需要线性地操作。例如,一个组织可能有使用特定工具集的数据科学家,但缺乏轻松部署和监控生产中模型的能力。一个以开放和用户为导向设计的MLOps平台,将提供易于插拔的组件,使每个用户能够决定使用哪种云、数据库、存储库等组件,而无需进行大规模的更改。每家公司都会以略有不同的方式实施模型操作流程,而模块化架构允许MLOps团队充分利用其全套工具,并无缝地将平台的具体组件纳入其机器学习工作流程中。  #3:对优化的高度重视  随着模型变得越来越大、越来越复杂,企业面临的一个挑战是硬件或计算资源需求的大幅增加。机器学习本质上是数据密集型的,如果没有对基础设施的周全考虑,它将给企业带来巨大的成本。模型部署所需的时间较长,加之硬件成本上升,容易引发企业高管对企业在机器学习模型和生产方面的投资回报率(ROI)产生忧虑。那些能够优化模型并以帮助用户根据自身最重要的因素进行决策的方式呈现模型性能和节省成本数据的MLOps平台,可以缓解企业在机器学习模型和生产方面面临的部分挑战。随着越来越多的公司将更多机器学习模型部署到云端、边缘设备或本地设备上,模型优化的能力将变得越来越重要。  #4:能够持续监控生产中的模型  MLOps平台的重要任务之一是加速模型进入生产环境的过程。但一旦进入生产环境,真正的工作才刚刚开始,平台需要使团队能够持续监控风险,如模型性能和非结构化数据,并迅速采取行动,以减轻运营和声誉风险。机器学习模型并非静态不变的。它们是在受控环境中进行训练和测试的,但一旦部署到生产环境中,模型将基于完全不同的现实世界数据做出预测。例如,模型的预测性能或准确性可能发生改变。模型还会经历各种类型的“数据漂移”,如消费模式发生显著变化时出现的“数据漂移”。例如,疫情期间就曾发生过这种情况,导致以前的分布模式不再准确。  简化流程  为了帮助团队持续监控生产中的模型,MLOps平台应简化以下能力:  1. 根据自定义阈值设置警报。  2. 快速获取关键数据点,查看哪些模型正在失败。  3. 快速识别根本原因并采取行动。  使用集成平台可以创建部署前后的定制化风险监控计划。全面的风险缓解方法包括评估数据中的不确定性,以引导AI/ML团队走上正确的路径。  平台必须以人为本  我们目前仍处于探索如何在企业中最佳使用机器学习的早期阶段。任何MLOps平台都应以人为本——意味着它被设计成能够向用户提供他们所需的关键信息、一种直观完成任务的方式,以及与利益相关者和同事协作沟通的能力。以人类工作者为先的平台有助于建立人与机器学习之间的信任。这种获得的“人机”信任感可以缓解工作者的担忧,并允许技术执行大量统计任务,从而协助这些工作者。这类平台的有意设计将继续专注于增强和放大人类智能,并为推动AI和ML计划的协作创造新机会。  人工智能 自动化 数据分析 数据科学家 机器学习

您觉得本篇内容如何
评分

评论

您需要登录才可以回复|注册

提交评论

广告
提取码
复制提取码
点击跳转至百度网盘