How to Address Machine Learning Model Drift

2022-12-24 19:58:38

Illustration: © IoT For All

Most people know about artificial intelligence (AI), but fewer are well-versed in machine learning (ML). There is a lot to know about the field, and there are always new topics to learn; machine learning model drift is one of them.

One drawback of operating an ML model is that it needs to be retrained as time passes. The accuracy of its predictions decreases as business outcomes, the economy, and customer expectations change, a phenomenon called “model drift.”

When does ML model drift occur, and how can practitioners address it?

'AI and ML are becoming increasingly popular technologies in today’s digitally driven world. Some of the largest corporations leverage ML to deliver products and services.' - Zac Amos

What Is Machine Learning Model Drift?

AI and ML are becoming increasingly popular technologies in today’s digitally driven world. Some of the largest corporations leverage ML to deliver products and services. Take Netflix, for example. The streaming service uses ML models for several reasons, such as forming recommendations or learning which characteristics make content successful.

Businesses are investing in AI solutions, consumers are paying for ML-curated content, and engineers are finding new applications across industries. The most essential component of any AI or ML solution is data, both structured and unstructured. Data is complex and changes over time, and the data used to train ML models is no exception.

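An ML model suffers from model drift when its predictions become less accurate over time because the data it encounters no longer matches the data it was trained on. Model drift, also called model decay, can render the model unstable, making its predictions increasingly erroneous.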

A core principle of ML is that high-quality data produces accurate predictions. However, what the original model was trained to achieve may become irrelevant or outdated. ML engineers and specialists must go through the process of retraining and redeploying the model, making sure to use the latest training data available. If not, the model will continue to make predictions with low accuracy.

There are two types of model drift: concept and data.

Concept Drift

Concept drift occurs when the statistical relationship between a model’s inputs and its target variable changes. During training, a model learns a function that maps inputs to the target. As time goes on, that relationship can shift, so the patterns the model learned no longer hold in the new environment. This type of drift can occur seasonally, gradually, or suddenly, making it challenging to anticipate when it will happen.
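A toy sketch in Python can make this concrete. The integer feature, the thresholds 50 and 70, and the function names are all illustrative assumptions, not taken from any real system. The inputs are identical in both periods; only the rule that produces the label changes, and the frozen model loses accuracy as a result.

```python
# Hypothetical toy example: a "model" that learned the rule label = (x > 50).

def model_predict(x: int) -> bool:
    """Frozen decision rule learned at training time."""
    return x > 50


def accuracy(inputs, true_label) -> float:
    """Fraction of inputs where the frozen model agrees with the true label."""
    hits = sum(model_predict(x) == true_label(x) for x in inputs)
    return hits / len(inputs)


inputs = list(range(100))  # same inputs in both periods

# Concept at training time: positives are x > 50.
acc_before = accuracy(inputs, lambda x: x > 50)

# Concept after drift: positives are now x > 70 (assumed change).
acc_after = accuracy(inputs, lambda x: x > 70)

print(f"accuracy before drift: {acc_before:.2f}")
print(f"accuracy after drift:  {acc_after:.2f}")
```

Because the inputs never change, a monitor that only watches input distributions would likely miss this kind of drift; it tends to show up only once true labels become available.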

Data Drift

Data drift, or covariate drift, occurs when the distribution of the input data a model receives in production differs from the distribution of the data it was trained on. Any change to a model’s inputs affects its final predictions, so users need to be aware of this discrepancy.
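One common way to put a number on a change in an input’s distribution is the Population Stability Index (PSI). The article does not name a specific metric, so treat the sketch below as one possible choice. The equal-width binning, the `eps` smoothing value, and the synthetic samples are assumptions made for illustration.

```python
import math


def psi(baseline, current, bins=10, eps=1e-6):
    """Population Stability Index between two non-empty numeric samples.

    Bins are equal-width over the baseline's range; values outside that
    range are clipped into the first or last bin.
    """
    lo, hi = min(baseline), max(baseline)
    width = (hi - lo) / bins or 1.0  # fall back to 1.0 for constant data

    def proportions(sample):
        counts = [0] * bins
        for v in sample:
            idx = int((v - lo) / width)
            counts[min(max(idx, 0), bins - 1)] += 1
        # eps keeps empty bins from producing log(0) or division by zero
        return [max(c / len(sample), eps) for c in counts]

    expected = proportions(baseline)
    actual = proportions(current)
    return sum((a - e) * math.log(a / e) for a, e in zip(actual, expected))


training = [i / 10 for i in range(1000)]         # evenly spread over [0, 100)
production_same = [i / 10 for i in range(1000)]
production_shifted = [v + 30 for v in training]  # assumed upward shift

psi_same = psi(training, production_same)
psi_shifted = psi(training, production_shifted)
print(f"PSI, unchanged inputs: {psi_same:.3f}")
print(f"PSI, shifted inputs:   {psi_shifted:.3f}")
```

A frequently quoted rule of thumb treats PSI values above roughly 0.25 as a significant shift, but that cutoff is a convention and should be tuned per feature and per project.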

How to Address Model Drift

ML experts often use drift detection tools, which automate model monitoring. However, there are other ways data scientists and ML experts can handle cases of drift.

Here are the steps one would need to take to address model drift. 

Analyze the Drift

It’s vital to plot the distributions of drifted features to determine what changed and caused the drift. Do they still match the baseline the model was trained on? Not every drift is equally meaningful, so experts must analyze each one carefully and decide whether it is worth addressing.
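Alongside plots, a per-feature score can help decide which drifts deserve attention first. The sketch below ranks features by the two-sample Kolmogorov-Smirnov statistic, which is the largest gap between the two empirical distribution functions. The choice of this statistic, the feature names, and the values are assumptions for illustration.

```python
from bisect import bisect_right


def ks_statistic(baseline, current):
    """Largest gap between the two empirical CDFs (two-sample KS statistic)."""
    a, b = sorted(baseline), sorted(current)
    gap = 0.0
    for v in a + b:  # the maximum gap is reached at one of the sample points
        cdf_a = bisect_right(a, v) / len(a)
        cdf_b = bisect_right(b, v) / len(b)
        gap = max(gap, abs(cdf_a - cdf_b))
    return gap


def rank_drifted_features(baseline_cols, current_cols):
    """Return (feature, KS statistic) pairs, largest shift first."""
    scores = {
        name: ks_statistic(baseline_cols[name], current_cols[name])
        for name in baseline_cols
    }
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


# Hypothetical feature columns; the names and values are made up.
baseline = {
    "age": list(range(20, 70)),
    "basket_value": [10 + i for i in range(50)],
}
current = {
    "age": list(range(20, 70)),                   # unchanged
    "basket_value": [35 + i for i in range(50)],  # shifted upward
}

ranking = rank_drifted_features(baseline, current)
for name, score in ranking:
    print(f"{name:>14}: KS = {score:.2f}")
```

A score near 0 means the two samples look alike, while a score near 1 means they barely overlap. Whether a given score matters still depends on how much the model relies on that feature.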

Check Data Quality

Organizations that detect drift should first check the model’s input data. Something changed, but what? Is the model still relevant to the goals of the project? Data quality should always be the first suspect when drift appears.
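A first pass over the input data can be as simple as counting missing and out-of-range values per field. The sketch below is a minimal version of such a check. The schema format, the field names, and the sample records are hypothetical.

```python
import math


def check_quality(rows, schema):
    """Count missing and out-of-range values per field.

    `schema` maps field name -> (min_allowed, max_allowed).
    """
    report = {name: {"missing": 0, "out_of_range": 0} for name in schema}
    for row in rows:
        for name, (lo, hi) in schema.items():
            value = row.get(name)
            if value is None or (isinstance(value, float) and math.isnan(value)):
                report[name]["missing"] += 1
            elif not lo <= value <= hi:
                report[name]["out_of_range"] += 1
    return report


# Hypothetical schema and records, for illustration only.
schema = {"age": (0, 120), "income": (0, 1_000_000)}
rows = [
    {"age": 34, "income": 52_000},
    {"age": None, "income": 48_000},  # missing value
    {"age": 29, "income": -1},        # sentinel leaking in as data
    {"age": 430, "income": 61_000},   # likely a unit or entry error
    {"income": 75_000},               # field dropped upstream
]

report = check_quality(rows, schema)
print(report)
```

A sudden jump in either count often points to a broken upstream pipeline, which calls for a data fix and not necessarily a retrained model.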

Users can choose to address the drift or do nothing. An alert might be a false alarm, or the team might be satisfied with how the drift affected predictions. However, sometimes change is necessary.

Retrain the Model

Since data distributions shift over time, it’s critical to retrain the model after drift is detected. Deploying an ML model is not a one-and-done project but a continuous one. 

Retraining a drifted model matters because it keeps the model current with emerging relationships between input and output data. Check the model every few weeks or months throughout the year to ensure it’s working with the latest training information.
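The retraining decision can be sketched as a small policy with two triggers: a drift score from whatever detector is in use, and the age of the deployed model. The 90-day limit, the 0.25 threshold, and the function name are assumptions chosen for illustration.

```python
from datetime import date, timedelta


def should_retrain(last_trained, today, drift_score,
                   max_age=timedelta(days=90), drift_threshold=0.25):
    """Return (decision, reason) under a simple two-trigger policy."""
    if drift_score >= drift_threshold:
        return True, "drift score above threshold"
    if today - last_trained >= max_age:
        return True, "model older than max_age"
    return False, "no trigger fired"


# Illustrative values only.
print(should_retrain(date(2022, 10, 1), date(2022, 12, 24), drift_score=0.05))
print(should_retrain(date(2022, 12, 1), date(2022, 12, 24), drift_score=0.40))
print(should_retrain(date(2022, 6, 1), date(2022, 12, 24), drift_score=0.05))
```

Keeping the age trigger alongside the drift trigger is a hedge: it forces a periodic refresh even when the detector stays quiet.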

Monitor for Issues

Once the model has learned from the new training data, watch how the drift responds. Periodic updates are wise, and checking the model after retraining helps data scientists and other professionals see whether drift still occurs.

If drift is detected again, repeat the steps outlined above. Drift detection tools are worthwhile investments, as they reduce the manual effort and time needed to make corrections.
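Monitoring after retraining can be sketched as a rolling accuracy check against the accuracy measured at deployment. The class name, window size, and tolerance below are hypothetical, and the sketch assumes true labels eventually arrive for each prediction.

```python
from collections import deque


class AccuracyMonitor:
    """Tracks accuracy over the most recent `window` labeled predictions."""

    def __init__(self, baseline_accuracy, window=50, tolerance=0.05):
        self.baseline = baseline_accuracy
        self.tolerance = tolerance
        self.outcomes = deque(maxlen=window)  # True where prediction was right

    def record(self, prediction, actual):
        self.outcomes.append(prediction == actual)

    def rolling_accuracy(self):
        if not self.outcomes:
            return None
        return sum(self.outcomes) / len(self.outcomes)

    def drifting(self):
        """True once a full window falls below baseline minus tolerance."""
        if len(self.outcomes) < self.outcomes.maxlen:
            return False  # not enough evidence yet
        return self.rolling_accuracy() < self.baseline - self.tolerance


monitor = AccuracyMonitor(baseline_accuracy=0.90, window=10)

# First 10 predictions: 9 of 10 correct, in line with the baseline.
for i in range(10):
    monitor.record(prediction=1, actual=1 if i else 0)
healthy = monitor.drifting()

# Next 10 predictions: only 6 of 10 correct.
for i in range(10):
    monitor.record(prediction=1, actual=1 if i < 6 else 0)
degraded = monitor.drifting()

print(healthy, degraded)
```

Waiting for a full window before raising an alert trades detection speed for fewer false alarms; a smaller window reacts sooner but is noisier.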

Beware of Drift in ML Projects

Drift is something every data scientist, researcher, and engineer should be aware of, especially in today’s competitive business sector. One of ML’s most notable features is the ability to use historical data to predict future outcomes. 

Predictions become inaccurate when drift occurs, and any business decisions based on them could damage the organization. Beware of concept and data drift, as both greatly affect the model’s performance.

  • Machine Learning
  • Artificial Intelligence
  • Big Data
  • Data Analytics

