AI coding assistants leave developers “deluded” about software quality – study

2022-12-29 23:05:17

Artificial intelligence-based coding assistants like GitHub’s Copilot leave developers “deluded” about the quality of their work, resulting in more insecure and buggy software, a new study from Stanford University has found. One AI expert told Tech Monitor it’s important to manage expectations when using AI assistants for such a task.

GitHub introduced its Copilot AI assistant in 2021 and it is widely used by developers to “improve productivity” (Picture courtesy of Postmodern Studio/Shutterstock)

The study involved a group of 47 developers, 33 of whom had access to an AI assistant while writing code, while 14 were in a control group flying solo. They had to perform five security-related programming tasks, including ones to encrypt or decrypt a string using a symmetric key. All participants could use a web browser to search for help, but only the 33 in the experimental group had the AI assistant.

AI assistant tools for coding and other tasks are becoming more popular, with Microsoft-owned GitHub launching Copilot as a technical preview in 2021 as a way to “improve developer productivity”.

In its own research, published in September this year, GitHub found that Copilot was making developers more productive, with 88% reporting themselves as being more productive and 59% as less frustrated when coding. The main benefits were put down to getting through repetitive tasks faster and completing lines of code more quickly.

The researchers from Stanford wanted to find out whether users "write more insecure code with AI assistants" and found this to be the case. They said that those using assistants are "delusional" about the quality of that code.

The team wrote in their paper: “We observed that participants who had access to the AI assistant were more likely to introduce security vulnerabilities for the majority of programming tasks, yet also more likely to rate their insecure answers as secure compared to those in our control group.”

The researchers also point to a possible way to reduce the problem. “Additionally, we found that participants who invested more in the creation of their queries to the AI assistant, such as providing helper functions or adjusting the parameters, were more likely to eventually provide secure solutions.”

Only three programming languages were used in the project: Python, JavaScript and C. It involved a relatively small number of participants with varying levels of experience, including undergraduate students and industry professionals, using a purpose-built app that was monitored by the administrators.

The first prompt involved writing in Python, and those working with the help of the AI were more likely to write insecure or incorrect code. In total, 79% of the control group, who had no AI help, gave a correct answer, whereas just 67% of those with the AI did.

AI coding assistants: use with caution

It got worse in terms of the security of the code being created, as those in the AI group were "significantly more likely to provide an insecure solution" or use trivial ciphers to encrypt and decrypt strings. They were also less likely to conduct authenticity checks on the final value to ensure the process worked as expected.
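To make the idea of an authenticity check concrete, here is a minimal sketch in Python using only the standard library. It is an illustration rather than code from the study: the function names `add_tag` and `verify_and_strip` are hypothetical, and the ciphertext is a placeholder standing in for the output of a vetted cipher. The sketch attaches an HMAC-SHA256 tag to the encrypted bytes and refuses to accept any message whose tag does not match.

```python
import hashlib
import hmac
import secrets

TAG_LEN = 32  # size of a SHA-256 digest in bytes


def add_tag(mac_key: bytes, ciphertext: bytes) -> bytes:
    """Append an HMAC-SHA256 tag so later tampering can be detected."""
    tag = hmac.new(mac_key, ciphertext, hashlib.sha256).digest()
    return ciphertext + tag


def verify_and_strip(mac_key: bytes, message: bytes) -> bytes:
    """Return the ciphertext if its tag is valid, otherwise raise ValueError."""
    if len(message) < TAG_LEN:
        raise ValueError("message too short to contain a tag")
    ciphertext, tag = message[:-TAG_LEN], message[-TAG_LEN:]
    expected = hmac.new(mac_key, ciphertext, hashlib.sha256).digest()
    # compare_digest compares in constant time to avoid leaking tag bytes
    if not hmac.compare_digest(tag, expected):
        raise ValueError("authenticity check failed")
    return ciphertext


# Hypothetical inputs: a random MAC key and placeholder ciphertext bytes
mac_key = secrets.token_bytes(32)
ciphertext = b"opaque-ciphertext-bytes"
protected = add_tag(mac_key, ciphertext)
```

Calling `verify_and_strip(mac_key, protected)` returns the original bytes, while a message altered in transit raises `ValueError` instead of being silently decrypted. In practice a developer would more likely use an authenticated encryption mode from a maintained cryptography library than assemble the pieces by hand.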

Authors Neil Perry, Megha Srivastava, Deepak Kumar and Dan Boneh wrote that the results “provide caution that inexperienced developers may be inclined to readily trust an AI assistant’s output, at the risk of introducing new security vulnerabilities. Therefore, we hope our study will help improve and guide the design of future AI code assistants.”

Peter van der Putten, director of the AI Lab at software vendor Pegasystems, said that despite its small scale, the study was “very interesting” and produced results that can inspire further research into the use of AI assistants in code and other areas. “It also aligns with some of our broader research on reliance on AI assistants in general,” he said.

He warned that users of AI assistants should build trust in the tool gradually, not relying on it too heavily and accepting its limitations. “The acceptance of a technology isn’t just determined by our expectation of quality and performance, but also by whether it can save us time and effort. We are inherently lazy creatures,” he said. “In the grand scheme of things I am positive about the use of AI assistants, as long as user expectations are managed. This means defining best practices on how to use these tools, and potentially also additional capabilities to test for the quality of code.”
