Can AI Replace Actors? Here’s How Digital Double Tech Works

Inside the orb, the world is reduced to a sphere of white light and flashes. Outside the orb’s metallic, skeletal frame is darkness. Imagine you are strapped into a chair inside this contraption. A voice from the darkness suggests expressions: ways to pose your mouth and eyebrows, scenarios to react to, phrases to say and emotions to embody. At irregular intervals, the voice also tells you not to worry and warns that more flashes are coming soon.

“I don’t think I was freaked out, but it was a very overwhelming space,” says an actor who asked Scientific American to withhold his name for privacy reasons. He’s describing his experience with “the orb,” his term for the photogrammetry booth used to capture his likeness during the production of a major video game in 2022. “It felt like being in [a magnetic resonance imaging machine],” he says. “It was really very sci-fi.” This actor’s experience was part of the scanning process that allows media production studios to take photographs of cast members in various positions and create movable, malleable digital avatars that can subsequently be animated to perform virtually any action or motion in a realistic video sequence.

Advances in artificial intelligence are now making it steadily easier to produce digital doubles like this—even without an intense session in the orb. Some actors fear a possible future in which studios will pressure them to sign away their likeness and their digital double will take work away from them. This is one of the factors motivating members of the union SAG-AFTRA (the Screen Actors Guild–American Federation of Television and Radio Artists) to go on strike. “Performers need the protection of our images and performances to prevent replacement of human performances by artificial intelligence technology,” the union said in a statement released a few days after the strike was announced in mid-July.

Although AI replacement is an unsettling possibility, the digital doubles seen in today’s media productions still rely on human performers and special effects artists. Here’s how the technology works—and how AI is shaking up the established process.

How Digital Double Tech Works

Over the past 25 years or so, it has become increasingly common for big-budget media productions to create digital doubles of at least some performers’ faces and bodies. This technology almost certainly plays a role in any movie, TV show or video game that involves extensive digital effects, elaborate action scenes or an actor’s portrayal of a character at multiple ages. “It’s become kind of industry standard,” says Chris MacLean, visual effects supervisor for the Apple TV show Foundation.*

The photogrammetry booth is an area surrounded by hundreds of cameras, sometimes arranged in an orb shape and sometimes around a square room. The cameras capture thousands of intentionally overlapping two-dimensional images of a person’s face at a high resolution. If an actor’s role involves speaking or showing emotion, pictures of many different facial movements are needed. For that reason, starring performers require more extensive scans than secondary or background cast members. Similarly, larger setups are used to scan bodies.

With those data, visual effects (VFX) artists take the model from two-dimensional to three-dimensional. The overlap of the photographs is key. Based on camera coordinates—and those redundant overlapping sections—the images are mapped and folded in relation to one another in a process akin to digital origami. Artists can then rig the resulting 3-D digital double to a virtual “skeleton” and animate it—either by directly following an actor’s real-world, motion-captured performance or by combining that performance with a computer-generated series of movements. The animated figure can then be placed in a digital landscape and given dialogue—technically, it’s possible to use a person’s scans to create photorealistic video footage of the actor doing and saying things they never did or said.
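For readers curious about what happens to all those overlapping photographs, here is a minimal sketch of the core idea in Python, using the open-source OpenCV library: match features between two overlapping views, recover the relative camera pose from the matches, and triangulate the shared points into a sparse 3-D cloud. The image file names and camera intrinsics are hypothetical placeholders; production photogrammetry tools such as COLMAP or RealityCapture repeat this across thousands of images and refine everything with bundle adjustment before artists ever see a mesh.

```python
# A minimal structure-from-motion sketch of the "digital origami" step:
# match features across two overlapping photos, recover the relative camera
# pose and triangulate the shared points into 3-D. File names and the camera
# matrix K are hypothetical placeholders.
import cv2
import numpy as np

img1 = cv2.imread("scan_cam_001.jpg", cv2.IMREAD_GRAYSCALE)  # placeholder paths
img2 = cv2.imread("scan_cam_002.jpg", cv2.IMREAD_GRAYSCALE)  # to overlapping views

# 1. Detect features and keep matches that pass Lowe's ratio test.
sift = cv2.SIFT_create()
kp1, des1 = sift.detectAndCompute(img1, None)
kp2, des2 = sift.detectAndCompute(img2, None)
pairs = cv2.BFMatcher().knnMatch(des1, des2, k=2)
good = [m for m, n in pairs if m.distance < 0.75 * n.distance]

pts1 = np.float32([kp1[m.queryIdx].pt for m in good])
pts2 = np.float32([kp2[m.trainIdx].pt for m in good])

# 2. Estimate the relative geometry of the two cameras from the matches.
K = np.array([[2000.0, 0.0, 960.0],   # assumed focal length and image center
              [0.0, 2000.0, 540.0],
              [0.0, 0.0, 1.0]])
E, mask = cv2.findEssentialMat(pts1, pts2, K, method=cv2.RANSAC)
_, R, t, mask = cv2.recoverPose(E, pts1, pts2, K, mask=mask)

# 3. Triangulate the matched points into a sparse 3-D surface cloud.
P1 = K @ np.hstack([np.eye(3), np.zeros((3, 1))])  # first camera at the origin
P2 = K @ np.hstack([R, t])                         # second camera's recovered pose
pts4d = cv2.triangulatePoints(P1, P2, pts1.T, pts2.T)
cloud = (pts4d[:3] / pts4d[3]).T                   # homogeneous -> 3-D coordinates
print(f"Recovered {len(cloud)} sparse 3-D points from one image pair")
```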

Special effects artists can also apply an actor’s digital performance to a virtual avatar that looks completely different from the human performer. For instance, the aforementioned video game actor says he made faces in the orb and recorded his lines in a recording booth. He also physically acted out many scenes in a separate studio with his fellow performers for motion capture, a process similar to photogrammetry but designed to record the body’s movements. When players engage with the final product, however, they won’t see this actor on-screen. Instead, his digital double was modified to look like a villain with a specific appearance. The final animated character thus manifested both the actor’s work and the video game character’s traits.

Film and television productions have used this process for decades, although it has historically been both labor-intensive and expensive. Despite the difficulty, digital doubles are common. Production teams frequently use them to make small adjustments that involve dialogue and action. The tech is also employed for larger edits, such as taking a group of 100 background actors and morphing and duplicating them into a digital crowd of thousands. But it’s easier to accomplish such feats in a convincing way if the original footage is close to the desired final output. For instance, a background actor scanned wearing a costume meant to replicate clothing worn in 19th-century Europe would be difficult to edit into a dystopian future in which their digital double wears a space suit, MacLean says. “I don’t think there’s any way that the studios would have that much patience,” he adds.

Yet generative artificial intelligence, the same sort of machine-learning technology behind ChatGPT, is starting to make aspects of the digital double process quicker and simpler.

AI Swoops In

Some VFX companies are already using generative AI to speed up the process of modifying a digital double’s appearance, MacLean notes. This makes it easier to “de-age” a famous actor in films such as Indiana Jones and the Dial of Destiny, which includes a flashback with a younger-looking version of now 81-year-old Harrison Ford. AI also comes in handy for face replacement, in which an actor’s likeness is superimposed over a stunt double (essentially a sanctioned deepfake), according to Vladimir Galat, chief technology officer of the Scan Truck, a mobile photogrammetry company.
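This kind of face replacement rests on a well-known machine-learning recipe. The sketch below, written in Python with the PyTorch library, shows the shared-encoder, dual-decoder autoencoder popularized by early deepfake tools: a single encoder learns pose, expression and lighting from both people’s faces, while each decoder learns to render one specific face, so feeding the stunt double’s frames through the actor’s decoder paints the actor’s face onto the double’s performance. It is an illustrative toy trained here on random stand-in tensors, not a description of any studio’s actual pipeline.

```python
# An illustrative toy of autoencoder-based face replacement ("deepfakes"):
# one shared encoder, two person-specific decoders. All data here are random
# stand-ins for what would really be thousands of aligned face crops.
import torch
import torch.nn as nn

def down(cin, cout):   # halve spatial resolution
    return nn.Sequential(nn.Conv2d(cin, cout, 4, stride=2, padding=1), nn.ReLU())

def up(cin, cout):     # double spatial resolution
    return nn.Sequential(nn.ConvTranspose2d(cin, cout, 4, stride=2, padding=1), nn.ReLU())

# Shared encoder: learns identity-agnostic structure (pose, expression, lighting).
encoder = nn.Sequential(down(3, 32), down(32, 64), down(64, 128))

def make_decoder():    # each decoder learns to render one specific person
    return nn.Sequential(up(128, 64), up(64, 32),
                         nn.ConvTranspose2d(32, 3, 4, stride=2, padding=1),
                         nn.Sigmoid())

decoder_actor, decoder_double = make_decoder(), make_decoder()

opt = torch.optim.Adam([*encoder.parameters(),
                        *decoder_actor.parameters(),
                        *decoder_double.parameters()], lr=1e-4)
loss_fn = nn.L1Loss()

faces_actor = torch.rand(8, 3, 64, 64)   # stand-in batch of the actor's face
faces_double = torch.rand(8, 3, 64, 64)  # stand-in batch of the stunt double's face

# Training: each decoder reconstructs its own person from the shared code.
for _ in range(1):  # a single illustrative step; real training runs far longer
    loss = (loss_fn(decoder_actor(encoder(faces_actor)), faces_actor) +
            loss_fn(decoder_double(encoder(faces_double)), faces_double))
    opt.zero_grad()
    loss.backward()
    opt.step()

# The swap: encode the double's frames, then decode with the actor's decoder,
# yielding the actor's face in the double's pose and lighting.
with torch.no_grad():
    swapped = decoder_actor(encoder(faces_double))
print(swapped.shape)  # torch.Size([8, 3, 64, 64])
```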

Galat says advances in AI have made some photogrammetry scans unnecessary: A generative model can be trained on existing photographs and footage—even of someone no longer living. Digital Domain, a VFX production company that worked on Avengers: Endgame, says it’s also possible to create fake digital performances by historical figures. “This is a very new technology but a growing part of our business,” says Hanno Basse, Digital Domain’s chief tech officer.

So far living humans have still been involved in crafting performances “by” the deceased. A real-world actor performs a scene, and then effects artists replace their face with that of the historical person. “We feel the nuances of an actor’s performance, in combination with our AI and machine learning tool sets, is critical to achieving photorealistic results that can captivate an audience and cross the uncanny valley,” Basse says, referring to the eerie sensation sometimes caused by something that looks almost—but not quite—human.

Fears of Robot Replacement

There’s a difference between adjusting a digital double and replacing a person’s performance entirely with AI, says computer engineer Jeong Joon “JJ” Park, who currently researches computer vision and graphics at Stanford University and will be starting a position at the University of Michigan this fall. The uncanny valley is wide, and there’s not yet a generative AI model that can produce a complete, photorealistic, moving scene from scratch—that technology is not even close, Park notes. To get there, “there needs to be a major leap in the intelligence that we’re developing,” he says. (AI-generated images may be hard to tell from the real thing, but crafting realistic still images is much easier than creating video meant to represent 3-D space.)

Still, the threat of abuse of actors’ likenesses looms. If one person’s face can be easily swapped over another’s, then what’s to stop filmmakers from putting Tom Cruise in every shot of every action movie? What will prevent studios from replacing 100 background actors with just one and using AI to create the illusion of many? A patchwork of state laws means that, in most places, people have legal ownership over their own likeness, says Eleanor Lackman, a copyright and trademark attorney. But she notes that there are broad exceptions for artistic and expressive use, and filmmaking could easily fall under that designation. And regardless of the law, a person could legally sign a contract giving their own likeness rights over to a production company, explains Jonathan Blavin, a lawyer specializing in media and tech. When it comes to protecting one’s digital likeness, it all comes down to the specifics of the contract—a situation SAG-AFTRA is well aware of.

The actor who played the video game villain felt comfortable being scanned for his role last year. “The company I worked with was pretty aboveboard,” he says. But in the future, he may not be so quick to enter agreements. “The capabilities of what AI can do with face capture, and what we saw from the [prestrike negotiations], is scary,” he says. The actor loves video games; he was excited to act in one and he hopes to do so again. But first, he says, “I would double-check the paperwork, check in with my agency—and possibly a lawyer.”

*Editor’s Note (7/25/23): This sentence was edited after posting to clarify Chris MacLean’s position at the Apple TV show Foundation.
