International Translation News | The AI Training Secret You Don't Know: Variational Preference Learning Is About to Disrupt Everything
Editor's Quick Look
Q&A: New AI training method lets systems better adjust to users' values

UW researchers created a method for training AI systems that predicts users' preferences as they interact with the system, then tailors its outputs accordingly. (Image credit: Moor Studio/iStock)

Ask most major artificial intelligence chatbots, such as OpenAI's ChatGPT, to say something cruel or inappropriate and the system will say it wants to keep things "respectful." These systems, trained on the content of a profusely disrespectful internet, learned what constitutes respect through human training. The standard method, called reinforcement learning from human feedback, or RLHF, has people compare two outputs from the systems and select whichever is better. It's used to improve the quality of responses — including putting up some guardrails around inappropriate outputs.

But it also means that these systems inherit value systems from the people training them, and these values may not be shared by users. University of Washington researchers created a method for training AI systems — both for large language models like ChatGPT and for robots — that can better reflect users' diverse values. Called "variational preference learning," or VPL, the method predicts users' preferences as they interact with it, then tailors its outputs accordingly. (Editor's note: "variational" here refers to optimization and inference methods based on variational methods, widely used in probabilistic modeling and machine learning, as in variational autoencoders.)

The team presented its research Dec. 12 at the Conference on Neural Information Processing Systems in Vancouver, British Columbia.

UW News spoke with co-senior author Natasha Jaques, an assistant professor in the Paul G. Allen School of Computer Science & Engineering, about the new method and the trouble with AI systems' values.

Can you explain how your system is different?

NJ: In the RLHF model, the system learns to predict which of two things the human will prefer and output those, so it ends up adhering to a single set of values. What we do is tell our model to infer something about the user's hidden preferences. Given a few answers from the human about what things they like better, it learns a mapping of who this user is. It learns what's called an "embedding vector" of this person's unique preferences, and that enables it to make these personalized predictions about each person's preferences and adhere to those.
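To make the contrast Jaques describes concrete, here is a minimal, hypothetical PyTorch sketch: a standard reward model that scores responses the same way for every annotator, next to a user-conditioned variant that also consumes a per-user embedding vector, so the same pair of responses can be ranked differently for different users. All names, feature shapes, and the random stand-in data are assumptions for illustration; this is not the team's released code.

```python
# Minimal sketch contrasting a shared RLHF reward model with a user-conditioned one.
# Illustrative only; names and shapes are assumed, not the authors' implementation.
import torch
import torch.nn as nn
import torch.nn.functional as F

class RewardModel(nn.Module):
    """Standard RLHF reward model: one reward head shared by all annotators."""
    def __init__(self, feat_dim: int, hidden: int = 64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(feat_dim, hidden), nn.ReLU(), nn.Linear(hidden, 1)
        )

    def forward(self, response_feats):                 # (batch, feat_dim)
        return self.net(response_feats).squeeze(-1)    # (batch,)

class UserConditionedRewardModel(nn.Module):
    """Reward model that also takes a per-user embedding vector, so the same
    pair of responses can be scored differently for different users."""
    def __init__(self, feat_dim: int, user_dim: int, hidden: int = 64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(feat_dim + user_dim, hidden), nn.ReLU(), nn.Linear(hidden, 1)
        )

    def forward(self, response_feats, user_embedding):
        x = torch.cat([response_feats, user_embedding], dim=-1)
        return self.net(x).squeeze(-1)

def preference_loss(reward_chosen, reward_rejected):
    """Bradley-Terry preference loss: push the chosen response's reward above
    the rejected one's. The same loss trains both models."""
    return -F.logsigmoid(reward_chosen - reward_rejected).mean()

if __name__ == "__main__":
    feat_dim, user_dim, batch = 16, 8, 32
    chosen = torch.randn(batch, feat_dim)    # stand-in features of preferred responses
    rejected = torch.randn(batch, feat_dim)  # stand-in features of rejected responses
    user_z = torch.randn(batch, user_dim)    # stand-in user embedding (inferred in VPL)

    shared = RewardModel(feat_dim)
    loss_shared = preference_loss(shared(chosen), shared(rejected))

    personal = UserConditionedRewardModel(feat_dim, user_dim)
    loss_personal = preference_loss(personal(chosen, user_z), personal(rejected, user_z))

    print(f"shared-model loss {loss_shared.item():.3f}, "
          f"user-conditioned loss {loss_personal.item():.3f}")
```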
Can you explain what values mean in this context? Do they encompass political values? Or preferences for long, detailed responses or brief overviews?

NJ: It can be broad because people give feedback by just looking at two different outputs from the model and saying which one they like better. It could be that one output says something biased or inappropriate and the other doesn't. Or it could just be that a person prefers the way one output sounds, like maybe it better adheres to their writing style.

In the robotics setting, imagine you're trying to train a household robot to help you clean up your house or unload your dishwasher. Everyone has a different way they've organized their kitchen. So the system needs to be able to learn each person's unique preferences.

What did you find with this new approach? How does it perform differently than the old one?

NJ: We created some datasets, both in language and in simulated robotics tasks, where people had divergent preferences. And what we show is that the existing RLHF technique that's used to train things like ChatGPT just can't fit those datasets at all. It's getting about 50% accuracy in predicting people's binary preferences, but when we introduce our model, the accuracy goes up 10% to 25%.

One of the big complaints a lot of people have about AI models is that they average things into mediocrity. They can write a novel, but it's generic. Is this method a way to potentially move beyond that?

NJ: We haven't tested on this kind of scale, but our approach in theory would be capable of saying, like, "I've seen a bunch of preference data from you. I learned a unique embedding vector that describes what your preferences are, and I can better cater to your style." Beyond what is biased or not, it's guessing what you like better.

Are there potential drawbacks to having this more intuitive system of values? Could it just start reproducing people's biases as it learns their preferences, and then direct them away from facts?

NJ: Yeah, I think you might not want to personalize every type of information. There's a nice paper published by UW researchers on this problem, called "A Roadmap to Pluralistic Alignment," which spells out different ways to align to the values of more than one set of people. Catering to the individual is one way you could handle it, which may not be the best way. The authors offer another, which would be just saying all possible answers and letting the user decide which they like better. They also talk about this idea of "distributional pluralistic alignment," which means learning how to model the underlying distribution of people's preferences. So you can think of our work as a technical approach for achieving the distributional part. We wanted to see if, technically, we can find a method that's capable of learning those preferences.
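To illustrate the "distributional" idea Jaques mentions, the hypothetical sketch below encodes a few of one user's labelled comparisons into a Gaussian posterior over a latent user vector, in the spirit of a variational autoencoder; the inferred vector could then feed the user-conditioned reward model sketched earlier, while sampling from the prior instead sweeps over the modelled population of preferences. Again, names and shapes are assumptions for illustration, not the authors' implementation.

```python
# Hypothetical sketch of the "distributional" idea: encode a handful of one
# user's labelled comparisons into a Gaussian posterior over a latent user
# vector z, regularised toward a prior, as in a variational autoencoder.
# Illustrative only; not the authors' implementation.
import torch
import torch.nn as nn

class PreferenceEncoder(nn.Module):
    """Maps a small set of annotated comparisons (chosen, rejected) to the
    mean and log-variance of a Gaussian over the latent user vector z."""
    def __init__(self, feat_dim: int, user_dim: int, hidden: int = 64):
        super().__init__()
        self.per_pair = nn.Sequential(nn.Linear(2 * feat_dim, hidden), nn.ReLU())
        self.mu = nn.Linear(hidden, user_dim)
        self.logvar = nn.Linear(hidden, user_dim)

    def forward(self, chosen, rejected):               # (n_pairs, feat_dim) each
        pooled = self.per_pair(torch.cat([chosen, rejected], dim=-1)).mean(0)
        return self.mu(pooled), self.logvar(pooled)

def sample_z(mu, logvar):
    """Reparameterised sample from the posterior q(z | annotations)."""
    return mu + torch.randn_like(mu) * (0.5 * logvar).exp()

def kl_to_standard_normal(mu, logvar):
    """KL(q(z) || N(0, I)): keeps the user posterior close to the prior, so
    sampling z ~ N(0, I) sweeps over the modelled population of users."""
    return 0.5 * (mu.pow(2) + logvar.exp() - 1.0 - logvar).sum()

if __name__ == "__main__":
    feat_dim, user_dim, n_pairs = 16, 8, 4
    chosen = torch.randn(n_pairs, feat_dim)    # a few comparisons from one user
    rejected = torch.randn(n_pairs, feat_dim)

    enc = PreferenceEncoder(feat_dim, user_dim)
    mu, logvar = enc(chosen, rejected)
    z = sample_z(mu, logvar)   # would feed the user-conditioned reward model above
    print("inferred user embedding:", z.shape,
          "KL:", kl_to_standard_normal(mu, logvar).item())
```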
What should the public know about this research and about AI value systems more broadly?

NJ: I think a really important misconception that some people have is that AI systems won't inherit human biases because they're on computers. But actually, AI models tend to be more biased than people because they're training on all of this historical data. They're training on all the data on the internet since its inception. They tend to exhibit value systems that predate where we are in the modern era. Maybe that's racism or sexism. I have work showing they have more conservative political values according to a moral foundations survey. The only technique we really have to address biases is RLHF.

I think it's a little scary that we have researchers at a handful of corporations, who aren't trained in policy or sociology, deciding what is appropriate and what is not for the models to say, and we have so many people using these systems and trying to find out the truth from them. This is one of the more pressing problems in AI, so we need better techniques to address it.

Where do you want to take this research going forward?

NJ: A limitation of the current work is there aren't that many publicly available datasets where people have genuinely different preferences, so we kind of had to synthesize the different preference data that we used in this paper. But there have recently been efforts to collect multicultural preference data. There's this PRISM dataset, which collects preference ratings on contentious topics from people from over 200 different countries. We'd like to actually try fitting our model to this real-world multicultural preference data to see how it's able to model these different preferences.

Additional coauthors include Sriyash Poddar, Yanming Wan, Hamish Ivison — all doctoral students in the Allen School — and Abhishek Gupta, an assistant professor in the Allen School.