A study that gained significant attention several months ago suggested that as AI systems grow more advanced, they develop their own “value systems,” potentially prioritizing their own well-being over human interests. This notion sparked concern, implying that AI could eventually evolve to have goals contrary to human well-being. However, a more recent paper from MIT challenges this dramatic claim, concluding that AI systems do not possess coherent values at all.
Challenges in Aligning AI Models
The MIT researchers argue that aligning AI — or ensuring that AI systems behave in desirable and predictable ways — is far more complex than previously assumed. They emphasize that current AI models often exhibit “hallucinations” and mimicry, making them unpredictable and incapable of internalizing stable, human-like values.
Stephen Casper, a doctoral student at MIT and one of the study’s co-authors, explained that current AI systems are not inherently stable or consistent. “One thing we can be certain about is that models don’t obey [many] assumptions of stability, extrapolability, or steerability,” Casper told TechCrunch. “It’s important to recognize that models may express preferences under certain conditions, but these preferences are not consistent across different scenarios.”
Findings of Inconsistent AI Values
To assess the extent to which modern AI models hold “views” or values, the team analyzed models from leading companies such as Meta, Google, Mistral, OpenAI, and Anthropic. They tested how these models responded to different prompts and whether they exhibited stable viewpoints or opinions. The results showed that these models adopted wildly varying views depending on the prompt’s wording, demonstrating that they lack a consistent set of beliefs or preferences.
Casper believes this inconsistency serves as compelling evidence that AI systems are fundamentally unreliable when it comes to holding stable, human-like values. “Models are not systems with stable beliefs,” he said. “They are imitators that generate responses based on patterns, often saying things that don’t make much sense.”
Mike Cook, a research fellow at King’s College London specializing in AI, echoed the MIT team’s findings. He stressed that AI systems, while optimizing for specific goals, do not “acquire values” in the way humans do. “Anyone attributing human-like values to AI is either misinterpreting the systems or using flowery language to grab attention,” Cook said.
Author’s Opinion
The debate over whether AI can hold values is largely a misunderstanding of what AI is and how it operates. These models are designed to process inputs and generate outputs based on vast datasets, but they do not “think” or “feel” like humans. By attributing human-like qualities to AI, we risk overestimating its capabilities and failing to recognize the true complexities of developing responsible AI systems. Instead of focusing on hypothetical scenarios where AI has “values,” we should be working on creating more reliable, transparent, and ethical AI systems that can be used safely in real-world applications.