Microsoft Ventures into AI Training Data Attribution with New Research Project

Microsoft has launched a new research project built around what it calls "training-time provenance," an effort to tackle the complex problem of AI training data attribution. The project centers on understanding how particular training examples shape the generative outputs of large AI models, such as text and images. Jaron Lanier, the veteran technologist at Microsoft Research, is leading the work, which is framed as a way to address the multifaceted ethics and copyright questions raised by generative AI.

The effort could hardly be timelier, given that Microsoft is already facing copyright claims on multiple fronts. In December, The New York Times sued Microsoft and OpenAI, alleging that their generative AI models were trained on copyrighted material taken from the publication's 130-year archive of articles. Separately, at least five software developers have sued Microsoft, claiming that its coding assistant GitHub Copilot was trained unlawfully on their copyrighted code. These legal battles underscore the urgency for Microsoft to address the ethical implications of using copyrighted material in AI training.

“A data-dignity approach would trace the most unique and influential contributors when a big model provides a valuable output,” said Jaron Lanier, highlighting one of the project’s core objectives.
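Microsoft has not disclosed how training-time provenance would work under the hood, but the general idea of tracing an output back to influential training examples can be sketched with a gradient-similarity heuristic in the spirit of published attribution methods such as TracIn. The snippet below is purely illustrative and is not Microsoft's method; `per_example_grad` and `influence_scores` are hypothetical helpers, and the sketch assumes a differentiable PyTorch model, a loss function, and a small in-memory list of (input, label) training pairs.

```python
# Illustrative sketch of gradient-based training-data attribution
# (a rough, single-checkpoint, TracIn-style heuristic). Not Microsoft's method.
import torch


def per_example_grad(model, loss_fn, x, y):
    """Return the flattened gradient of the loss on a single (x, y) pair."""
    params = [p for p in model.parameters() if p.requires_grad]
    loss = loss_fn(model(x), y)
    grads = torch.autograd.grad(loss, params)
    return torch.cat([g.reshape(-1) for g in grads])


def influence_scores(model, loss_fn, train_pairs, query_x, query_y, lr=1e-3):
    """Score each training example by how strongly its loss gradient aligns
    with the gradient of the loss on the query output. Larger scores suggest
    the example pushed the model toward producing that output."""
    q = per_example_grad(model, loss_fn, query_x, query_y)
    return [
        lr * torch.dot(q, per_example_grad(model, loss_fn, x, y)).item()
        for x, y in train_pairs
    ]

# Example (toy) usage:
# scores = influence_scores(model, loss_fn, train_pairs, query_x, query_y)
# top_contributors = sorted(range(len(scores)), key=lambda i: -scores[i])[:5]
```

In practice, attribution research approximates this idea at scale with checkpoint ensembling, gradient compression, or influence-function approximations, since computing per-example gradients over a web-scale training corpus is prohibitively expensive.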

Ethical Concerns and the Push for Contributor Recognition

Some observers will interpret Microsoft's effort as an attempt to "ethics wash" the controversy over training AI models on copyrighted materials, especially artistic works. Even so, the tech giant's push to develop a contributor recognition system would, if it works, be a meaningful step toward addressing the ethical and legal challenges the industry currently faces.

“For instance, if you ask a model for ‘an animated movie of my kids in an oil-painting world of talking cats on an adventure,’ then certain key oil painters, cat portraitists, voice actors, and writers — or their estates — might be calculated to have been uniquely essential to the creation of the new masterpiece. They would be acknowledged and motivated. They might even get paid,” Lanier explained, illustrating the potential impact of the project.

For all its ambition, the "training-time provenance" project may prove to be little more than a proof of concept. Still, it is in step with a broader industry push to pair innovation with ethical responsibility. Bria, an AI model developer that claims to "programmatically" compensate data owners based on their influence, recently secured $40 million in venture capital funding. Firms like Adobe and Shutterstock have built infrastructure to compensate dataset creators, though the size of those payouts isn't yet clear.

OpenAI, for its part, has said it is developing tools that would give creators greater control over how their works are used in, or excluded from, training datasets. The company has also lobbied the U.S. government to recognize blanket fair use for training models on copyrighted material, arguing that such guidelines would provide much-needed legal clarity.

“Current neural network architectures are opaque in terms of providing sources for their generations, and there are […] good reasons to change this,” stated Microsoft in a job listing, signaling its commitment to enhancing transparency in AI training processes.

What The Author Thinks

Microsoft’s focus on transparency and attribution in AI training data is a necessary step in the right direction. If companies like Microsoft are serious about building responsible AI, they must prioritize ethical frameworks and legal clarity. It is essential to ensure that creators are properly credited and compensated for their contributions to training data, and this project is a positive move toward achieving that goal.
