Featured image: Depositphotos
Cross-posted from petapixel.com [by Jaron Schneider]
A new bill introduced by Rep. Adam Schiff would require all AI companies to disclose the copyrighted works used in training sets or face a fine.
At present, transparency about what exactly is used to train generative AI, specifically image generators, is murky at best. Few companies explicitly state what data was used to train their models, and even when that information is shared, it is often obscured or poorly explained.
For example, during an interview with the Wall Street Journal last month, OpenAI’s CTO Mira Murati feigned ignorance or dodged the question entirely when it was posed to her directly.
“We used publicly available data and licensed data… If they were publicly available, publicly available to use, there might be that data, but I’m not sure. I’m not confident about it,” Murati said at the time.
“I’m just not going to go into the details of the data that was used, but it was publicly available or licensed data.”
OpenAI has not clarified further since that interview, but if Rep. Schiff’s bill were to pass, Murati, OpenAI, and every other generative AI company would be forced to provide that information to the public.