Nvidia Contacted Anna's Archive To Secure Access To Millions of Pirated Books
An anonymous reader quotes a report from TorrentFreak: NVIDIA executives allegedly authorized the use of millions of pirated books from Anna's Archive to fuel the company's AI training. In an expanded class-action lawsuit that cites internal NVIDIA documents, several book authors claim that the trillion-dollar company reached out directly to Anna's Archive, seeking high-speed access to the shadow library's data. [...] Last Friday, the authors filed an amended complaint that significantly expands the scope of the lawsuit. In addition to adding more books, authors, and AI models, it includes broader "shadow library" claims and allegations. The authors, including Abdi Nazemian, now cite various internal NVIDIA emails and documents suggesting that the company willingly downloaded millions of copyrighted books. The new complaint alleges that "competitive pressures drove NVIDIA to piracy," which allegedly included collaborating with the controversial Anna's Archive library.
According to the amended complaint, a member of NVIDIA's data strategy team reached out to Anna's Archive to find out what the pirate library could offer the trillion-dollar company. "Desperate for books, NVIDIA contacted Anna's Archive -- the largest and most brazen of the remaining shadow libraries -- about acquiring its millions of pirated materials and 'including Anna's Archive in pre-training data for our LLMs,'" the complaint notes. "Because Anna's Archive charged tens of thousands of dollars for 'high-speed access' to its pirated collections [...] NVIDIA sought to find out what 'high-speed access' to the data would look like."
According to the complaint, Anna's Archive then warned NVIDIA that its library was illegally acquired and maintained. Because the site had previously wasted time on other AI companies, the pirate library asked NVIDIA executives whether they had internal permission to move forward. This permission was allegedly granted within a week, after which Anna's Archive provided the chip giant with access to its pirated books. "Within a week of contacting Anna's Archive, and days after being warned by Anna's Archive of the illegal nature of their collections, NVIDIA management gave 'the green light' to proceed with the piracy. Anna's Archive offered NVIDIA millions of pirated copyrighted books." The complaint states that Anna's Archive promised to provide NVIDIA with access to roughly 500 terabytes of data. This included millions of books that are usually accessible only through the Internet Archive's digital lending system, which itself has been targeted in court. The complaint does not explicitly say whether NVIDIA ended up paying Anna's Archive for access to the data.
NVIDIA is also accused of using other pirated sources. Beyond the previously cited Books3 database, the new complaint alleges that the company downloaded books from LibGen, Sci-Hub, and Z-Library. And in addition to downloading and using pirated books for its own AI training, the authors allege that NVIDIA distributed scripts and tools that allowed its corporate customers to automatically download "The Pile," which contains the pirated Books3 dataset.
Read more of this story at Slashdot.