In late 2025 and early 2026, in what may be one of the most dramatic developments in the history of digital music, a controversial group known as Anna’s Archive claimed it had scraped, archived, and begun sharing vast amounts of data from Spotify’s streaming catalog. The claimed haul includes 86 million music files and metadata for 256 million tracks, reportedly amounting to nearly 300 terabytes of data.
The announcement sparked debate across the music, tech, and legal worlds, and it raises profound questions about digital preservation, copyright, piracy, artist rights, and the future of streaming services.

What Exactly Happened?
In December 2025, Anna’s Archive, a shadowy project connected to digital preservation and “pirate library” communities, announced in a blog post that it had found a method to “scrape Spotify at scale.” This scrape allegedly resulted in a massive dataset consisting of:
- Metadata for 256 million Spotify tracks, including artist names, album details, tempo, loudness, other audio features, ISRCs, and more.
- Approximately 86 million audio files representing what the group claims are the most popular songs on the platform, accounting for around 99.6% of all listens according to the group's popularity-based sampling.
- A total dataset close to 300 terabytes in size.
Anna’s Archive has characterized this effort as a cultural preservation project, likening it to archiving books, research, and knowledge for future generations in case of technological loss, censorship, wars, natural disasters, or legal shutdowns of repositories.
However, there’s a significant nuance: while the group claims to have scraped all this data, independent reports and investigative journalism suggest the audio files themselves may not have been fully released publicly yet, and the torrents circulating so far appear to contain only metadata or partial subsets.
Why Does This Matter, and Why Is It Controversial?
Even if only the metadata is fully available right now, the implications of such a massive archive, and of the claimed audio files, are immense:
Scale and Scope
Spotify hosts over 100 million tracks globally, and this scraping allegedly captured most of what people actually listen to, not necessarily the full catalog of lesser‑known or unplayed songs.
Metadata for 256 million tracks makes this possibly the largest publicly accessible music database ever, far surpassing existing sources such as MusicBrainz.
Preservation vs. Piracy
Anna’s Archive frames this as cultural preservation akin to archiving books or academic works for posterity. Critics and music industry representatives insist it is piracy: the unlawful copying and distribution of copyrighted material.
Spotify and major labels, including Universal Music Group, Sony Music, and Warner Music, have described the activity as unauthorized access, breach of contract, a DMCA violation, and a violation of digital rights protections.

Legal Complexities
Scraping content in this way almost certainly involved circumventing digital rights management (DRM) and copyright protections, which is a legal violation in markets such as the U.S. under the Digital Millennium Copyright Act (DMCA). (Source: Music Business Worldwide)
Spotify has stated publicly that it identified and disabled accounts involved in the scraping and has rolled out enhanced safeguards to prevent similar intrusions.
Current Status
Several Anna’s Archive domains have been taken down or suspended due to legal pressure, including actions stemming from U.S. courts and domain registry cooperation.
The group continues to shift domains and mirrors, a common tactic among decentralized archive communities, suggesting they intend to resist legal suppression.
Potential Uses of the Archive and Industry Concerns
Even if the archive remains mostly metadata for now, this dataset has value far beyond individual downloads:
For Researchers and Creators: Musicologists and analysts can query tempo (BPM), loudness, genre distributions, and other measurable attributes across millions of songs, potentially fueling new academic research into cultural trends.
Music Tech Solutions: AI developers could, in theory, use this data to train generative or analytical models, although this raises ethical and legal concerns.
DJs and music producers might derive new insights for playlist curation, remixing, or trend analysis by comparing audio features across eras and genres.
For Fans and Casual Users: Some music lovers see this as a resource that might someday enable offline access or comprehensive archival listening, although direct easy streaming of individual songs from the torrent archive is currently not simple or practical.
Others argue that pirated downloads of music already available through subscription services do not meaningfully threaten Spotify’s business model, but the legal and ethical implications differ widely.
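To make the research use case above more concrete, here is a minimal Python sketch of the kind of query an analyst might run: grouping tracks by decade and comparing tempo and loudness. The records, the field names (release_year, tempo, loudness), and the function summarize_by_decade are all hypothetical and chosen for illustration; they are not taken from the actual dataset, whose schema may look quite different.

```python
from collections import defaultdict
from statistics import mean, median

# Hypothetical sample records. Field names and values are illustrative
# assumptions, not the archive's real schema or data.
SAMPLE_TRACKS = [
    {"title": "Track A", "release_year": 1984, "tempo": 118.0, "loudness": -11.2},
    {"title": "Track B", "release_year": 1989, "tempo": 124.0, "loudness": -10.4},
    {"title": "Track C", "release_year": 2003, "tempo": 96.0, "loudness": -6.8},
    {"title": "Track D", "release_year": 2008, "tempo": 128.0, "loudness": -5.6},
    {"title": "Track E", "release_year": 2019, "tempo": 140.0, "loudness": -5.0},
]


def summarize_by_decade(tracks):
    """Group tracks by release decade and summarize tempo and loudness."""
    by_decade = defaultdict(list)
    for track in tracks:
        decade = (track["release_year"] // 10) * 10  # e.g. 1984 -> 1980
        by_decade[decade].append(track)

    summary = {}
    for decade, group in sorted(by_decade.items()):
        summary[decade] = {
            "tracks": len(group),
            "median_tempo": median(t["tempo"] for t in group),
            "mean_loudness": round(mean(t["loudness"] for t in group), 2),
        }
    return summary


if __name__ == "__main__":
    for decade, stats in summarize_by_decade(SAMPLE_TRACKS).items():
        print(decade, stats)
```

At the scale described in this article, an in-memory list like this would not be practical; a real analysis would more likely stream records from disk or use a database, but the grouping logic would be similar.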
Industry and Legal Backlash
Music industry stakeholders have reacted strongly:
- Spotify reiterates it considers these actions unauthorized and is committed to protecting both artist rights and its platform from piracy.
- Labels and streaming partners have joined or supported legal efforts to halt publication and distribution of the scraped data, often citing harm to artists, rights management, and millions of dollars at stake each year in streaming revenue.
- Legal experts note that labeling this effort as “preservation” does not exempt it from copyright law, and that downloading or redistributing copyrighted recordings is illegal in many jurisdictions.
Despite this, Anna’s Archive has managed to evade permanent takedown so far by using alternative domains and decentralized hosting approaches common among shadow library projects.
What This Means for the Future of Music
This event highlights several trends and questions shaping the digital music era:
- Can digital cultural artifacts be preserved without violating copyright law?
- Will decentralized archives and torrent networks expand beyond books to include multimedia?
- How should streaming platforms balance accessibility with digital rights protection?
- Could industry players develop shared archival standards that protect both creators and cultural heritage?
The debate is now at the intersection of technology, ethics, law, and culture, and its outcome may influence how we think about digital heritage preservation vs. intellectual property protections for years to come.