Bluesky To Sell Your Content To AI Data Miners
-
So it begins. Hidden in Jay Graber's recent charm offensive is this innocuously framed initiative: Bluesky is weighing a proposal that gives users consent over how their data is used for AI (https://techcrunch.com/2025/03/10/bluesky-is-weighing-a-proposal-that-gives-users-consent-over-how-their-data-is-used-for-ai/)
Not so fast.
1) Shows they are planning on doing content deals with AI companies.
2) Seems like it is Opt-out vs. Opt-in (see below).
3) It is just a voluntary robots.txt file
h/t @Lydie https://tech.lgbt/@Lydie/114149023344861046
more...
-
mastodonmigration@mastodon.online replied to mastodonmigration@mastodon.online
Let's get into this. #Bluesky is bleeding money and selling your data is the best way they have of "monetizing" you. So why not frame it as a "voluntary" initiative?
Thing is, seems like it will be opt-out. See this github 'proposal': https://github.com/bluesky-social/proposals/tree/main/0008-user-intents
"Suppose a Bluesky user does not want any of their public data to be used for generative AI training. They would go in to app settings, find the data reuse preferences section, and configure “Generative AI” to “disallow”."
more...
-
dogzilla@masto.deluma.biz replied to mastodonmigration@mastodon.online
@mastodonmigration @Lydie How would I opt out of having my Mastodon posts used for training AI? Is there even any way to know if someone has set up an instance, followed thousands of people, and is feeding all the posts into an AI?
-
thenexusofprivacy@infosec.exchange replied to dogzilla@masto.deluma.biz
Mastodon doesn’t allow you to opt out of your data being used to train AI (although some instances have clauses in their terms to prohibit it). In fact, if anybody from Threads is following you - or following somebody who boosts one of your posts - Threads’ privacy policy says your data can be used to train Meta’s AI and target ads.
So if Bluesky implements this, they’ll be providing more control than Mastodon does today. How people are getting from there to “they’re going to sell your data!!!!” is mysterious to me. Sure, opt-in would be better, but Mastodon doesn’t even have opt-out!
-
mastodonmigration@mastodon.online replied to thenexusofprivacy@infosec.exchange
@thenexusofprivacy @dogzilla @Lydie
Mastodon is de facto opt-out of your data being used to train AI, because no such rights are explicitly granted. The authorized uses are enumerated in the instance privacy policy and they do not include AI scraping.
Agree with you about the Threads problem and wrote about it extensively at the time.
Yes, selling content is an inference. What do you think the plan is, to simply give it away to AI scrapers? Not sure this would be better, and it makes no sense.
-
thenexusofprivacy@infosec.exchange replied to mastodonmigration@mastodon.online
Yes, I think Bluesky’s plan is very much to make it easy for AI scrapers and everybody else to access public data for free. They’ve said so repeatedly, their architecture is optimized for it, it fits in with their belief system, and they have plenty of other ways of making money. Of course they could change their minds, but adding robots.txt-like consent signals doesn’t make that any easier or more likely.
As for the situation on Mastodon, I’m not sure what privacy lawyer told you that and how much time they had spent looking at your instance’s privacy policy, but you might want to get some other expert opinions before giving that advice to others.
-
leberschnitzel@existiert.ch replied to thenexusofprivacy@infosec.exchange
@thenexusofprivacy @mastodonmigration @dogzilla @Lydie The problem with Mastodon is the same as with all the other "but it's free information on the internet" arguments from people training AI: there are no laws for it. And no laws doesn't mean it's legal or illegal; it means that legislation has to be made that will settle that question. And this might look very different when an instance doesn't make its local data public without an account, only through its instance, etc.
-
thenexusofprivacy@infosec.exchange replied to leberschnitzel@existiert.ch
I certainly agree that we need legislation specifically around the use of data for AI training, and that it's a different situation for public data as opposed to data that's only accessible with an account or via an API. Still, scraping data in violation of the terms of service isn't necessarily legal -- as Solove and Hartzog write in The Great Scrape, "Privacy law regularly protects publicly available data, and privacy principles are implicated even when personal data is accessible to others." Ulrike Hahn's Bridging to Bluesky: The open social web, consent, and GDPR looks at the interactions between the ActivityPub Fediverse and Bluesky; the joint statement from a dozen data protection offices and Kieran McCarthy's Web Scraping for Me, But Not for Thee look at scraping in general.
TL;DR summary: it's complex!