‘Reddit can survive without search’: company reportedly threatens to block Google (www.theverge.com)
AnActOfCreation@programming.dev to Technology@lemmy.world · English · 2 years ago · 368 comments
online@lemmy.ml · English · 2 years ago (edited)
Speaking of this, what parts of the fediverse have added the option to block training generative AI to their respective robots.txt?
https://blog.google/technology/ai/an-update-on-web-publisher-controls/
https://developers.google.com/search/docs/crawling-indexing/overview-google-crawlers
https://techcrunch.com/2023/09/28/medium-hints-at-a-nascent-media-coalition-to-block-ai-crawlers/
It looks like there’s a handful of these lines you’d have to add to robots.txt.
Is there anywhere that keeps a comprehensive list of these?
kingthrillgore@lemmy.ml · English · 2 years ago
I’ve been trying to find a list as well, to no avail. The ones I do know are in my own robots.txt, at volcanolair.co/robots.txt
online@lemmy.ml · English · 2 years ago
Someone should make a GitHub repo just to make it easier for people to find them all in one place, with sources, and update the list as we get new ones.
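For anyone wondering what the "handful of lines" mentioned above might look like, here is a rough Python sketch that builds them from a list of crawler tokens and checks the result with the standard library's `urllib.robotparser`. The token list (`Google-Extended`, `GPTBot`, `CCBot`) is an assumption based on what the vendors had published around the time of this thread, and it is certainly not comprehensive; the helper names and the `example.com` URL are made up for illustration. Check each vendor's own crawler documentation before relying on it.

```python
from urllib.robotparser import RobotFileParser

# Assumed, non-exhaustive list of robots.txt tokens for AI-training crawlers.
# Verify each one against the vendor's current documentation.
AI_CRAWLER_TOKENS = ["Google-Extended", "GPTBot", "CCBot"]


def build_robots_txt(tokens):
    """Return robots.txt text that disallows every path for each token."""
    blocks = [f"User-agent: {token}\nDisallow: /" for token in tokens]
    return "\n\n".join(blocks) + "\n"


def is_blocked(robots_txt, user_agent, url="https://example.com/post/1"):
    """Return True if the given rules forbid user_agent from fetching url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return not parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    # Print the lines you would append to an existing robots.txt.
    print(build_robots_txt(AI_CRAWLER_TOKENS))
```

Keep in mind that robots.txt is only a request: it works for crawlers that choose to honor it, and ordinary search crawlers such as Googlebot are unaffected by these entries.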