How to combat large amounts of AI scrapers
Drunk & Root@sh.itjust.works to Selfhosted@lemmy.world · English · 2 days ago
Every time I check the nginx logs it's more scrapers than I can count, and I could not find any good open source solutions.
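For a quick picture of who is hammering the server, a couple of one-liners over the access log help. This is a minimal sketch assuming the default combined log format and the usual /var/log/nginx/access.log path; adjust both for your setup:

```sh
# Top 20 user agents by request count (field 6 when splitting on quotes
# in the combined log format is the user-agent string)
awk -F'"' '{print $6}' /var/log/nginx/access.log | sort | uniq -c | sort -rn | head -20

# Top 20 client IPs by request count
awk '{print $1}' /var/log/nginx/access.log | sort | uniq -c | sort -rn | head -20
```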
daniskarma@lemmy.dbzer0.com · edited 7 hours ago

Do you have a proper robots.txt file? Do they do weird things like invalid URLs or invalid POST attempts? Weird user agents? Millions of hits from the same IP sounds much more like vulnerability probing than a crawler. If that's the case, use fail2ban or CrowdSec. It should be easy to set up a rule that bans an inhuman number of hits per second on certain resources.
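One common way to wire this up is to let nginx's limit_req module flag the abusive clients and have fail2ban ban whoever trips it. A minimal sketch; the zone name, thresholds, paths, and ban times below are illustrative, not recommendations:

```nginx
# nginx.conf (http block): allow ~10 req/s per IP, log anything beyond the burst
limit_req_zone $binary_remote_addr zone=perip:10m rate=10r/s;

server {
    location / {
        limit_req zone=perip burst=20 nodelay;
    }
}
```

```ini
# /etc/fail2ban/filter.d/nginx-req-limit.conf
# Matches the "limiting requests" lines nginx writes to its error log
[Definition]
failregex = limiting requests, excess:.* by zone.*client: <HOST>
ignoreregex =
```

```ini
# /etc/fail2ban/jail.local
[nginx-req-limit]
enabled  = true
filter   = nginx-req-limit
logpath  = /var/log/nginx/error.log
findtime = 600
maxretry = 10
bantime  = 7200
```

With this, an IP that exceeds the rate limit ten times within ten minutes gets banned for two hours; tune the numbers to what a human on your frontends could plausibly generate.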
Drunk & Root@sh.itjust.works (OP) · 6 hours ago

Since it's the frontends I run that are getting scraped, it's the robots.txt included with those frontends.
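If the bundled robots.txt doesn't cover AI crawlers, it can be extended; the tokens below are the publicly documented user agents for a few major AI crawlers, though robots.txt is purely advisory and many scrapers ignore it entirely:

```
# robots.txt — example additions, not exhaustive
User-agent: GPTBot
Disallow: /

User-agent: CCBot
Disallow: /

User-agent: Google-Extended
Disallow: /
```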