7 Steps to Stop RSS Feed Scraping and Content Theft

How can you stop RSS feed scraping and the republishing of your blog content, avoid duplicate content penalties, and keep scraped copies from ranking higher in search results than your own article? All bloggers experience RSS scraping to some extent, and it is simply not possible to go after every blog with a DMCA takedown request.

Stop RSS Scraping & Content Theft

You can keep complaining, or you can try the simple methods we use to deal with scraper sites that steal content.

1. Display RSS Feed Copyright Notice

To deal with sites republishing our RSS feed, we created the Simple feed copyright WordPress plugin, which adds a copyright notice as well as links back to the blog URL and the permalink of the post. This plugin has already been downloaded thousands of times from the official WordPress Plugins repository.

This way, republished content not only earns you backlinks but also tells Googlebot and readers of the scraper site that the original content came from your site.
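To make the idea concrete, here is a minimal Python sketch of appending such a notice to a feed item. It is not the plugin's actual code (the plugin itself runs inside WordPress); the function name, parameters, notice wording and the example.com URLs are all assumptions made for illustration.

```python
from html import escape


def add_feed_copyright(content_html, post_title, permalink, blog_name, blog_url, year):
    """Return the feed item HTML with a copyright notice and backlinks appended.

    Hypothetical helper: the notice wording and parameters are illustrative only.
    """
    notice = (
        '<p>&copy; {year} <a href="{blog_url}">{blog_name}</a>. '
        'Originally published as <a href="{permalink}">{title}</a>.</p>'
    ).format(
        year=year,
        blog_url=escape(blog_url, quote=True),    # safe inside an href attribute
        blog_name=escape(blog_name),
        permalink=escape(permalink, quote=True),
        title=escape(post_title),
    )
    # The original content stays untouched; the notice is added at the end.
    return content_html + "\n" + notice


item = add_feed_copyright(
    "<p>Post body</p>",
    "Stop RSS Scraping",
    "https://example.com/stop-rss-scraping/",
    "Example Blog",
    "https://example.com/",
    2024,
)
print(item)
```

A scraper that republishes the feed item as-is carries both links along with the content.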

2. Find Scraped Content on Google Search

Of course, the easiest way is to search for your latest post title within quotes on Google; every result with the same title will show up. You can ignore the many sites that only syndicate your headlines, but you MUST find the sites that republish your content.

If you want a more powerful content theft checker, we prefer Copyscape Premium, a useful online plagiarism finder for detecting copies of our web content.
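Both checks can be roughed out in a few lines of Python. The sketch below builds the quoted-title search URL and measures how much of a post reappears on a suspect page using overlapping word sequences (shingles). The shingle comparison is our own stand-in for illustration and says nothing about how Copyscape works; the function names and the shingle size are assumptions.

```python
import re
from urllib.parse import quote_plus


def quoted_title_search_url(title):
    """Build a Google search URL for the exact post title (in quotes)."""
    return "https://www.google.com/search?q=" + quote_plus('"' + title + '"')


def shingles(text, size=5):
    """Split text into overlapping runs of `size` lowercase words."""
    words = re.findall(r"[a-z0-9']+", text.lower())
    if len(words) < size:
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + size]) for i in range(len(words) - size + 1)}


def overlap_ratio(original, suspect, size=5):
    """Fraction of the original's shingles that also appear in the suspect text."""
    orig = shingles(original, size)
    if not orig:
        return 0.0
    return len(orig & shingles(suspect, size)) / len(orig)


print(quoted_title_search_url("7 Steps to Stop RSS Feed Scraping and Content Theft"))
```

A ratio near 1.0 suggests the page republishes the post in full, while a headline-only syndicator scores close to 0. Where to draw the line between the two is a judgment call.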

3. Stop Image Hotlinking from Scraper Sites

After you find the scraped content, it is usually not possible to stop the site from republishing it. So we identify the site and then block it from hotlinking our images. You can serve a copyright image of your choice in place of the hotlinked one.

This really works. We swap hotlinked images for a bright green notice that says the image belongs to our site and shows our short URL QOT.co, which is easy to type should people decide to check the original source. This works even if the scraper has removed the copyright notices.

The replacement image also messes up the pages of sites reusing our images.
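Hotlink protection is usually set up in the web server configuration, and the article does not say which setup we use. The decision itself is simple, though, and can be sketched in Python: look at the Referer header of each image request and serve the notice image when it points at a foreign site. The function names, the notice path and the example.com hosts below are hypothetical.

```python
from urllib.parse import urlparse


def is_hotlink(referer, allowed_hosts):
    """Return True when the Referer points at a host outside allowed_hosts."""
    if not referer:
        # No Referer at all: direct visits and privacy tools often omit it,
        # so these requests are let through.
        return False
    host = (urlparse(referer).hostname or "").lower()
    if not host:
        return False
    # Accept the allowed hosts themselves and any of their subdomains.
    return not any(host == h or host.endswith("." + h) for h in allowed_hosts)


def image_to_serve(requested_path, referer, allowed_hosts,
                   notice_path="/images/hotlink-notice.png"):
    """Pick the real image for our own pages, the notice image for hotlinkers."""
    return notice_path if is_hotlink(referer, allowed_hosts) else requested_path


allowed = {"example.com"}
print(image_to_serve("/images/photo.png", "https://scraper.example/post", allowed))
print(image_to_serve("/images/photo.png", "https://www.example.com/post", allowed))
```

Feed readers and search engines you want to keep serving can be added to the allowed hosts.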

4. Show Copyright Policy on Your Blog

This rarely works, as most RSS scrapers hardly ever have a contact page, publicly displayed email or comment form through which you can get a response to requests to stop RSS scraping.

If there is a contact form, you can send them the link to your copyright policy, and some scrapers might stop their auto-publishing bots from republishing your site for fear of a lawsuit.

5. Report Spam to Google

If these sites rank higher than yours, Google wants to know. Reporting them is what pushes these scrapers down to the bottom of the search engine results. Google has a special form to report web spam. You can fill it in and complain about these sites ranking higher than your site in search engine results.

Remember not to overdo the reporting, and only report genuine spam. Of course, you can also file a DMCA request and track your DMCA requests via the Webmaster Tools dashboard, but that is a longer process.

6. Stop the Site Income

Many such sites carry AdSense ads, and if you report such Made-for-AdSense sites, Google will be happy to prevent paying advertisers from wasting their money.

Notice that all ads now carry a blue arrow marked AdChoices. Click that arrow and you will be directed to Google pages where you can report problems with the site. Blocking the flow of free money makes the scraping futile, with no income to justify the effort.

7. Report to Web Hosting Provider

Web hosting companies take web spam very seriously, and they will take prompt action against sites hosting or engaging in illegal activities. Find the web hosting company of the scraper site and then report the site using the company's contact form. They will take care of the rest, if it is a genuine request.

We look forward to hearing in the comments how you deal with this.

About the Author: P Chandra is editor of QOT, one of India's earliest tech bloggers since 2004. A tech enthusiast with expertise in coding, WordPress, web tools, SEO and DIY hacks.