If you are using the amazing Yoast SEO WordPress plugin, here is how to noindex the archive subpages of tags and categories so they stay out of the Google index. You have probably realised that the plugin no longer blocks these archive subpages from the Google index, because the option to noindex them was removed many years ago. But why?
Yoast SEO Plugin and NoIndex Options
Yoast SEO is a very popular WordPress plugin used by over 5 million websites. I got alarmed when, after switching from another plugin, my number of Google-indexed pages suddenly doubled for no obvious reason, until I investigated further.
You can check the number of pages indexed in Google by searching for site:domain.com. In my case, hundreds of tag archive subpages had been indexed in Google. These are all pages with /page/ in their URLs.
The explanation given on the Yoast support page is that Google has become much better over the years at recognizing the rel="prev" and rel="next" link tags, which signify pagination, so it understands the site architecture well enough to send users to the first page. To quote further …
“Noindexing all these pages leads to a lower amount of crawls for them (source), which subsequently leads to lower amounts of crawls for older articles, which is not a good idea on most sites.”
There is emerging information that if you noindex a page, Google might in the long term also stop following the links on that page: because the page is not indexed, it may be crawled less often. That reasoning seems fine. Still, I personally feel this should be offered as an option that users can tweak.
How to Noindex Subpages
But I do not want Google to index thousands of pages from tags and categories as it can lead to duplicate content and crawl budget issues on large sites.
So here is the code I added to the functions.php file of my WordPress theme to noindex all subpages of categories and tags, as well as the subpages of the index archive.
// Tell Yoast SEO to noindex every paginated page (URLs containing /page/).
add_filter('wpseo_robots', function ($robots) {
    if (is_paged()) {
        return 'noindex,follow';
    }
    return $robots;
});
This code outputs a noindex robots meta tag on all paged content (anything for which is_paged() is true), which asks Google to keep those pages out of its index. It adds a meta robots tag to the <HEAD> part of the HTML code like this.
<meta name="robots" content="noindex, follow">
All URLs with /page/ in them will show this tag. Such URLs occur many times across the index archive, category archives and tag archives.
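Option 2 – The following code adds noindex to subpages of categories and tags, while sparing the subpages of the index pages. Note that is_archive() is also true for other archive types, such as date and author archives, so their subpages are noindexed too. See what suits you.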
// Noindex paginated archive pages only; paginated index pages stay indexable.
add_filter('wpseo_robots', function ($robots) {
    if (is_paged() && is_archive()) {
        return 'noindex,follow';
    }
    return $robots;
});
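If you want to limit Option 2 strictly to categories and tags (is_archive() is also true for date and author archives), one possible variant is sketched below. This is only a sketch under assumptions: the function name my_paged_taxonomy_robots is hypothetical and not part of Yoast or WordPress, and the decision logic is kept in a plain function so it can be checked outside WordPress. is_category(), is_tag() and is_paged() are standard WordPress conditional tags.

```php
<?php
// Hypothetical helper (not part of Yoast or WordPress): decide the robots
// value from plain booleans so the logic can be tested on its own.
function my_paged_taxonomy_robots($robots, $is_paged, $is_category_or_tag) {
    if ($is_paged && $is_category_or_tag) {
        return 'noindex,follow';
    }
    // Anything else keeps whatever Yoast SEO already decided.
    return $robots;
}

// Register the filter only when running inside WordPress.
if (function_exists('add_filter')) {
    add_filter('wpseo_robots', function ($robots) {
        return my_paged_taxonomy_robots(
            $robots,
            is_paged(),
            is_category() || is_tag()
        );
    });
}
```

If you paste this into an existing functions.php, leave out the opening <?php tag, since that file is already in PHP mode.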
Very important – Once you activate it, check the HTML source code of these pages to confirm that the tag is displayed correctly, and only on the pages you expected. As Googlebot gradually recrawls your pages and reads these tags, Google will remove them from its index over the next few weeks.
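Checking the source by hand gets tedious when there are many pages, so here is a minimal sketch of a helper that pulls the robots meta value out of an HTML string. The function name my_extract_robots_meta is hypothetical, and the sample HTML is made up for illustration. It assumes PHP's DOM extension is available, which is the case on most installations.

```php
<?php
// Hypothetical helper: return the content of the first <meta name="robots">
// tag found in an HTML string, or null when the page has none.
function my_extract_robots_meta($html) {
    $doc = new DOMDocument();
    // Real-world markup is rarely perfect, so silence parser warnings.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    foreach ($doc->getElementsByTagName('meta') as $meta) {
        if (strtolower($meta->getAttribute('name')) === 'robots') {
            return $meta->getAttribute('content');
        }
    }
    return null;
}

// Made-up sample page; for a live check you could instead pass the result
// of file_get_contents() on one of your own /page/ URLs.
$sample = '<html><head><meta name="robots" content="noindex, follow"></head><body></body></html>';
echo my_extract_robots_meta($sample), PHP_EOL;
```

Run it against a paged archive URL and against the first page of the same archive: only the paged one should report noindex.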
Warning: Do this at your own risk, as it may remove thousands of indexed pages from the Google index and could adversely affect your search engine rankings and site traffic. Consult an SEO expert before implementing this code.