Can the crawler "go through" blacklisted pages?

I wanted to know whether the crawler would still pass through a blacklisted page, even though the blacklisted page itself is not indexed.

My use case is a summary page that lists other pages. The summary page itself should not be indexed, but the linked pages should be. So if I blacklist the summary page, will that prevent the crawler from reaching the linked pages?

Summary page (blacklisted, should not be indexed)

  • linked page 1 (should be indexed)
  • linked page 2 (should be indexed)
  • linked page 3 (should be indexed)

Hi Valentin,

If a path is blacklisted, it is deemed invalid for crawling and the search agent will refrain from accessing it altogether.

Based on your description, it sounds as though you want to use robots meta tags instead. These let you tell the crawler to follow the links on a page without indexing the page itself. There’s more information here: Meta Tags | Swiftype Documentation
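
For example, assuming the crawler honors the standard robots directives described in that documentation, adding this to the <head> of the summary page keeps it out of the index while still allowing its links to be followed:

    <!-- Summary page: do not index this page, but do follow its links -->
    <meta name="robots" content="noindex, follow">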

Another possible solution is to ensure the content you do want to be indexed can be discovered via sitemap: Sitemap.xml Support | Swiftype Documentation
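
As a rough sketch, a minimal sitemap.xml that lists the linked pages directly (the URLs below are placeholders) looks like this:

    <?xml version="1.0" encoding="UTF-8"?>
    <urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
      <!-- Placeholder URLs; list each page that should be indexed -->
      <url><loc>https://example.com/linked-page-1</loc></url>
      <url><loc>https://example.com/linked-page-2</loc></url>
      <url><loc>https://example.com/linked-page-3</loc></url>
    </urlset>

That way the crawler can discover the pages even if no crawlable path links to them.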

Hi Mike,
Thanks a lot for your answer, that makes perfect sense!
