We have noticed that the number of crawled results in the Contents tab is lower than the actual number of pages on the site. How can we crawl all of our content?
Here are some of the most common reasons Swiftype might not be able to crawl all of your site’s content:
- The missing page content isn’t linked to from other known parts of the site and isn’t included in the domain’s sitemap.xml file.
- Path rules configured on the Domain settings page restrict the crawler to specific section(s) of the site.
- Your site has a robots.txt file that disallows Swiftbot or all search agents from specific paths in the domain.
- Pages on the site have robots meta tags set to noindex and/or nofollow.
- The site template uses canonical tags that point at a URL different from the one you expect to be indexed.
- The missing content has been added to your site since the last full recrawl occurred. In this case, requesting a recrawl from the Domains section of the dashboard should correct the issue.
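As a quick way to test the robots.txt cause listed above, the following is a minimal sketch that uses Python's standard `urllib.robotparser` to report which URLs a robots.txt file disallows for a given user agent. The function name `blocked_paths`, the sample robots.txt content, and the example.com URLs are hypothetical and only for illustration. Python's parser may not interpret every rule exactly as Swiftbot does, so treat the result as a hint rather than a definitive answer.

```python
from urllib.robotparser import RobotFileParser


def blocked_paths(robots_txt, urls, agent="Swiftbot"):
    """Return the URLs that robots_txt disallows for the given user agent."""
    parser = RobotFileParser()
    # Parse the file content directly so no network request is needed.
    parser.parse(robots_txt.splitlines())
    return [url for url in urls if not parser.can_fetch(agent, url)]


# Hypothetical robots.txt content and URLs, for illustration only.
SAMPLE_ROBOTS = """\
User-agent: Swiftbot
Disallow: /private/

User-agent: *
Disallow: /tmp/
"""

SAMPLE_URLS = [
    "https://example.com/blog/post-1",
    "https://example.com/private/report",
    "https://example.com/tmp/cache",
]

if __name__ == "__main__":
    for url in blocked_paths(SAMPLE_ROBOTS, SAMPLE_URLS):
        print("Blocked for Swiftbot:", url)
```

To run this against a real site, paste the contents of your own robots.txt file in place of `SAMPLE_ROBOTS` and list the URLs that are missing from the Contents tab.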
Troubleshooting and addressing any of the above will help your Swiftype crawl be more successful, and it will also help with other search engines such as Google.
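The robots meta tag and canonical tag causes listed above can be checked one page at a time. Here is a minimal sketch, using only Python's standard `html.parser`, that extracts the robots meta directives and the canonical URL from a page's HTML. The class name `IndexingHints`, the helper `indexing_hints`, and the sample markup are hypothetical and not part of any Swiftype tooling. The sketch only looks at `<meta name="robots">` and `<link rel="canonical">`.

```python
from html.parser import HTMLParser


class IndexingHints(HTMLParser):
    """Collect robots meta directives and the canonical URL from a page."""

    def __init__(self):
        super().__init__()
        self.robots = set()
        self.canonical = None

    def handle_starttag(self, tag, attrs):
        # Attribute values can be None (for example, a bare attribute).
        attributes = {name: (value or "") for name, value in attrs}
        if tag == "meta" and attributes.get("name", "").lower() == "robots":
            for directive in attributes.get("content", "").split(","):
                if directive.strip():
                    self.robots.add(directive.strip().lower())
        elif tag == "link" and "canonical" in attributes.get("rel", "").lower().split():
            self.canonical = attributes.get("href") or None


def indexing_hints(html):
    """Return (set of robots directives, canonical URL or None)."""
    parser = IndexingHints()
    parser.feed(html)
    return parser.robots, parser.canonical


# Hypothetical page markup, for illustration only.
SAMPLE_HTML = """
<html><head>
<meta name="robots" content="noindex, nofollow">
<link rel="canonical" href="https://example.com/products/widget">
</head><body></body></html>
"""

if __name__ == "__main__":
    robots, canonical = indexing_hints(SAMPLE_HTML)
    print("Robots directives:", sorted(robots))
    print("Canonical URL:", canonical)
```

If a missing page reports `noindex`, or a canonical URL different from the page's own address, that is a likely reason it does not appear in the crawled results.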