How to index private pay-wall content


#1

Hello,

I’m wondering whether the API supports private, non-public content.

Our site has a pay-wall, so 99% of the content (thousands of pages) is not available to the public or to search crawlers. Is it possible to add this sort of content to the web crawler without exposing pay-wall content publicly? Said another way, is it possible to provide non-public content/index to the crawler via a private

We also have thousands of private PDFs. I have seen that you offer PDF crawling, but again, will this work with non-public PDFs? Alternatively, could we create a PDF search cache index and push it to Swiftype privately via an API, as per point #3 above?

Thanks.


#2

Hey Maurice,

Yes, it does!

To index private content with our Crawler, you’ll need to give our search agent access to the paywalled content by whitelisting either our search agent’s IP block or your Swiftype account-specific User Agent ID.
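
As a rough illustration of the whitelisting step, here is a minimal Python sketch of a paywall check that lets paying members and the whitelisted crawler through. The IP block (a reserved documentation range) and the User Agent ID below are hypothetical placeholders, not our real values; use the ones listed for your account. Keep in mind that clients can set the User-Agent header to anything, so the IP check is the stronger of the two signals.

```python
import ipaddress

# Hypothetical placeholders: replace with the crawler IP block and the
# account-specific User Agent ID shown for your Swiftype account.
CRAWLER_NETWORKS = [ipaddress.ip_network("203.0.113.0/24")]
CRAWLER_USER_AGENT_ID = "example-account-ua-id"


def is_whitelisted_crawler(remote_ip: str, user_agent: str) -> bool:
    """True if the request comes from the whitelisted IP block or carries the UA ID."""
    try:
        addr = ipaddress.ip_address(remote_ip)
        if any(addr in net for net in CRAWLER_NETWORKS):
            return True
    except ValueError:
        pass  # Malformed address: fall back to the User-Agent check.
    # User-Agent headers can be spoofed, so treat this as the weaker signal.
    return CRAWLER_USER_AGENT_ID in user_agent


def can_view_paywalled_page(remote_ip: str, user_agent: str, is_member: bool) -> bool:
    """Paying members and the whitelisted crawler may see paywalled pages."""
    return is_member or is_whitelisted_crawler(remote_ip, user_agent)
```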

To best prevent paywalled content from appearing in public search results, you could index the paid content into its own engine, restrict public-facing search requests to the public engine, and then customize the SERP (search engine results page) experience for logged-in members by querying both the public and premium content engines.
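
A sketch of the two-engine approach, assuming hypothetical engine names `public-site` and `premium-content`: anonymous visitors only ever see hits from the public engine, while members get both result sets merged into one list. Sorting by raw score is a simplification, since relevance scores from two separate engines are not guaranteed to be directly comparable.

```python
from dataclasses import dataclass

# Hypothetical engine names; substitute the slugs of your own engines.
PUBLIC_ENGINE = "public-site"
PREMIUM_ENGINE = "premium-content"


@dataclass
class Hit:
    url: str
    score: float
    engine: str


def engines_for(is_member: bool) -> list[str]:
    """Anonymous visitors may only search the public engine."""
    return [PUBLIC_ENGINE, PREMIUM_ENGINE] if is_member else [PUBLIC_ENGINE]


def merge_results(results_by_engine: dict[str, list[Hit]], is_member: bool) -> list[Hit]:
    """Combine hits from the engines this visitor may see, highest score first."""
    allowed = engines_for(is_member)
    hits = [hit for engine in allowed for hit in results_by_engine.get(engine, [])]
    return sorted(hits, key=lambda h: h.score, reverse=True)
```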

For searching paywalled or private content, we recommend making search requests server-side rather than client-side. This protects your engine’s authentication parameters and keeps the request calls out of public view.
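
Here is a minimal sketch of the server-side pattern using only Python's standard library. The endpoint URL, the JSON body, the Bearer-token header and the `public-site` engine name are placeholders, not the actual Swiftype API; check the API documentation for the real search endpoint and authentication scheme. The idea is that the key is attached on your server and the engine restriction is enforced there, so neither is visible to, or controllable from, the browser.

```python
import json
import urllib.request

# Placeholder endpoint, not the real Swiftype URL.
SEARCH_URL = "https://search.example.com/engines/{engine}/search"
PUBLIC_ENGINE = "public-site"  # hypothetical engine name


def build_search_request(engine: str, query: str, is_member: bool,
                         api_key: str) -> urllib.request.Request:
    """Build the upstream search call on the server so the key never reaches the browser."""
    # Enforce the engine restriction server-side; the browser cannot override it.
    if not is_member and engine != PUBLIC_ENGINE:
        raise PermissionError("non-members may only search the public engine")
    body = json.dumps({"q": query}).encode("utf-8")
    return urllib.request.Request(
        SEARCH_URL.format(engine=engine),
        data=body,
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",  # assumed auth scheme
        },
        method="POST",
    )
```

On your server you would read the key from somewhere private (for example an environment variable), send the request with `urllib.request.urlopen(req)`, and return only the results to the browser.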


Private PDFs can be indexed in the same way. If you are unable to whitelist our Crawler, you would need to index the content using our Developer API instead.
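
If you go the Developer API route, most of the work on your side is preparing documents. This sketch assumes the PDF text has already been extracted (for example with a PDF library of your choice), derives a stable ID from each file path so that re-indexing updates a document rather than duplicating it, and groups documents into batches for bulk requests. The field names and the batch size of 100 are hypothetical; map them to your engine's schema and to the limits the API actually documents.

```python
import hashlib
from typing import Iterable, Iterator


def make_document(path: str, title: str, text: str, url: str) -> dict:
    """Hypothetical document shape; rename fields to match your engine's schema."""
    # Hashing the file path gives the same ID every run, so re-indexing is idempotent.
    external_id = hashlib.sha1(path.encode("utf-8")).hexdigest()
    # The url should point at the paywalled page so results send members there.
    return {"external_id": external_id, "url": url, "title": title, "body": text}


def batches(docs: Iterable[dict], size: int = 100) -> Iterator[list[dict]]:
    """Yield lists of at most `size` documents for bulk indexing calls."""
    batch: list[dict] = []
    for doc in docs:
        batch.append(doc)
        if len(batch) == size:
            yield batch
            batch = []
    if batch:
        yield batch  # Final partial batch.
```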

Hope that helps!