
How would you go about crawling a website so that you can index every page when the only real navigation is a search bar, as on the following sites?

https://plejehjemsoversigten.dk/

https://findadentist.ada.org/

Do people just brute-force the search queries, or is there a method that's usually used to index these kinds of websites?

SketchyManDan

1 Answer


There are several ways to approach this (though if the owner of a site does not want it crawled, any of them can be quite challenging):

  • Check the site's robots.txt. It may give you a clue about the site structure and may declare a sitemap.
  • Check the site's sitemap.xml. It may list the URLs the site owner wants to be public (see the first sketch after this list).
  • Use an existing index such as Google, with advanced syntax that narrows the search to a particular site (e.g. site:your.domain).
  • Exploit weaknesses in the site's design. For example, the first site in your list does not enforce a minimum query length, so you can search for, say, a, get the 800 results containing a, and then repeat for the remaining letters.
  • From each search-result page, also crawl all links on the result item pages, since related pages are often listed there (see the second sketch after this list).
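
A minimal sketch of the robots.txt / sitemap.xml checks, using only the Python standard library. The base URL is the first site from your question; nothing else here assumes anything about that site, and either file may simply not exist:

```python
# Check robots.txt (structure hints, declared sitemaps) and sitemap.xml (public URLs).
import urllib.robotparser
import urllib.request
import xml.etree.ElementTree as ET

base = "https://plejehjemsoversigten.dk"

# robots.txt: disallowed paths hint at site structure; it may also declare sitemaps.
rp = urllib.robotparser.RobotFileParser(base + "/robots.txt")
rp.read()
print("Sitemaps declared in robots.txt:", rp.site_maps())

# sitemap.xml: if present, it enumerates the URLs the owner wants indexed.
try:
    with urllib.request.urlopen(base + "/sitemap.xml") as resp:
        tree = ET.parse(resp)
    ns = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}
    urls = [loc.text for loc in tree.findall(".//sm:loc", ns)]
    print(f"{len(urls)} URLs listed in sitemap.xml")
except Exception as exc:
    print("No usable sitemap.xml:", exc)
```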
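
And a minimal sketch of the brute-force idea from the last two bullets: issue one query per letter and collect every link from the result pages. The search endpoint (/search) and the parameter name (q) below are hypothetical placeholders; inspect the site's actual search requests (browser dev tools, network tab) to find the real ones, and throttle your requests so you don't hammer the server:

```python
# Query each letter of the alphabet and collect all links from the result pages.
import string
import urllib.parse
import urllib.request
from html.parser import HTMLParser

class LinkCollector(HTMLParser):
    """Collects href values from all <a> tags in a page."""
    def __init__(self):
        super().__init__()
        self.links = set()

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.add(href)

base = "https://plejehjemsoversigten.dk"
found = set()
for letter in string.ascii_lowercase:
    # Hypothetical endpoint and parameter; adjust to the site's real search API.
    url = base + "/search?" + urllib.parse.urlencode({"q": letter})
    try:
        with urllib.request.urlopen(url) as resp:
            html = resp.read().decode("utf-8", errors="replace")
    except Exception:
        continue
    parser = LinkCollector()
    parser.feed(html)
    # Resolve relative links against the page URL before queuing them for crawling.
    found.update(urllib.parse.urljoin(url, href) for href in parser.links)

print(f"Collected {len(found)} candidate URLs to crawl")
```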
Alexey R.