I'm about to start a very large project.
How do I create a search engine with the following features:
- I give it a URL and it gets all the links available on that page
- It reads the robots.txt file to determine what to index and what not to index
- It picks up any pages added to a site that is already in the database, without recrawling the whole site
- It reads XML sitemaps
- How to work with keywords
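
To make the crawling features concrete, here is a minimal sketch of link extraction, robots.txt filtering and sitemap reading. It assumes Python and only its standard library; the function names (`extract_links`, `allowed_links`, `sitemap_entries`) and the `MyCrawler` user agent are made up for illustration, and the page, robots.txt and sitemap contents are passed in as strings, so downloading them is left out.

```python
import xml.etree.ElementTree as ET
from html.parser import HTMLParser
from urllib.parse import urljoin, urldefrag
from urllib.robotparser import RobotFileParser

# Namespace used by standard XML sitemaps.
SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


class LinkCollector(HTMLParser):
    """Collects absolute http(s) URLs from <a href="..."> tags."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        for name, value in attrs:
            if name == "href" and value:
                # Resolve relative links and drop the #fragment part.
                absolute, _fragment = urldefrag(urljoin(self.base_url, value))
                is_web = absolute.startswith(("http://", "https://"))
                if is_web and absolute not in self.links:
                    self.links.append(absolute)


def extract_links(html, base_url):
    """Return the unique links found in an HTML string, in page order."""
    collector = LinkCollector(base_url)
    collector.feed(html)
    return collector.links


def allowed_links(links, robots_txt, user_agent="MyCrawler"):
    """Keep only the links that the given robots.txt text allows."""
    rules = RobotFileParser()
    rules.parse(robots_txt.splitlines())
    return [link for link in links if rules.can_fetch(user_agent, link)]


def sitemap_entries(xml_text):
    """Return (url, lastmod) pairs from a sitemap; lastmod may be None."""
    root = ET.fromstring(xml_text)
    entries = []
    for url in root.findall("sm:url", SITEMAP_NS):
        loc = url.findtext("sm:loc", default="", namespaces=SITEMAP_NS).strip()
        lastmod = url.findtext("sm:lastmod", default=None, namespaces=SITEMAP_NS)
        if loc:
            entries.append((loc, lastmod))
    return entries
```

Comparing the sitemap's `lastmod` values (when a site provides them) against the date a page was last crawled is one possible way to find new or changed pages without recrawling everything; sites that omit `lastmod` would need a different approach.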
If possible, please also explain how I should structure my database.
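
As a starting point for the database and keyword questions, here is one possible layout, sketched with Python's built-in `sqlite3` module. Everything in it is an assumption made for illustration: the table names (`pages`, `keywords`, `page_keywords`), the helper functions, and the very simple keyword rule (lowercase words, counted per page). A real search engine would likely need stop words, stemming and better ranking.

```python
import re
import sqlite3
from collections import Counter

# Hypothetical schema: one row per page, one row per distinct word,
# and a link table that stores how often each word occurs on each page.
SCHEMA = """
CREATE TABLE IF NOT EXISTS pages (
    id INTEGER PRIMARY KEY,
    url TEXT NOT NULL UNIQUE,
    last_crawled TEXT
);
CREATE TABLE IF NOT EXISTS keywords (
    id INTEGER PRIMARY KEY,
    word TEXT NOT NULL UNIQUE
);
CREATE TABLE IF NOT EXISTS page_keywords (
    page_id INTEGER NOT NULL REFERENCES pages(id),
    keyword_id INTEGER NOT NULL REFERENCES keywords(id),
    frequency INTEGER NOT NULL,
    PRIMARY KEY (page_id, keyword_id)
);
"""


def open_index(path=":memory:"):
    """Open (or create) the index database and make sure the tables exist."""
    conn = sqlite3.connect(path)
    conn.executescript(SCHEMA)
    return conn


def is_known(conn, url):
    """True if the URL is already stored, so new pages can be told apart."""
    row = conn.execute("SELECT 1 FROM pages WHERE url = ?", (url,)).fetchone()
    return row is not None


def index_page(conn, url, text):
    """Store a page and its word counts, replacing any earlier counts."""
    counts = Counter(re.findall(r"[a-z0-9]+", text.lower()))
    conn.execute("INSERT OR IGNORE INTO pages (url) VALUES (?)", (url,))
    conn.execute(
        "UPDATE pages SET last_crawled = datetime('now') WHERE url = ?", (url,)
    )
    page_id = conn.execute(
        "SELECT id FROM pages WHERE url = ?", (url,)
    ).fetchone()[0]
    conn.execute("DELETE FROM page_keywords WHERE page_id = ?", (page_id,))
    for word, count in counts.items():
        conn.execute("INSERT OR IGNORE INTO keywords (word) VALUES (?)", (word,))
        keyword_id = conn.execute(
            "SELECT id FROM keywords WHERE word = ?", (word,)
        ).fetchone()[0]
        conn.execute(
            "INSERT INTO page_keywords (page_id, keyword_id, frequency) "
            "VALUES (?, ?, ?)",
            (page_id, keyword_id, count),
        )
    conn.commit()


def search(conn, word):
    """Return (url, frequency) pairs for one keyword, most frequent first."""
    rows = conn.execute(
        "SELECT p.url, pk.frequency FROM page_keywords pk "
        "JOIN pages p ON p.id = pk.page_id "
        "JOIN keywords k ON k.id = pk.keyword_id "
        "WHERE k.word = ? ORDER BY pk.frequency DESC, p.url",
        (word.lower(),),
    )
    return rows.fetchall()
```

The separate `page_keywords` table is what lets a single keyword lookup find every matching page without scanning page text, and the `UNIQUE` constraint on `pages.url` makes it cheap to check whether a URL has been seen before.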