I need to add a good lunr search to the Markdown (MD) documents across my site. It is a React/Next.js site with a lot of Markdown docs in both the blog and the regular "docs" areas, and website users need to be able to search those documents for topics. The MDs are rendered in real time via Next.js and all is well, but we need to add a server-side search. For security reasons the browser may only receive results; it must not be sent the pages to search them client-side. The search also has to run server-side and never reach beyond our intranet, because this is a firewalled corporate environment and the website can't call out to anything on the external internet. In any case, here is the plan:
- Build the lunr search index at site build time (it's all static, so this isn't tough).
- Use an API route endpoint to run the server-side search against that index.
- Use the Next.js client side as the user interaction point, which calls the API endpoint with the query and displays the results.
I know how to use lunr, build an index, build the API route endpoint, and render to the page. What I need to understand is how to build the index from, and search against, the rendered version of the MD documents (I guess that would be the displayed HTML). How do I feed lunr's index builder content that contains neither a) non-searchable Markdown syntax and meta information, nor b) non-searchable HTML tags? You see the issue? I need to build the index against the real text that the user would want to search, not the markup in Markdown or HTML. Is there already an npm package that does this kind of thing?
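To make the requirement concrete, a rough sketch of such a stripping step might look like the following. The function name `markdownToPlainText` is hypothetical, the regexes cover only common Markdown constructs, and dropping code blocks from the searchable text is an assumption; a real Markdown parser would be more robust.

```typescript
// Hypothetical helper: reduce a Markdown document to roughly the text a reader sees.
// Regex-based sketch only; it does not handle every Markdown construct.
function markdownToPlainText(markdown: string): string {
  return markdown
    // Drop YAML front matter at the very top of the file (--- ... ---).
    .replace(/^---\r?\n[\s\S]*?\r?\n---\r?\n?/, "")
    // Drop fenced code blocks (assumption: code should not be searchable).
    .replace(/```[\s\S]*?```/g, " ")
    // Drop raw HTML tags but keep the text between them.
    .replace(/<[^>]+>/g, " ")
    // Images: keep only the alt text.
    .replace(/!\[([^\]]*)\]\([^)]*\)/g, "$1")
    // Links: keep only the link text.
    .replace(/\[([^\]]+)\]\([^)]*\)/g, "$1")
    // Strip heading, blockquote, and list markers at the start of a line.
    .replace(/^[ \t]{0,3}(#{1,6}\s+|>\s?|[-*+]\s+|\d+\.\s+)/gm, "")
    // Strip emphasis and inline-code markers.
    // Note: this also removes underscores inside identifiers such as snake_case.
    .replace(/\*\*|__|[*_`]/g, "")
    // Collapse all whitespace runs into single spaces.
    .replace(/\s+/g, " ")
    .trim();
}

const sampleDoc = [
  "---",
  "title: Hello",
  "---",
  "# Getting **started**",
  "",
  "See the [install guide](/docs/install) and <b>read</b> it.",
].join("\n");

console.log(markdownToPlainText(sampleDoc));
// prints "Getting started See the install guide and read it."
```

The returned string is what would be passed as the body field when adding each document to the index at build time.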
I am also wondering how to break down the searchable index so that the search results look nice, with a little preview showing the sentence containing each match instead of just the page name.
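As an illustration of the kind of preview meant here, a minimal sketch could pick out the sentences of a page's plain text that contain a query word. The name `previewSentences` and the `maxSentences` parameter are hypothetical, and the sketch assumes that the plain text of each page is kept alongside the index and that simple substring matching is acceptable; lunr stems terms, so a hit in the index will not always have a literal match in the text.

```typescript
// Hypothetical helper: return the sentences of `plainText` that contain any query word.
// Assumption: case-insensitive substring matching is good enough for a preview.
function previewSentences(
  plainText: string,
  query: string,
  maxSentences: number = 3
): string[] {
  const words = query
    .toLowerCase()
    .split(/\s+/)
    .filter((w) => w.length > 0);
  if (words.length === 0) {
    return [];
  }
  // Naive sentence split: a run of text followed by its closing ., ! or ?.
  const sentences = plainText.match(/[^.!?]+[.!?]*/g) || [];
  return sentences
    .map((s) => s.trim())
    .filter((s) => words.some((w) => s.toLowerCase().includes(w)))
    .slice(0, maxSentences);
}

const pageText =
  "Install lunr with npm. Then build the index at build time. Deploy it.";

console.log(previewSentences(pageText, "index"));
// prints [ 'Then build the index at build time.' ]
```

In the API route, a function like this could run over each matching page before the results are returned, so the browser only ever receives the page title, URL, and preview sentences.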