I am building a question-answering system for a documentation portal that contains over 1,000 Markdown documents, each roughly 2,000-4,000 tokens long.
I am considering the following two options:
- Using embeddings and a retrieval index with GPT-4 (retrieval-augmented generation), roughly as sketched below
- Fine-tuning a model such as GPT4All (or a similar open model) specifically on my dataset
Which of these approaches is more likely to produce better results for my use case?
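For concreteness, here is roughly what I have in mind for the first option. This is only a minimal sketch: it assumes the `openai` Python client (v1+) with an `OPENAI_API_KEY` in the environment, uses illustrative model names (`text-embedding-3-small`, `gpt-4o`), a placeholder `docs/` directory, and a naive in-memory cosine-similarity index. A real setup would presumably use a proper vector store (FAISS, Chroma, etc.) or a framework such as LlamaIndex, and smarter chunking.

```python
# Minimal RAG sketch. Assumptions: openai>=1.0 client, OPENAI_API_KEY set,
# corpus small enough to hold all embeddings in memory, and illustrative
# model names -- none of this is prescriptive.
from pathlib import Path

import numpy as np
from openai import OpenAI

client = OpenAI()

def chunk(text: str, max_chars: int = 4000) -> list[str]:
    """Naive fixed-size chunking; a real pipeline would split on headings."""
    return [text[i:i + max_chars] for i in range(0, len(text), max_chars)]

# 1. Load and chunk the Markdown corpus (docs/ is a placeholder path).
chunks: list[str] = []
for path in Path("docs").rglob("*.md"):
    chunks.extend(chunk(path.read_text(encoding="utf-8")))

# 2. Embed every chunk once, in batches, and keep the vectors in memory.
#    In practice this index would be built offline and persisted.
index_rows = []
for i in range(0, len(chunks), 100):  # batch to stay under per-request limits
    batch = chunks[i:i + 100]
    emb = client.embeddings.create(model="text-embedding-3-small", input=batch)
    index_rows.extend(e.embedding for e in emb.data)
index = np.array(index_rows)  # shape: (num_chunks, embedding_dim)

def answer(question: str, k: int = 4) -> str:
    # 3. Embed the question and rank chunks by cosine similarity.
    q = np.array(
        client.embeddings.create(
            model="text-embedding-3-small", input=[question]
        ).data[0].embedding
    )
    scores = index @ q / (np.linalg.norm(index, axis=1) * np.linalg.norm(q))
    top_chunks = [chunks[i] for i in np.argsort(scores)[::-1][:k]]

    # 4. Ask GPT-4 to answer using only the retrieved excerpts.
    context = "\n\n---\n\n".join(top_chunks)
    resp = client.chat.completions.create(
        model="gpt-4o",
        messages=[
            {"role": "system",
             "content": "Answer using only the provided documentation excerpts."},
            {"role": "user",
             "content": f"Documentation:\n{context}\n\nQuestion: {question}"},
        ],
    )
    return resp.choices[0].message.content

print(answer("How do I configure authentication?"))
```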