I'm working on an app that builds a data bank of questions from old question papers. I want to maintain a table linking similar questions together as they are inserted. (The table I have in mind uses Modified Preorder Tree Traversal.)
The requirements I have are:
- Word problems that differ only in their numbers should be linked together.
- Word problems that differ only in proper nouns/names should be linked together.
- XYZ, ABC, PQR, MNO are equivalent (e.g. triangle nomenclature).
- Ignore punctuation, conjunctions and 'small words'.
- Tags! I'm tagging each question with its subject. A Math question is unlikely to be similar to a History question, but a Chemistry thermodynamics question could be similar to a Physics thermodynamics question.
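To make these requirements concrete, here is a rough sketch of one possible normalization step, not a settled approach. It assumes English text, a tiny hypothetical stop-word list (`SMALL_WORDS`), a crude "capitalised word means name" guess, and Jaccard overlap as the similarity measure; all function names are made up for illustration.

```python
import re

# Hypothetical 'small words' list; a real one would be much longer.
SMALL_WORDS = {"a", "an", "the", "and", "or", "but", "of", "in", "on",
               "at", "to", "is", "are", "if", "so", "for", "by"}

def normalize(question):
    """Reduce a question to a list of comparable tokens."""
    # Keep only words and numbers; punctuation is dropped by the regex.
    tokens = re.findall(r"[A-Za-z]+|\d+(?:\.\d+)?", question)
    out = []
    for tok in tokens:
        if tok[0].isdigit():
            out.append("<NUM>")        # changed numbers become identical
        elif tok.lower() in SMALL_WORDS:
            continue                   # skip conjunctions and small words
        elif len(tok) >= 2 and tok.isupper():
            out.append("<LABEL>")      # ABC, PQR, XYZ, MNO are equivalent
        elif tok[0].isupper():
            # Crude assumption: capitalised means a name. This also
            # catches ordinary sentence-initial words.
            out.append("<NAME>")
        else:
            out.append(tok.lower())
    return out

def similarity(a, b):
    """Jaccard overlap of the normalized token sets, between 0.0 and 1.0."""
    sa, sb = set(normalize(a)), set(normalize(b))
    if not sa and not sb:
        return 1.0
    return len(sa & sb) / len(sa | sb)

def should_compare(tags_a, tags_b):
    """Only compare questions that share at least one tag."""
    return bool(set(tags_a) & set(tags_b))
```

With this sketch, "John has 5 apples and gives 2 to Mary." and "Alice has 12 apples and gives 7 to Bob." normalize to the same token list, and the tag check lets a Chemistry question tagged `thermodynamics` be compared with a Physics question carrying the same tag while skipping Math versus History.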
Any ideas on how to proceed on the algorithm side of things would be very much appreciated.
I'll also be dealing with images containing Math notation. Should I make sure all my images have LaTeX in the 'ALT' attribute so that they too can be processed by this algorithm, or is there a better way of doing it?
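For illustration, if the LaTeX did live in the 'ALT' attribute, pulling it into the question text could look roughly like the sketch below. It assumes the questions are stored as HTML and uses only Python's standard `html.parser`; the class and function names are hypothetical.

```python
from html.parser import HTMLParser

class AltTextInliner(HTMLParser):
    """Collects visible text, substituting each <img> with its alt text."""

    def __init__(self):
        super().__init__()
        self.parts = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names, so ALT becomes alt.
        if tag == "img":
            alt = dict(attrs).get("alt")
            if alt:
                self.parts.append(alt)

    def handle_data(self, data):
        self.parts.append(data)

def question_text(html):
    """Return the question as plain text with image alt text inlined."""
    parser = AltTextInliner()
    parser.feed(html)
    parser.close()
    # Collapse runs of whitespace left over from joining the pieces.
    return " ".join(" ".join(parser.parts).split())
```

The resulting string could then go through the same similarity step as any other question, although LaTeX would probably need its own tokenization rules.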