I want to train an AI to extract specific phrases from PDFs. For example, the product name is described somewhere in the document, and the AI has to find and extract it. My question is whether it is better to feed in the PDFs as images or as extracted text strings, as the documents are only roughly structured. I hope my question is understandable.
Maybe someone also has some ideas or keywords to get me started :)
EDIT: Thanks to the hint from lsimmons, I found a starting point: https://appliedmachinelearning.blog/2019/04/01/training-deep-learning-based-named-entity-recognition-from-scratch-disease-extraction-hackathon/
I will try this code, just with product names instead of diseases, of course. For anyone with the same problem: this task is called "Named Entity Recognition" (NER). I hope this works.
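
For anyone wondering what the training data for this could look like: NER models are commonly trained on tokens labelled with BIO tags (B = beginning of an entity, I = inside an entity, O = outside). Below is a minimal sketch of that labelling for text extracted from a PDF. The product name, the `PRODUCT` label, and the `bio_tag` helper are all made up for illustration, and I am only assuming that a tutorial like the one linked above expects roughly this layout.

```python
def bio_tag(tokens, entity_tokens, label="PRODUCT"):
    """Label every occurrence of entity_tokens in tokens with BIO tags.

    Hypothetical helper for illustration; the label name is an assumption.
    """
    tags = ["O"] * len(tokens)
    n = len(entity_tokens)
    if n == 0:
        return tags
    # Slide a window over the tokens and mark exact matches.
    for i in range(len(tokens) - n + 1):
        if tokens[i:i + n] == entity_tokens:
            tags[i] = "B-" + label
            for j in range(i + 1, i + n):
                tags[j] = "I-" + label
    return tags


# Made-up sentence standing in for text extracted from a PDF.
text = "The Acme Turbo 3000 is a compact blender ."
tokens = text.split()
tags = bio_tag(tokens, "Acme Turbo 3000".split())

for token, tag in zip(tokens, tags):
    print(f"{token}\t{tag}")
```

This only covers the extracted-string route. Simple whitespace splitting and exact matching are simplifications; real documents would likely need a proper tokenizer and manually checked labels.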