I am developing a small app in C# and .NET to automate a process that is currently done manually. The app looks for a particular pattern in a PDF document and uploads the document wherever it needs to go based on that pattern. It works without any issues on PDFs that were created digitally (Word, Notepad, etc.) and then converted to PDF.
I later found out that about 90% of the documents the app will have to handle in the future are scanned. This turned out to be a much bigger problem than I expected. I found several third-party libraries that can handle this task: iText7, LeadTools, ABBYY, WhatsMate PDF-to-text API, and SautinSoft .NET Office Edition. The issue is that they are all paid, and I cannot afford any of them.
My idea is to convert each PDF page to an image (jpg, png, tiff, etc.) and then use Tesseract OCR to recognize the text. The issue is that I cannot find a free-to-use library that converts a PDF to an image.
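To make the idea concrete, here is a rough sketch of the pipeline I have in mind. It is only an outline under some assumptions: it uses the `Tesseract` NuGet package (a .NET wrapper around Tesseract OCR) and expects a `tessdata` folder containing `eng.traineddata`. `RenderPdfPagesToPng` is a hypothetical placeholder for the conversion step I am missing, not a real library call.

```csharp
using System;
using System.Collections.Generic;
using System.Text;
using Tesseract; // assumption: the "Tesseract" NuGet package (.NET wrapper for Tesseract OCR)

static class ScannedPdfOcr
{
    // Hypothetical placeholder for the missing step: render every page of the
    // PDF to an image file and return the paths of those image files.
    static IEnumerable<string> RenderPdfPagesToPng(string pdfPath)
    {
        throw new NotImplementedException("Need a free PDF-to-image converter here.");
    }

    // Runs OCR on every rendered page and returns the combined text,
    // which the existing pattern search could then work on.
    public static string ExtractText(string pdfPath)
    {
        var text = new StringBuilder();

        // Assumes ./tessdata exists and contains eng.traineddata.
        using (var engine = new TesseractEngine("./tessdata", "eng", EngineMode.Default))
        {
            foreach (string imagePath in RenderPdfPagesToPng(pdfPath))
            {
                using (var image = Pix.LoadFromFile(imagePath))
                using (var page = engine.Process(image))
                {
                    text.AppendLine(page.GetText());
                }
            }
        }

        return text.ToString();
    }
}
```

The OCR half looks doable with free tools; the part I cannot fill in is the body of `RenderPdfPagesToPng`.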
I would appreciate any advice on the topic. Is it possible to extract text from a scanned PDF for free? Or is it possible to convert the PDF to an image and run OCR on it for free?
Thank you for your time and I apologize if I did not format my question the right way.