As far as I know, it is not possible to search PDF content without a third-party tool, program or utility installed; pdfgrep is one example. If you want to do it from a C# program, I would include a third-party library to do the job.
I wrote a solution for something similar in this answer: Read specific value based on label name from PDF in C#. With a bit of tweaking you can get what you are looking for. The only catch is that PdfClown targets .NET Framework; on the other hand, it is open source, free and has no limitations. If you need .NET Core, you may find free (with limitations) or paid PDF libraries.
As you requested in the comments, here is a sample solution that finds text inside PDF pages. I have left comments in the code:
// Usings needed at the top of the file (PdfClown's namespaces)
using System.Collections.Generic;
using System.IO;
using org.pdfclown.documents.contents;
using org.pdfclown.documents.contents.objects;
// Alias avoids the clash between System.IO.File and PdfClown's File
using File = org.pdfclown.files.File;

// The found content
private List<string> _contentList;

// Search for content in a given pdf file
public bool SearchPdf(FileInfo fileInfo, string word)
{
    _contentList = new List<string>();
    ExtractPages(fileInfo.FullName);
    var content = string.Join(" ", _contentList);
    return content.Contains(word);
}

// Extract content for each page of the given pdf file
private void ExtractPages(string filePath)
{
    using (var file = new File(filePath))
    {
        var document = file.Document;
        foreach (var page in document.Pages)
        {
            Extract(new ContentScanner(page));
        }
    }
}

// Extract content of a pdf page and put the found result inside _contentList
private void Extract(ContentScanner level)
{
    if (level == null)
        return;

    while (level.MoveNext())
    {
        var content = level.Current;
        switch (content)
        {
            // A text-showing operation: decode it with the current font
            case ShowText text:
            {
                var font = level.State.Font;
                _contentList.Add(font.Decode(text.Text));
                break;
            }
            // A container: scan its child level recursively
            case Text _:
            case ContainerObject _:
                Extract(level.ChildLevel);
                break;
        }
    }
}
Now let's do a quick test, assuming all your invoices are in the c:\temp folder:
static void Main(string[] args)
{
    var program = new SearchPdfContent();
    DirectoryInfo d = new DirectoryInfo(@"c:\temp");
    FileInfo[] files = d.GetFiles("*.pdf");
    var word = "Sushi";
    foreach (FileInfo file in files)
    {
        var found = program.SearchPdf(file, word);
        if (found)
        {
            Console.WriteLine($"{file.FullName} contains word {word}");
        }
    }
}
In my case, for example, the word Sushi appears in one of the invoices, so the output is:
c:\temp\invoice0001.pdf contains word Sushi
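One thing to keep in mind: string.Contains(string) does an ordinal, case-sensitive comparison, so searching for "sushi" would not match an invoice that contains "Sushi". If you want a case-insensitive match, a minimal sketch could look like the following. The class name TextSearch and the method ContainsIgnoreCase are hypothetical helpers of my own, not part of PdfClown; I use IndexOf with a StringComparison because that overload is available on .NET Framework as well.

```csharp
using System;

// Hypothetical helper (not part of PdfClown): case-insensitive substring check.
public static class TextSearch
{
    // Returns true if 'content' contains 'word', ignoring case.
    // Empty or null input is treated as "not found".
    public static bool ContainsIgnoreCase(string content, string word)
    {
        if (string.IsNullOrEmpty(content) || string.IsNullOrEmpty(word))
            return false;

        return content.IndexOf(word, StringComparison.OrdinalIgnoreCase) >= 0;
    }
}
```

To use it, the last line of SearchPdf would become return TextSearch.ContainsIgnoreCase(content, word); instead of return content.Contains(word);.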
All that said, this is just one example of a solution. You can take it from here and bring it to the next level. Enjoy your day.
Here are some of the links I came across while searching: