I have an asp.net Core 2.0 C#
application which read/parse the PDF file and get the text. In this I want to read specific value which have specific label name. You can see the below image I want to get the value 171857
which is Invoice
number and store it in database.
I have tried below code to read the pdf using iTextSharp
.
using (PdfReader reader = new PdfReader(fileName))
{
StringBuilder sb = new StringBuilder();
ITextExtractionStrategy strategy = new SimpleTextExtractionStrategy();
for (int page = 0; page < reader.NumberOfPages; page++)
{
string text = PdfTextExtractor.GetTextFromPage(reader, page + 1, strategy);
if (!string.IsNullOrWhiteSpace(text))
{
sb.Append(Encoding.UTF8.GetString(ASCIIEncoding.Convert(Encoding.Default, Encoding.UTF8, Encoding.Default.GetBytes(text))));
}
}
var pdfText = sb.ToString();
}
In pdfText
variable I will get all text content from pdf but It seems that this is not the proper way to get the Invoice number. Is there any other way to read the specific content from pdf by it's label name like we will provide label name Invoice
and it will return the value 171857
as example with other 3rd party pdf reader libraries?
Any help or suggestions would be highly appreciated.
Thanks