I am using iText (for .NET) to read PDF files. It reads the document, but wherever the PDF has a run of whitespace, the extracted text contains only a single space.
That makes it impossible to extract fields by taking substrings at fixed positions. I want to read the data line by line with the whitespace preserved, so that I know the actual position of each piece of text, because I want to write the data into a database.
The file is a bank statement; I want to load it into a database as part of designing a reconciliation system.
Here is a screenshot of the file:
This is the code I am using:
    For page As Integer = 1 To pdfReader.NumberOfPages
        ' Dim strategy As ITextExtractionStrategy = New SimpleTextExtractionStrategy()
        Dim strategy As ITextExtractionStrategy = New iTextSharp.text.pdf.parser.LocationTextExtractionStrategy()
        Dim currentText As String = PdfTextExtractor.GetTextFromPage(pdfReader, page, strategy)
        currentText = Encoding.UTF8.GetString(ASCIIEncoding.Convert(Encoding.[Default], Encoding.UTF8, Encoding.[Default].GetBytes(currentText)))

        ' Split the page text into lines.
        Dim delimiterChars As Char() = {ControlChars.Lf}
        Dim lines As String() = currentText.Split(delimiterChars)

        ' Flags marking which field is expected next.
        Dim Bnk_Name As Boolean = True
        Dim Br_Name As Boolean = False
        Dim Name_acc As Boolean = False
        Dim statment As Boolean = False
        Dim Curr As Boolean = False
        Dim Open As Boolean = False

        ' Extracted field values.
        Dim BankName = ""
        Dim Branch = ""
        Dim AccountNo = ""
        Dim CompName = ""
        Dim Currency = ""
        Dim Statement_from = ""
        Dim Statement_to = ""
        Dim Opening_Balance = ""
        Dim Closing_Balance = ""
        Dim Narration As String = ""

        For Each line As String In lines
            'BANK NAME
            If Bnk_Name Then
                If line.Trim() <> "" Then
                    ' Take at most the first 21 characters; guard against shorter lines.
                    BankName = line.Substring(0, Math.Min(21, line.Length))
                End If
                Bnk_Name = False
            End If
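With this code each run of whitespace comes out as a single space, but I want the whitespace preserved exactly as it appears in the PDF so that I can find each field by its position.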
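To make the goal concrete, here is a minimal sketch of the kind of output I am after. It assumes I can somehow obtain each text chunk together with its x-coordinate on the page (how to get that from iText is exactly what I do not know, so it is not shown). The helper name `BuildLine`, the `charWidth` parameter, and the sample coordinates are all hypothetical and only illustrate the idea of padding with spaces so that each chunk starts at a column proportional to its position.

```vb
Imports System
Imports System.Collections.Generic
Imports System.Linq
Imports System.Text

Module FixedWidthSketch
    ' Hypothetical helper: rebuilds one line so that each chunk starts at a
    ' column proportional to its x-coordinate (Key = x, Value = text).
    ' charWidth is an assumed average character width in page units.
    Function BuildLine(chunks As IEnumerable(Of KeyValuePair(Of Single, String)),
                       charWidth As Single) As String
        Dim sb As New StringBuilder()
        For Each chunk In chunks.OrderBy(Function(c) c.Key)
            Dim column As Integer = CInt(Math.Floor(chunk.Key / charWidth))
            If sb.Length < column Then
                ' Pad with spaces up to the target column.
                sb.Append(" "c, column - sb.Length)
            ElseIf sb.Length > 0 Then
                ' Previous text already passed the column: keep one separator.
                sb.Append(" "c)
            End If
            sb.Append(chunk.Value)
        Next
        Return sb.ToString()
    End Function

    Sub Main()
        ' Made-up placeholder chunks, not taken from the real statement.
        Dim chunks As New List(Of KeyValuePair(Of Single, String)) From {
            New KeyValuePair(Of Single, String)(0.0F, "DATE"),
            New KeyValuePair(Of Single, String)(90.0F, "NARRATION"),
            New KeyValuePair(Of Single, String)(300.0F, "AMOUNT")
        }
        Console.WriteLine(BuildLine(chunks, 6.0F))
    End Sub
End Module
```

With lines rebuilt this way, a fixed-position `Substring` would pick up the same field on every row, which is what the single-space output currently prevents.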