I have code that is meant to extract text from a user-defined rectangle on a PDF.
I am using iTextSharp for this.
The user enters the coordinates of the rectangle. They can then either 'preview' it, which draws a red rectangle over their PDF, or 'generate' a new PDF, which is meant to capture the text within that rectangle and append an extra page containing just that text.
My issue is that the text is being captured from an area completely separate from the preview rectangle. Both rectangles are created in the same way:
//Preview rectangle code
var xfer = ConvertToPoint(Convert.ToDouble(ULTB.Text));
var yfer = ConvertToPoint(Convert.ToDouble(LLTB.Text));
var uxfer = ConvertToPoint(Convert.ToDouble(URTB.Text));
var uyfer = ConvertToPoint(Convert.ToDouble(LRTB.Text));
iTextSharp.text.Rectangle rect = new iTextSharp.text.Rectangle((float)xfer, (float)yfer, (float)uxfer, (float)uyfer);
This rectangle is then drawn onto the user's document.
(ConvertToPoint just converts the user input from millimetres to points.)
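A minimal sketch of what such a helper might look like, assuming the usual PDF unit of 72 points per inch and 25.4 mm per inch. The class name and constant are illustrative only, not the actual code:

```csharp
using System;

public static class UnitConversion
{
    // 1 inch = 25.4 mm = 72 PDF points (assumed default user-space unit).
    private const double PointsPerMillimetre = 72.0 / 25.4;

    // Hypothetical stand-in for ConvertToPoint; the real helper is not shown in this post.
    public static double ConvertToPoint(double millimetres)
    {
        return millimetres * PointsPerMillimetre;
    }
}
```

With this conversion, an input of 25.4 (mm) corresponds to roughly 72 points, so the same input should give the same point values in both code paths.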
Using the exact same user input, the rectangle created by the following code is in a different location:
var xfer = ConvertToPoint(Convert.ToDouble(ULTB.Text));
var yfer = ConvertToPoint(Convert.ToDouble(LLTB.Text));
var uxfer = ConvertToPoint(Convert.ToDouble(URTB.Text));
var uyfer = ConvertToPoint(Convert.ToDouble(LRTB.Text));
RenderFilter[] filters = new RenderFilter[1];
LocationTextExtractionStrategy regionFilter = new LocationTextExtractionStrategy();
filters[0] = new RegionTextRenderFilter(new iTextSharp.text.Rectangle((float)xfer, (float)yfer, (float)uxfer, (float)uyfer));
FilteredTextRenderListener strategy = new FilteredTextRenderListener(regionFilter, filters);
String result = PdfTextExtractor.GetTextFromPage(reader, x, strategy);
The above code should extract the text inside the rectangle defined by the user's coordinates, but it does not. Any ideas?
I've uploaded the PDF to Google Drive, with a lot of the text redacted.
The red rectangle is what I get from the preview code, and the text towards the bottom of the document is what's being picked up by the text capture.