Is there a way to get the text that lies inside a border of a specific color, say red? Is it possible to extract all the text inside the red-bordered boxes of a PDF using C#? I have googled it, but I did not find any way to get text together with its style/format information from a PDF.

Muhammad Nasir
  • Possible duplicate of [Extracting text from PDFs in C#](http://stackoverflow.com/questions/2116440/extracting-text-from-pdfs-in-c-sharp) – tretom Mar 07 '17 at 15:50
  • Unfortunately, you can't parse a PDF the way you parse HTML. I think @Joe Irby has the best solution: find a third-party option. But it won't be easy. – Eric Burdo Mar 07 '17 at 16:15
  • The OP already tagged his question with pdfbox, the tag for a third-party PDF library. I think he is effectively asking how to implement his task using PDFBox. – mkl Mar 07 '17 at 16:16
  • Muhammad, how are those red-bordered boxes drawn? There are numerous ways to draw them in a PDF, and covering all of them in a single answer is too broad for Stack Overflow. – mkl Mar 07 '17 at 16:21
  • Did you try ExtractTextByArea? – Tilman Hausherr Mar 07 '17 at 18:54
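To make mkl's question concrete: if the red boxes happen to be plain rectangles stroked in RGB, the relevant content-stream operators are `r g b RG` (set the stroking color), `x y w h re` (append a rectangle) and `S` (stroke the path). The following Java sketch (Java because PDFBox is a Java library; the logic ports directly to C#) scans an already-decompressed content stream and collects the rectangles stroked in red. It rests on loud assumptions: the stream contains only numbers and operators (no strings, names or arrays), there is no `cm` transformation, and stroke colors are set only with `RG`, `G` or `K` (the CMYK conversion is naive). `RedBoxFinder`, `Rect` and `findRedStrokedRects` are hypothetical names. In real code you would let the library parse the stream, for example by subclassing `PDFGraphicsStreamEngine` in PDFBox 2.x, instead of splitting on whitespace.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

public class RedBoxFinder {
    /** Rectangle as given to the "re" operator: lower-left corner plus width and height. */
    public record Rect(double x, double y, double w, double h) {}

    /**
     * Returns the rectangles stroked while the stroking color was (close to) pure red.
     * Hypothetical helper; assumes a stream of numbers and operators only, without "cm".
     */
    public static List<Rect> findRedStrokedRects(String content, double tolerance) {
        List<Rect> result = new ArrayList<>();
        List<Rect> currentPath = new ArrayList<>();       // rectangles since the last painting operator
        List<Double> operands = new ArrayList<>();        // numeric operands for the next operator
        Deque<double[]> savedStates = new ArrayDeque<>(); // stroke colors saved by "q"
        double[] stroke = {0, 0, 0};                      // initial stroking color is black

        for (String token : content.trim().split("\\s+")) {
            if (token.isEmpty()) continue;
            try {
                operands.add(Double.parseDouble(token));
                continue;                                 // still collecting operands
            } catch (NumberFormatException e) {
                // not a number, so treat it as an operator
            }
            switch (token) {
                case "q" -> savedStates.push(stroke.clone());
                case "Q" -> { if (!savedStates.isEmpty()) stroke = savedStates.pop(); }
                case "RG" -> { if (operands.size() == 3) stroke = new double[] {operands.get(0), operands.get(1), operands.get(2)}; }
                case "G" -> { if (operands.size() == 1) { double v = operands.get(0); stroke = new double[] {v, v, v}; } }
                case "K" -> { if (operands.size() == 4) stroke = cmykToRgb(operands); }
                case "re" -> { if (operands.size() == 4) currentPath.add(new Rect(operands.get(0), operands.get(1), operands.get(2), operands.get(3))); }
                case "S", "s", "B", "B*", "b", "b*" -> {   // operators that stroke the current path
                    if (isRed(stroke, tolerance)) result.addAll(currentPath);
                    currentPath.clear();
                }
                case "f", "F", "f*", "n" -> currentPath.clear(); // filled only, or discarded
                default -> { }                            // all other operators are ignored here
            }
            operands.clear();
        }
        return result;
    }

    /** Naive CMYK to RGB conversion (no ICC profiles); enough to recognize pure red. */
    static double[] cmykToRgb(List<Double> cmyk) {
        double c = cmyk.get(0), m = cmyk.get(1), y = cmyk.get(2), k = cmyk.get(3);
        return new double[] {(1 - c) * (1 - k), (1 - m) * (1 - k), (1 - y) * (1 - k)};
    }

    /** True if the RGB color is within the tolerance of (1, 0, 0). */
    static boolean isRed(double[] rgb, double tolerance) {
        return rgb[0] >= 1 - tolerance && rgb[1] <= tolerance && rgb[2] <= tolerance;
    }

    public static void main(String[] args) {
        String content = "0 0 0 RG 10 10 50 20 re S "    // black box: ignored
                       + "1 0 0 RG 100 700 200 50 re S " // red box: kept
                       + "300 300 40 40 re f";           // filled, not stroked: ignored
        System.out.println(findRedStrokedRects(content, 0.05));
        // prints [Rect[x=100.0, y=700.0, w=200.0, h=50.0]]
    }
}
```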

1 Answer

The answer is not simple, unfortunately. When programmers need to parse text out of PDF files, as you are trying to do, they usually rely on third-party libraries that other people wrote specifically for manipulating PDFs. In the C# world there are a few well-known PDF libraries, but the ones that are easiest to use are not free. I've personally had good results with iTextSharp, but it is not free for closed-source use: its current versions are released under the AGPL, and a commercial license is required otherwise.

Joe Irby
  • The OP already tagged his question with pdfbox, the tag for a third-party PDF library. I think he is effectively asking how to implement his task using PDFBox, not how to do it without a library. – mkl Mar 07 '17 at 16:17
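Whichever library supplies the raw data (for example PDFBox `TextPosition` objects from a `PDFTextStripper` subclass, or an iTextSharp `ITextExtractionStrategy`), the remaining step is geometric: keep the characters that fall inside a red box and put them back into reading order. This is roughly what `PDFTextStripperByArea`, the class behind PDFBox's ExtractTextByArea example, does for a fixed rectangle. The following sketch assumes hypothetical `Glyph` and `Box` records in PDF user space (y axis pointing up), treats a glyph as inside when its center is inside, and starts a new line when a glyph's y differs from the current line's first glyph by more than `lineTolerance`. Unlike real extractors, it does not infer word spaces from horizontal gaps.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

public class TextInBox {
    /** Hypothetical glyph: its text and bounding box (lower-left corner, y axis up). */
    public record Glyph(String text, double x, double y, double width, double height) {
        double centerX() { return x + width / 2; }
        double centerY() { return y + height / 2; }
    }

    /** Hypothetical box, e.g. a red-stroked rectangle: lower-left corner plus size. */
    public record Box(double x, double y, double w, double h) {
        boolean contains(double px, double py) {
            return px >= x && px <= x + w && py >= y && py <= y + h;
        }
    }

    /** Keeps glyphs centered inside the box and joins them into lines, top to bottom. */
    public static String textInside(List<Glyph> glyphs, Box box, double lineTolerance) {
        List<Glyph> inside = new ArrayList<>();
        for (Glyph g : glyphs) {
            if (box.contains(g.centerX(), g.centerY())) inside.add(g);
        }

        // Highest glyphs first, because the PDF y axis points up.
        inside.sort(Comparator.comparingDouble((Glyph g) -> -g.y()));

        // A glyph joins the current line if its y is within the tolerance of the line's first glyph.
        List<List<Glyph>> lines = new ArrayList<>();
        double lineY = 0;
        for (Glyph g : inside) {
            if (lines.isEmpty() || Math.abs(g.y() - lineY) > lineTolerance) {
                lines.add(new ArrayList<>());
                lineY = g.y();
            }
            lines.get(lines.size() - 1).add(g);
        }

        // Within each line, order glyphs left to right and concatenate them.
        StringBuilder out = new StringBuilder();
        for (List<Glyph> line : lines) {
            line.sort(Comparator.comparingDouble(Glyph::x));
            if (out.length() > 0) out.append('\n');
            for (Glyph g : line) out.append(g.text());
        }
        return out.toString();
    }

    public static void main(String[] args) {
        List<Glyph> glyphs = List.of(
                new Glyph("H", 110, 720, 6, 10), new Glyph("i", 116, 720, 3, 10),
                new Glyph("X", 400, 720, 6, 10),               // outside the box
                new Glyph("k", 116, 705, 6, 10), new Glyph("o", 110, 705.3, 6, 10));
        Box redBox = new Box(100, 700, 200, 50);
        System.out.println(textInside(glyphs, redBox, 2.0));
        // prints "Hi" and "ok" on two lines
    }
}
```

Grouping lines before sorting by x, rather than sorting once by (y, x), keeps characters in order even when their baselines differ by a fraction of a point, as the "o" at 705.3 does here.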