iTextSharp - when extracting a page it fails to carry over Adobe rectangle highlighting important info

Question

Per the following site...

http://forums.asp.net/t/1630140.aspx?extracting+pdf+pages+using+itextsharp

...I use the function ExtractPages to produce a new PDF from a range of page numbers. My problem is that a rectangle on the 2nd page of one PDF was not carried over when the page was extracted. This makes me worry that Adobe comments may also be lost when pages are extracted.

Is there a way to adjust this code so that comments and objects like rectangles are brought over to the new PDF when ExtractPages is called? Am I missing some syntax, or is this not available in version 5.5.0 of iTextSharp?

score 5 · Answer 1 · edited May 23 '17 at 12:28

Your use of the verb extract in the context of extracting pages is confusing. People will think you want to extract text from a page. In reality, you want to import or copy pages.
The example you refer to uses PdfWriter. That's wrong: you should use PdfStamper (if only one existing PDF is involved) or PdfCopy (if multiple existing PDFs are involved). See my answer to the question "How to keep original rotate page in itextSharp (dll)" to find out why the example on forums.asp.net is a really, really bad example.
The fact that a page has "a rectangle" is irrelevant. Maybe the rectangle is an annotation. In that case, you're throwing that rectangle away by using the wrong example. Maybe the origin of the page is different from 0,0...

If your purpose is to create a new PDF containing only a selection of pages of the original PDF, please read my answer to "Function that I can use to remove a single page from a PDF using iText".
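For readers who want to see the PdfCopy route mentioned above, here is a minimal sketch, assuming iTextSharp 5.x. The class name `PageCopier` and the method name `CopyPageRange` are hypothetical, chosen only for illustration. Because PdfCopy imports each page as a whole, annotations attached to that page should come along with it, but treat that as something to verify against your own files rather than a guarantee.

```csharp
using System.IO;
using iTextSharp.text;
using iTextSharp.text.pdf;

// Hypothetical helper, not part of iTextSharp.
public static class PageCopier
{
    // Copies pages start..end (1-based, inclusive) of inputFile into outputFile.
    public static void CopyPageRange(string inputFile, string outputFile, int start, int end)
    {
        PdfReader reader = new PdfReader(inputFile);
        try
        {
            using (FileStream fs = new FileStream(outputFile, FileMode.Create))
            {
                Document document = new Document();
                PdfCopy copy = new PdfCopy(document, fs);
                document.Open();

                for (int page = start; page <= end; page++)
                {
                    // Import the complete page from the reader and append it to the output.
                    copy.AddPage(copy.GetImportedPage(reader, page));
                }

                // Closing the document finishes the PDF and closes the output stream.
                document.Close();
            }
        }
        finally
        {
            reader.Close();
        }
    }
}
```

A call such as `PageCopier.CopyPageRange("in.pdf", "out.pdf", 2, 3)` would be the typical use (the file names are placeholders). The sketch does not validate the range: an empty range makes `document.Close()` fail because the document has no pages, and a page number beyond `reader.NumberOfPages` is also an error.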

answered Apr 24 '14 at 05:47

Bruno Lowagie

I don't want to sound frustrated, but I would really love to hear why so many people always copy bad examples instead of using the examples on the official iText site ([Java](http://itextpdf.com/book/examples.php), [C#](http://support.itextpdf.com/node/178)) or from the [book](http://manning.com/lowagie2/samplechapter2.pdf). We aren't hiding the documentation. What can we do to take away the perception that there's no documentation? What can we do to fight the abundance of bad examples? – Bruno Lowagie Apr 24 '14 at 09:00

Bruno, I feel for you but the reality is that most people's first reaction to a problem is to search for code that's similar to their scenario. The date that the code was written and the comments surrounding the code aren't even taken into account, sometimes because of language barriers, sometimes because of laziness. Books are great but they're not "now!" for some people. On my to-do list is to self-answer some questions (http://stackoverflow.com/help/self-answer) and hopefully we (you, me, @mkl, etc) can upvote those so people find them better. – Chris Haas Apr 24 '14 at 14:48

"Your use of the verb extract in the context of extracting pages is confusing." Well that's exactly how Adobe Professional has their Tool section laid out. You open up a PDF, you click on the Tools tab - and if you want to import or copy pages - you have to click on their term Extract. Logically I did a Google search for extract pages and that function came up at the top. Thank you though anyway for the clarification. – user3566645 Apr 24 '14 at 15:12
"Bruno, I feel for you but the reality is that most people's first reaction to a problem is to search for code that's similar to their scenario." I'm doing a lot more than just extracting pages from a PDF and have deadlines. Why read a book when I can google the exact function I need and it works? Most of what comes up at the top of Google are blogs that handle specific needs. Google is not going to search inside the documentation if it's not worded exactly as expected. – user3566645 Apr 24 '14 at 15:19
@user3566645 "why read a book when someone posts a function that fits their needs" Oo [You must have reached Stackoverflow by mistake](http://stackoverflow.com/help/how-to-ask). In fact, you need to hire a professional developer. – RandomSeed Apr 24 '14 at 15:23
Bruno, I'd say if you want it fixed - I would reach out to those blog sites with the correct code (2 additional shown in that one link above) and have them correct it. These are the sites that come up when someone from an Adobe background - tries to search for Extract Pages. I think in the course of trying that maybe they did try PdfCopy and something was not working as expected. – user3566645 Apr 24 '14 at 15:23

"In fact, you need to hire a professional developer." I did that and they recommended running JavaScript code within the document itself. It cost a lot of money and in the end didn't fully work. What I've developed on my end is working, except that it doesn't copy annotations using iTextSharp. All I have to do now is use iTextSharp's PdfCopy. – user3566645 Apr 24 '14 at 15:31
Then on top of all this, obviously a lot of these people had bugs with the original PdfCopy. It amazes me that you would wonder at all. http://stackoverflow.com/questions/20656540/copy-pdf-form-with-pdfcopy-not-working-in-itextsharp-5-4-5-0 – user3566645 Apr 24 '14 at 15:40

    private static void ExtractPageswithAnnotation(string inputFile, string outputFile, int start, int end)
    {
        string sRange = start.ToString() + "-" + end.ToString();
        PdfReader reader = new PdfReader(inputFile);
        reader.SelectPages(sRange);
        FileStream fs = new FileStream(outputFile, FileMode.Create);
        PdfStamper stamper = new PdfStamper(reader, fs);
        stamper.Close();
    }

– user3566645 Apr 24 '14 at 16:54
That's indeed the code you'd have found if you had read the documentation or searched the answers on SO: http://stackoverflow.com/questions/23117119/function-that-i-can-use-to-remove-a-single-page-from-a-pdf-using-itext/23150424#23150424 – Bruno Lowagie Apr 24 '14 at 17:36

@user3566645, I apologize if we appeared to take our frustrations out on you, that wasn't intended. Some of us can quote the PDF spec and even though Adobe wrote it we consider them just another implementer of it, equally the same as Microsoft, Apple and Google. Time and time again we see the same code come up over and over that starts with an incorrect assumption and we're asked to fix the broken parts. Unfortunately many of the valid SO answers are to one-time users and they don't get answered or upvoted so no one knows that they are correct. – Chris Haas Apr 24 '14 at 18:38
