
I've spent all day trying to get hyperlink metadata out of PDFs in my iPad application. The CGPDF* APIs are a true nightmare, and the only piece of information I've found online about all this is that I have to look for an "Annots" dictionary, but I just can't find it in my PDFs.

I even used the old Voyeur Xcode sample to inspect my test PDF file, but no trace of this "Annots" dictionary...

You know, this is a feature I see on every PDF reader - this same question has been asked multiple times here with no real practical answers. I usually never ask for sample code directly but apparently this time I really need it... anyone got this working, possibly with sample code?

Update: I just realized that the person who made my test PDF had simply typed a URL as text instead of adding a real annotation. Once he added a proper annotation, my code worked. But that's not what I need, so it seems I'll have to analyze the text and search for URLs. But that's another story...

Update 2: So I finally came up with some working code. I'm posting it here so hopefully it'll help someone. It assumes the PDF document actually contains annotations.

for(int i=0; i<pageCount; i++) {
    CGPDFPageRef page = CGPDFDocumentGetPage(doc, i+1);

    CGPDFDictionaryRef pageDictionary = CGPDFPageGetDictionary(page);

    CGPDFArrayRef outputArray;
    if(!CGPDFDictionaryGetArray(pageDictionary, "Annots", &outputArray)) {
        // No annotations on this page; move on to the next one.
        continue;
    }

    int arrayCount = CGPDFArrayGetCount( outputArray );
    if(!arrayCount) {
        continue;
    }

    for( int j = 0; j < arrayCount; ++j ) {
        // Skip any annotation that is not a URI link instead of aborting the whole scan.
        CGPDFObjectRef aDictObj;
        if(!CGPDFArrayGetObject(outputArray, j, &aDictObj)) {
            continue;
        }

        CGPDFDictionaryRef annotDict;
        if(!CGPDFObjectGetValue(aDictObj, kCGPDFObjectTypeDictionary, &annotDict)) {
            continue;
        }

        CGPDFDictionaryRef aDict;
        if(!CGPDFDictionaryGetDictionary(annotDict, "A", &aDict)) {
            continue;
        }

        CGPDFStringRef uriStringRef;
        if(!CGPDFDictionaryGetString(aDict, "URI", &uriStringRef)) {
            continue;
        }

        CGPDFArrayRef rectArray;
        if(!CGPDFDictionaryGetArray(annotDict, "Rect", &rectArray)) {
            continue;
        }

        // A Rect array must hold exactly four numbers: [llx lly urx ury].
        size_t rectCount = CGPDFArrayGetCount( rectArray );
        if( rectCount != 4 ) {
            continue;
        }

        CGPDFReal coords[4];
        BOOL rectIsValid = YES;
        for( size_t k = 0; k < rectCount; ++k ) {
            // CGPDFArrayGetNumber accepts both integer and real entries.
            if(!CGPDFArrayGetNumber(rectArray, k, &coords[k])) {
                rectIsValid = NO;
                break;
            }
        }
        if( !rectIsValid ) {
            continue;
        }

        char *uriString = (char *)CGPDFStringGetBytePtr(uriStringRef);

        NSString *uri = [NSString stringWithCString:uriString encoding:NSUTF8StringEncoding];
        CGRect rect = CGRectMake(coords[0],coords[1],coords[2],coords[3]);

        CGPDFInteger pageRotate = 0;
        CGPDFDictionaryGetInteger( pageDictionary, "Rotate", &pageRotate ); 
        CGRect pageRect = CGRectIntegral( CGPDFPageGetBoxRect( page, kCGPDFMediaBox ));
        if( pageRotate == 90 || pageRotate == 270 ) {
            CGFloat temp = pageRect.size.width;
            pageRect.size.width = pageRect.size.height;
            pageRect.size.height = temp;
        }

        rect.size.width -= rect.origin.x;
        rect.size.height -= rect.origin.y;

        CGAffineTransform trans = CGAffineTransformIdentity;
        trans = CGAffineTransformTranslate(trans, 0, pageRect.size.height);
        trans = CGAffineTransformScale(trans, 1.0, -1.0);

        rect = CGRectApplyAffineTransform(rect, trans);

        // do whatever you need with the coordinates.
        // e.g. you could create a button and put it on top of your page
        // and use it to open the URL with UIApplication's openURL
    }
}
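The coordinate handling at the end of the loop can be checked in isolation. The following is a minimal sketch in plain C, assuming a page whose media box starts at (0, 0) and has no rotation; `LinkRect` and `flip_to_top_left` are hypothetical names standing in for `CGRect` and the translate/scale transform, not part of any Apple API.

```c
/* Hypothetical stand-in for CGRect so the sketch compiles without CoreGraphics. */
typedef struct {
    double x, y, width, height;
} LinkRect;

/*
 * Maps a rect from PDF page space (origin at the bottom left, y pointing up)
 * to a top-left-origin space (y pointing down) for a page of the given height.
 * This is the same result as translating by (0, pageHeight), scaling by
 * (1, -1), and taking the bounding box of the transformed corners.
 */
static LinkRect flip_to_top_left(LinkRect r, double pageHeight) {
    LinkRect out = r;
    /* The old top edge (r.y + r.height) becomes the new origin, measured from the top. */
    out.y = pageHeight - (r.y + r.height);
    return out;
}
```

For example, a link at (100, 200) with size 150 x 20 on a 792-point-tall page ends up with its origin at y = 572, while x, width, and height are unchanged.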
ySgPjx
  • line 6, should that not be `continue` instead of `return`? - why do you return after checking object,value,dict,string,array etc. – Luke Mcneice Nov 19 '10 at 15:57
  • That's just example code without any error checking. – ySgPjx Nov 19 '10 at 16:11
  • PDF rects don't translate directly to native rects. See my thread for details; scroll down to 'Other PDF Features', 'Getting Links inside a PDF', 'Understanding the PDF Rect for link positioning': http://stackoverflow.com/questions/3889634/fast-and-lean-pdf-viewer-for-iphone-ipad-ios-tips-and-hints – Luke Mcneice Nov 23 '10 at 11:35
  • I'm doing `rect.size.width -= rect.origin.x; rect.size.height -= rect.origin.y;` to fix that, it's working for me.. – ySgPjx Nov 23 '10 at 12:03
  • Yeah, that works for w&h, but the PDF spec states: the array takes the form [llx lly urx ury] specifying the lower-left x, lower-left y, upper-right x, and upper-right y coordinates of the rectangle, in that order. This means that your `rect.origin.y` is actually `rect.origin.y+rect.size.height`, as the Adobe rect's origin is the bottom left and not the top left that `CGRect` defaults to. It may not have been that noticeable, as it would probably only have been 20-30 px out and still registered your press – Luke Mcneice Nov 23 '10 at 13:25
  • It's also worth mentioning that I couldn't get a URI from the annot, only a 'Dest'. I assume this is the default for internal document links? – Luke Mcneice Nov 23 '10 at 13:30
  • Yeah, IIRC "Dest" is for internal page links. – ySgPjx Nov 23 '10 at 13:37
  • See also http://stackoverflow.com/questions/3045587/how-to-get-actual-pdf-page-size-in-ipad/ to get the page size and convert the coordinates from PDF values to iOS values – Donal O'Danachair Jan 19 '11 at 16:53
  • @pt2ph8 Hi, did you manage to get all the links from a document? – Matrosov Oleksandr Apr 18 '14 at 20:31

3 Answers

15

Here's the basic idea for getting to the Annots array of each page, at least. After that you should be able to figure it out with help from Adobe's PDF spec.

1.) get the CGPDFDocumentRef.

2.) get each page.

3.) on each page, use CGPDFDictionaryGetArray(pageDictionary, "Annots", &outputArray) where pageDictionary is the CGPDFDictionary representing the CGPDFPage, and outputArray is the variable (CGPDFArrayRef) to store the Annots array of that page in.

Jesse Naugher
  • @Jesse Naugher: Thanks a lot for your answer, but: "after that you should be able to figure it out with help from the PDF spec from Adobe" I couldn't find any useful information from that bloated mess that is the Adobe's PDF spec. The only part of it where the word "annotation" appears is section 8, but again, I can't see any info that could help me here... *frustration* – ySgPjx Nov 02 '10 at 18:23
  • There's an entire section about every kind of annotation that can be in a PDF document, including the link annotation. Basically, when you get the Annotations array, you loop through it, and each entry is a dictionary that *is* an annotation. These dictionaries have a key called 'Subtype' that determines the type of annotation it is; "Link" is one of them, and it is defined in the PDF spec. – Jesse Naugher Nov 02 '10 at 18:27
  • @Jesse Naugher: Amazing, I just realized I was staring at the wrong document - now I have the **real** PDF spec document. I'll check it out now, thanks (yeah, that's what happens when you're tired/frustrated). – ySgPjx Nov 02 '10 at 18:29
  • @Jesse Naugher: `CGPDFDictionaryGetArray(pageDictionary, "Annots", &outputArray)` is returning false for me... Here's how I get pageDictionary: `CGPDFDictionaryRef pageDictionary = CGPDFPageGetDictionary(page);` – ySgPjx Nov 02 '10 at 18:36
  • Make sure you are getting the PDF itself correctly, and that the page you have is valid and has an annotation on it. You have to check each page for annotations separately. – Jesse Naugher Nov 02 '10 at 18:42
  • @Jesse Naugher: Yeah, I'm doing that in a for loop. The CGPDFPageRef is valid, and the document is, too (I'm also drawing it so I'm pretty sure about it). Also, there are three links on the page I'm testing with... And the Preview app is reading them. Here's my method: http://pastebin.com/69JW1Kkc I set a breakpoint inside the CGPDFDictionaryGetArray if and it doesn't reach the CGPDFArrayGetCount call.. – ySgPjx Nov 02 '10 at 18:44
  • I also inspected the PDF with Xcode's Voyeur sample (http://github.com/below/PDF-Voyeur/network) (which shows the tree of nodes in a PDF) and there's no Annots array... but the links are there, I can click them in Preview... – ySgPjx Nov 02 '10 at 18:51
  • I'm not sure what to tell you; that should work. The only difference in my code is that I use a bool variable for the if check, but obviously that shouldn't make a difference. I'd try with a PDF you make in Adobe or something; perhaps the creator doesn't correctly create annotations for links? I'm not sure. – Jesse Naugher Nov 02 '10 at 18:53
  • I updated my post, it was my PDF... Anyway it seems I'll have to parse text and search for URLs, I need it in my app... Thanks anyway for your answers. – ySgPjx Nov 02 '10 at 19:08
  • Good luck, you're going to need it :p – Jesse Naugher Nov 02 '10 at 19:10
  • @Jesse Naugher: Now that I'm parsing annotations successfully, I need to display them. The only problem is that PDF obviously uses a different coordinate system. It seems that they are upside down or something. Any idea on how to fix this? – ySgPjx Nov 03 '10 at 15:07
  • I figured it out. First the Rect in the PDF is not in X,Y,W,H format, but it's an array of the four points that make up the rectangle, so: `CGRect rect = CGRectMake(coords[0],coords[1],coords[2]-coords[0],coords[3]-coords[1])`. – ySgPjx Nov 03 '10 at 16:03
  • Then the rect needs to be transformed in the same way the PDF itself is transformed when drawing (normally it would be upside down since Quartz uses a different coordinate system). So the code: `CGAffineTransform trans = CGAffineTransformIdentity; trans = CGAffineTransformTranslate(trans, 0, pageRect.size.height); trans = CGAffineTransformScale(trans, 1.0, -1.0); rect = CGRectApplyAffineTransform(rect, trans);` – ySgPjx Nov 03 '10 at 16:04
9

Great code, but I am having a little trouble working it into my project. It gets all the URLs correctly, but when I click on one nothing happens. Here is my code (I had to modify yours slightly to work with my project). Is there something missing?

- (void) renderPageAtIndex:(NSUInteger)index inContext:(CGContextRef)ctx {
//CGPDFPageRef page = CGPDFDocumentGetPage(pdf, index+1);

CGPDFPageRef page = CGPDFDocumentGetPage(pdf, index+1);
CGAffineTransform transform1 = aspectFit(CGPDFPageGetBoxRect(page, kCGPDFMediaBox),
                                         CGContextGetClipBoundingBox(ctx));
CGContextConcatCTM(ctx, transform1);
CGContextDrawPDFPage(ctx, page);

int pageCount = CGPDFDocumentGetNumberOfPages(pdf);
int i = 0;
while (i<pageCount) {
    i++;
    CGPDFPageRef page = CGPDFDocumentGetPage(pdf, i+1);

    CGPDFDictionaryRef pageDictionary = CGPDFPageGetDictionary(page);

    CGPDFArrayRef outputArray;
    if(!CGPDFDictionaryGetArray(pageDictionary, "Annots", &outputArray)) {
        return;
    }

    int arrayCount = CGPDFArrayGetCount( outputArray );
    if(!arrayCount) {
        continue;
    }

    for( int j = 0; j < arrayCount; ++j ) {
        CGPDFObjectRef aDictObj;
        if(!CGPDFArrayGetObject(outputArray, j, &aDictObj)) {
            return;
        }

        CGPDFDictionaryRef annotDict;
        if(!CGPDFObjectGetValue(aDictObj, kCGPDFObjectTypeDictionary, &annotDict)) {
            return;
        }

        CGPDFDictionaryRef aDict;
        if(!CGPDFDictionaryGetDictionary(annotDict, "A", &aDict)) {
            return;
        }

        CGPDFStringRef uriStringRef;
        if(!CGPDFDictionaryGetString(aDict, "URI", &uriStringRef)) {
            return;
        }

        CGPDFArrayRef rectArray;
        if(!CGPDFDictionaryGetArray(annotDict, "Rect", &rectArray)) {
            return;
        }

        int arrayCount = CGPDFArrayGetCount( rectArray );
        CGPDFReal coords[4];
        for( int k = 0; k < arrayCount; ++k ) {
            CGPDFObjectRef rectObj;
            if(!CGPDFArrayGetObject(rectArray, k, &rectObj)) {
                return;
            }

            CGPDFReal coord;
            if(!CGPDFObjectGetValue(rectObj, kCGPDFObjectTypeReal, &coord)) {
                return;
            }

            coords[k] = coord;
        }               

        char *uriString = (char *)CGPDFStringGetBytePtr(uriStringRef);

        NSString *uri = [NSString stringWithCString:uriString encoding:NSUTF8StringEncoding];
        CGRect rect = CGRectMake(coords[0],coords[1],coords[2],coords[3]);

        CGPDFInteger pageRotate = 0;
        CGPDFDictionaryGetInteger( pageDictionary, "Rotate", &pageRotate ); 
        CGRect pageRect = CGRectIntegral( CGPDFPageGetBoxRect( page, kCGPDFMediaBox ));
        if( pageRotate == 90 || pageRotate == 270 ) {
            CGFloat temp = pageRect.size.width;
            pageRect.size.width = pageRect.size.height;
            pageRect.size.height = temp;
        }

        rect.size.width -= rect.origin.x;
        rect.size.height -= rect.origin.y;

        CGAffineTransform trans = CGAffineTransformIdentity;
        trans = CGAffineTransformTranslate(trans, 0, pageRect.size.height);
        trans = CGAffineTransformScale(trans, 1.0, -1.0);

        rect = CGRectApplyAffineTransform(rect, trans);

        // do whatever you need with the coordinates.
        // e.g. you could create a button and put it on top of your page
        // and use it to open the URL with UIApplication's openURL
        NSURL *url = [NSURL URLWithString:uri];
        NSLog(@"URL: %@", url);
        CGPDFContextSetURLForRect(ctx, (CFURLRef)url, rect);
       // CFRelease(url);
        }
    }   


}

Thanks & great work BrainFeeder!

UPDATE:

For anybody using the Leaves project in their app, this is how I got the PDF links to work (it's not perfect, as the rect seems to fill the entire screen, but it's a start):

- (void) renderPageAtIndex:(NSUInteger)index inContext:(CGContextRef)ctx {

CGPDFPageRef page = CGPDFDocumentGetPage(pdf, index+1);
CGAffineTransform transform1 = aspectFit(CGPDFPageGetBoxRect(page, kCGPDFMediaBox),
                                         CGContextGetClipBoundingBox(ctx));
CGContextConcatCTM(ctx, transform1);
CGContextDrawPDFPage(ctx, page);


    CGPDFPageRef pageAd = CGPDFDocumentGetPage(pdf, index);

    CGPDFDictionaryRef pageDictionary = CGPDFPageGetDictionary(pageAd);

    CGPDFArrayRef outputArray;
    if(!CGPDFDictionaryGetArray(pageDictionary, "Annots", &outputArray)) {
        return;
    }

    int arrayCount = CGPDFArrayGetCount( outputArray );
    if(!arrayCount) {
        //continue;
    }

    for( int j = 0; j < arrayCount; ++j ) {
        CGPDFObjectRef aDictObj;
        if(!CGPDFArrayGetObject(outputArray, j, &aDictObj)) {
            return;
        }

        CGPDFDictionaryRef annotDict;
        if(!CGPDFObjectGetValue(aDictObj, kCGPDFObjectTypeDictionary, &annotDict)) {
            return;
        }

        CGPDFDictionaryRef aDict;
        if(!CGPDFDictionaryGetDictionary(annotDict, "A", &aDict)) {
            return;
        }

        CGPDFStringRef uriStringRef;
        if(!CGPDFDictionaryGetString(aDict, "URI", &uriStringRef)) {
            return;
        }

        CGPDFArrayRef rectArray;
        if(!CGPDFDictionaryGetArray(annotDict, "Rect", &rectArray)) {
            return;
        }

        int arrayCount = CGPDFArrayGetCount( rectArray );
        CGPDFReal coords[4];
        for( int k = 0; k < arrayCount; ++k ) {
            CGPDFObjectRef rectObj;
            if(!CGPDFArrayGetObject(rectArray, k, &rectObj)) {
                return;
            }

            CGPDFReal coord;
            if(!CGPDFObjectGetValue(rectObj, kCGPDFObjectTypeReal, &coord)) {
                return;
            }

            coords[k] = coord;
        }               

        char *uriString = (char *)CGPDFStringGetBytePtr(uriStringRef);

        NSString *uri = [NSString stringWithCString:uriString encoding:NSUTF8StringEncoding];
        CGRect rect = CGRectMake(coords[0],coords[1],coords[2],coords[3]);

        CGPDFInteger pageRotate = 0;
        CGPDFDictionaryGetInteger( pageDictionary, "Rotate", &pageRotate ); 
        CGRect pageRect = CGRectIntegral( CGPDFPageGetBoxRect( page, kCGPDFMediaBox ));
        if( pageRotate == 90 || pageRotate == 270 ) {
            CGFloat temp = pageRect.size.width;
            pageRect.size.width = pageRect.size.height;
            pageRect.size.height = temp;
        }

        rect.size.width -= rect.origin.x;
        rect.size.height -= rect.origin.y;

        CGAffineTransform trans = CGAffineTransformIdentity;
        trans = CGAffineTransformTranslate(trans, 0, pageRect.size.height);
        trans = CGAffineTransformScale(trans, 1.0, -1.0);

        rect = CGRectApplyAffineTransform(rect, trans);

            // do whatever you need with the coordinates.
            // e.g. you could create a button and put it on top of your page
            // and use it to open the URL with UIApplication's openURL
            NSURL *url = [NSURL URLWithString:uri];
            NSLog(@"URL: %@", url);
//          CGPDFContextSetURLForRect(ctx, (CFURLRef)url, rect);
            UIButton *button = [[UIButton alloc] initWithFrame:rect];
            [button setTitle:@"LINK" forState:UIControlStateNormal];
            [button addTarget:self action:@selector(openLink:) forControlEvents:UIControlEventTouchUpInside];
            [self.view addSubview:button];
           // CFRelease(url);
        }
} // end of renderPageAtIndex:inContext:

Final Update: Below is the final code I used in my apps.

- (void) renderPageAtIndex:(NSUInteger)index inContext:(CGContextRef)ctx {
//If the view already contains a button control remove it
if ([[self.view subviews] containsObject:button]) {
    [button removeFromSuperview];
}

CGPDFPageRef page = CGPDFDocumentGetPage(pdf, index+1);
CGAffineTransform transform1 = aspectFit(CGPDFPageGetBoxRect(page, kCGPDFMediaBox),
                                         CGContextGetClipBoundingBox(ctx));
CGContextConcatCTM(ctx, transform1);
CGContextDrawPDFPage(ctx, page);


// CGPDFDocumentGetPage is 1-based, so read the annotations from the page that was just drawn.
CGPDFPageRef pageAd = page;

CGPDFDictionaryRef pageDictionary = CGPDFPageGetDictionary(pageAd);

CGPDFArrayRef outputArray;
if(!CGPDFDictionaryGetArray(pageDictionary, "Annots", &outputArray)) {
    return;
}

int arrayCount = CGPDFArrayGetCount( outputArray );
if(!arrayCount) {
    //continue;
}

for( int j = 0; j < arrayCount; ++j ) {
    CGPDFObjectRef aDictObj;
    if(!CGPDFArrayGetObject(outputArray, j, &aDictObj)) {
        return;
    }

    CGPDFDictionaryRef annotDict;
    if(!CGPDFObjectGetValue(aDictObj, kCGPDFObjectTypeDictionary, &annotDict)) {
        return;
    }

    CGPDFDictionaryRef aDict;
    if(!CGPDFDictionaryGetDictionary(annotDict, "A", &aDict)) {
        return;
    }

    CGPDFStringRef uriStringRef;
    if(!CGPDFDictionaryGetString(aDict, "URI", &uriStringRef)) {
        return;
    }

    CGPDFArrayRef rectArray;
    if(!CGPDFDictionaryGetArray(annotDict, "Rect", &rectArray)) {
        return;
    }

    int arrayCount = CGPDFArrayGetCount( rectArray );
    CGPDFReal coords[4];
    for( int k = 0; k < arrayCount; ++k ) {
        CGPDFObjectRef rectObj;
        if(!CGPDFArrayGetObject(rectArray, k, &rectObj)) {
            return;
        }

        CGPDFReal coord;
        if(!CGPDFObjectGetValue(rectObj, kCGPDFObjectTypeReal, &coord)) {
            return;
        }

        coords[k] = coord;
    }               

    char *uriString = (char *)CGPDFStringGetBytePtr(uriStringRef);

    NSString *uri = [NSString stringWithCString:uriString encoding:NSUTF8StringEncoding];
    CGRect rect = CGRectMake(coords[0],coords[1],coords[2],coords[3]);
    CGPDFInteger pageRotate = 0;
    CGPDFDictionaryGetInteger( pageDictionary, "Rotate", &pageRotate ); 
    CGRect pageRect = CGRectIntegral( CGPDFPageGetBoxRect( page, kCGPDFMediaBox ));
    if( pageRotate == 90 || pageRotate == 270 ) {
        CGFloat temp = pageRect.size.width;
        pageRect.size.width = pageRect.size.height;
        pageRect.size.height = temp;
    }

    rect.size.width -= rect.origin.x;
    rect.size.height -= rect.origin.y;

    CGAffineTransform trans = CGAffineTransformIdentity;
    trans = CGAffineTransformTranslate(trans, 35, pageRect.size.height+150);
    trans = CGAffineTransformScale(trans, 1.15, -1.15);

    rect = CGRectApplyAffineTransform(rect, trans);

    urlLink = [NSURL URLWithString:uri];
    [urlLink retain];

    //Create a button to get link actions
    button = [[UIButton alloc] initWithFrame:rect];
    [button setBackgroundImage:[UIImage imageNamed:@"link_bg.png"] forState:UIControlStateHighlighted];
    [button addTarget:self action:@selector(openLink:) forControlEvents:UIControlEventTouchUpInside];
    [self.view addSubview:button];
}   
[leavesView reloadData];
}

}
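The constants in the transform above (35, 150, 1.15) look hand-tuned for one screen size. One alternative is to derive the scale and offset from the page and view sizes, the way an aspect-fit transform would. The `aspectFit` helper used when drawing is not shown here, so the following plain C sketch only assumes it applies a uniform scale and centers the page; `LinkRect` and `page_rect_to_view` are hypothetical names, and the media box is assumed to start at (0, 0) with no rotation.

```c
/* Hypothetical stand-in for CGRect so the sketch compiles without CoreGraphics. */
typedef struct {
    double x, y, width, height;
} LinkRect;

/*
 * Maps a link rect from PDF page space (origin bottom left, y up) into a view
 * with a top-left origin, assuming the page is drawn aspect-fit and centered.
 */
static LinkRect page_rect_to_view(LinkRect r,
                                  double pageW, double pageH,
                                  double viewW, double viewH) {
    /* Uniform scale that makes the whole page fit inside the view. */
    double sx = viewW / pageW;
    double sy = viewH / pageH;
    double scale = sx < sy ? sx : sy;

    /* Margins left over after centering the scaled page. */
    double offsetX = (viewW - pageW * scale) / 2.0;
    double offsetY = (viewH - pageH * scale) / 2.0;

    LinkRect out;
    out.x = offsetX + r.x * scale;
    /* Flip vertically: the rect's top edge is measured down from the page top. */
    out.y = offsetY + (pageH - (r.y + r.height)) * scale;
    out.width = r.width * scale;
    out.height = r.height * scale;
    return out;
}
```

Because the scale and margins are computed from the actual view size, the same call should place the button consistently on iPhone and iPad, provided the drawing code really does use the same aspect-fit rule.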
  • @user470763: Yeah, adding a button is the most obvious solution :) – ySgPjx Nov 04 '10 at 10:04
  • @Brainfeeder The only problem I am really having now is that the rect size only scales for iPhone, not iPad. Also, on full-page links I can't swipe to change page. –  Nov 04 '10 at 14:05
  • @kmcg: Thank you for your code. I am able to scale the rect sizes on iPad as well; the only thing you need to do is change the values of x and y, which may help you. I also wanted to ask whether you are able to find any word in the PDF file other than URLs. Thanks. – Yama Jan 20 '11 at 04:48
  • Beware that the button created by that piece of code is clear with a white font, so if your PDF is not colored, you won't see it. I haven't been able to put the rect in the right place, though – Pacu Feb 03 '11 at 09:05
  • @kmcg Does this work for internal links too? Do you have any example project? Thanks in advance. – Jimit Mar 22 '11 at 22:21
  • @Jimit What do you mean by internal links? It works for links within the PDF (e.g. advertisements, URL's etc.) These links have to be built into the PDF when it was created though. It doesn't just scan for links written as text. I don't have an example project, sorry. –  Mar 28 '11 at 20:52
  • @kmcgrady Thanks, Appreciate your reply, I figured it out! – Jimit Mar 29 '11 at 14:29
  • @kmcgrady Did you ever figure out how to translate the rect correctly? – lindon fox Jun 18 '11 at 02:03
  • @lindon I have updated my answer with my final code. I'm 90% sure this worked on both iPhone and iPad but I don't have the time to test just now. I haven't worked on the project for about 6 months so I can't remember. Hopefully it helps you though. When I finished everything was working. –  Jun 26 '11 at 09:52
0

I must be confused, because this all works if I use:

CGRect rect = CGRectMake(coords[0],coords[1],coords[2]-coords[0]+1,coords[3]-coords[1]+1);

Am I misusing something later, perhaps? PDF supplies the corners, and CGRect wants a corner and a size.
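As a sketch of that corner-to-size conversion in plain C (the `LinkRect` type and `rect_from_corners` function are hypothetical stand-ins, not CoreGraphics API): since PDF coordinates are continuous rather than pixel indices, this version uses plain differences without the +1, and it tolerates a producer that wrote the two corners in the opposite order.

```c
/* Hypothetical stand-in for CGRect so the sketch compiles without CoreGraphics. */
typedef struct {
    double x, y, width, height;
} LinkRect;

/* Returns the smaller of two values. */
static double min_of(double a, double b) {
    return a < b ? a : b;
}

/* Returns the absolute difference between two values. */
static double span_of(double a, double b) {
    return a < b ? b - a : a - b;
}

/*
 * coords holds the annotation's Rect entries in order: [llx, lly, urx, ury].
 * The result is an origin plus a size, still in PDF page space.
 */
static LinkRect rect_from_corners(const double coords[4]) {
    LinkRect r;
    r.x = min_of(coords[0], coords[2]);
    r.y = min_of(coords[1], coords[3]);
    r.width = span_of(coords[0], coords[2]);
    r.height = span_of(coords[1], coords[3]);
    return r;
}
```

The result is still in PDF page space, so it would need the usual vertical flip before being used as a view frame.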

Bill Cheswick