How to extract text from an annotation using pdf.js or pdf-annotate.js?

Question

We need to programmatically annotate a .pdf file and then extract the text that lies within that annotation. The use case: highlight a few words in yellow, then get not only the metadata about the annotation (this is already solved) but also the text that was highlighted.

Creating an annotation and reading its metadata is not the problem. In pdf.js, the getAnnotations() function on a page returns a promise that resolves with information about all of the annotations on that page.

// With this pdf.js call, data does not contain the text inside the annotations.
page.getAnnotations().then(function (data) {
  console.log(data);
});

The problem is that the data object has color and coordinate information, but nothing about the text within each annotation.

Does anyone know how we can use either of these libraries (or really any other .js library) to get the text value within an annotation in a .pdf file?

See if fieldValue or contents have what you are looking for, see https://github.com/mozilla/pdf.js/blob/master/src/display/annotation_layer.js#L450 and https://github.com/mozilla/pdf.js/blob/master/src/display/annotation_layer.js#L244 . More details at https://github.com/mozilla/pdf.js/blob/master/src/core/annotation.js — async5, Aug 21 '17 at 13:08
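
Since the annotation data does carry coordinates, one possible workaround is to intersect each annotation's rectangle with the positions of the page's text items from page.getTextContent(). The following is only a rough sketch under several assumptions: that each annotation exposes rect as [x1, y1, x2, y2], that each text item exposes str, width, height and a transform whose last two entries are its x/y origin, and that both are in the same unrotated PDF coordinate space. The helper names (normalizeRect, itemBox, overlaps, textInRect, getAnnotatedText) are hypothetical and not part of pdf.js.

```javascript
// Hypothetical helper: make sure a rect is ordered as [minX, minY, maxX, maxY].
function normalizeRect(r) {
  return [
    Math.min(r[0], r[2]), Math.min(r[1], r[3]),
    Math.max(r[0], r[2]), Math.max(r[1], r[3]),
  ];
}

// Hypothetical helper: approximate bounding box of one text item.
// Assumes transform[4], transform[5] hold the item's origin and that
// the text is not rotated or skewed.
function itemBox(item) {
  const x = item.transform[4];
  const y = item.transform[5];
  return [x, y, x + item.width, y + item.height];
}

// Hypothetical helper: true when two [x1, y1, x2, y2] boxes overlap.
function overlaps(a, b) {
  return a[0] < b[2] && a[2] > b[0] && a[1] < b[3] && a[3] > b[1];
}

// Join the strings of all non-empty text items that overlap the rect.
function textInRect(rect, items) {
  const box = normalizeRect(rect);
  return items
    .filter(function (item) {
      return item.str.trim() !== '' && overlaps(box, itemBox(item));
    })
    .map(function (item) { return item.str; })
    .join(' ');
}

// Pair every annotation on a pdf.js page with the text found under it.
async function getAnnotatedText(page) {
  const results = await Promise.all([
    page.getAnnotations(),
    page.getTextContent(),
  ]);
  const annotations = results[0];
  const items = results[1].items;
  return annotations.map(function (annotation) {
    return {
      subtype: annotation.subtype,
      rect: annotation.rect,
      text: textInRect(annotation.rect, items),
    };
  });
}
```

Calling getAnnotatedText(page).then(console.log) would log one entry per annotation. Note that text items are whole runs of text rather than single words, so a highlight covering part of a run would return the entire run; trimming to the exact highlighted words would need per-character widths, which this sketch does not attempt.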


0 Answers