I am using the following code to extract the full text of a PDF file with PDFBox:
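    // `document` is assumed to be a PDDocument that has already been loaded elsewhere.
    private static void textExtraction() throws FileNotFoundException, UnsupportedEncodingException, IOException
    {
        String encoding = null;
        String outputFile = "path";
        PDFTextStripper stripper = new PDFTextStripper( encoding );
        // try-with-resources closes the writer, so the extracted text is flushed to the file
        try ( Writer output = new OutputStreamWriter( new FileOutputStream( outputFile ) ) )
        {
            stripper.writeText( document, output );
        }
    }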
This code works fine, but how can I extract text and also know where in the document it came from? For example, I would like to extract the text page by page and write each page to a separate file. Or I would like to search for a keyword, extract the passages in which it occurs, and be told where each occurrence is located.
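To make the keyword case concrete, here is a minimal sketch of the kind of result I am after. It uses only the standard library and assumes the text of each page is already available as a separate string. My guess is that the per-page strings could be produced by calling `setStartPage` and `setEndPage` on the stripper before `getText`, but I have not confirmed that this is the intended approach. The names `KeywordLocator`, `Hit`, and `find` are hypothetical and are not part of PDFBox.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class KeywordLocator {

    /** One occurrence of the keyword: 1-based page number, character offset in that page's text, and the line containing it. */
    public static final class Hit {
        public final int page;
        public final int offset;
        public final String line;

        Hit(int page, int offset, String line) {
            this.page = page;
            this.offset = offset;
            this.line = line;
        }
    }

    /**
     * Scans every page for the keyword (case-sensitive) and records where each occurrence was found.
     * pageTexts.get(0) is assumed to hold the text of page 1, and so on.
     */
    public static List<Hit> find(List<String> pageTexts, String keyword) {
        List<Hit> hits = new ArrayList<>();
        if (keyword.isEmpty()) {
            return hits; // an empty keyword would match everywhere, so report nothing
        }
        for (int i = 0; i < pageTexts.size(); i++) {
            String text = pageTexts.get(i);
            int from = 0;
            int idx;
            while ((idx = text.indexOf(keyword, from)) >= 0) {
                // Expand the match to the line that contains its first character.
                int lineStart = text.lastIndexOf('\n', idx) + 1;
                int lineEnd = text.indexOf('\n', idx);
                if (lineEnd < 0) {
                    lineEnd = text.length();
                }
                hits.add(new Hit(i + 1, idx, text.substring(lineStart, lineEnd).trim()));
                from = idx + keyword.length();
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        // Stand-in page texts; in practice these would come from the PDF.
        List<String> pages = Arrays.asList(
                "Introduction\nThis page mentions PdfBox once.",
                "Nothing relevant here.",
                "PdfBox appears again on the last page.");
        for (Hit h : find(pages, "PdfBox")) {
            System.out.println("page " + h.page + ", offset " + h.offset + ": " + h.line);
        }
    }
}
```

The offsets here are positions within the extracted string of a page, not coordinates on the page. If on-page coordinates are possible too, that would be even better.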