I have written Java code that extracts data from a PDF at a URL using the PDFBox API, and I can successfully get the whole document as text. However, the PDF contains article-related information such as the title, author name, and embargo date, and I want to extract only those fields, not the whole text. Is there any way to get only selected data from a PDF using PDFBox? My current code:
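// Imports assume PDFBox 1.x, which matches the PDFParser(InputStream) constructor used below.
import java.io.*;
import java.net.HttpURLConnection;
import java.net.URL;
import org.apache.pdfbox.cos.COSDocument;
import org.apache.pdfbox.pdfparser.PDFParser;
import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.util.PDFTextStripper;

// Download the PDF; encodedString is the Base64-encoded credentials for Basic auth.
URL url = new URL("http://www.example.com");
HttpURLConnection connection = (HttpURLConnection) url.openConnection();
connection.setRequestProperty("Authorization", "Basic " + encodedString);
connection.connect();
InputStream input = connection.getInputStream();
FileOutputStream fos1 = new FileOutputStream("download.pdf");
// (.... perform writing operation )

// Parse the downloaded file and extract the full text.
File in = new File("download.pdf");
PDFParser parser = new PDFParser(new FileInputStream(in));
parser.parse();
COSDocument cosDoc = parser.getDocument();
PDFTextStripper pdfStripper = new PDFTextStripper();
PDDocument pdDoc = new PDDocument(cosDoc);
String parsedText = pdfStripper.getText(pdDoc);
pdDoc.close(); // release the underlying document once the text has been read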
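
To make the goal concrete, here is a minimal sketch of post-processing `parsedText` with plain regular expressions. It assumes the extracted text contains labelled lines such as `Title: ...`, `Author: ...`, and `Embargo Date: ...`; those labels are hypothetical, since the real layout depends on the PDF. `FieldExtractor` and `extractField` are illustrative names, not part of PDFBox, and the sketch uses only the JDK, so whether PDFBox offers something more direct is still the open question.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: pulls "Label: value" lines out of already-extracted text.
final class FieldExtractor {

    // Returns the trimmed value that follows "<label>:" on a single line, if any.
    // The label is matched case-insensitively and must start its line
    // (leading spaces or tabs are allowed).
    static Optional<String> extractField(String text, String label) {
        Pattern p = Pattern.compile(
                "^[ \\t]*" + Pattern.quote(label) + "[ \\t]*:[ \\t]*(.+?)[ \\t]*$",
                Pattern.MULTILINE | Pattern.CASE_INSENSITIVE);
        Matcher m = p.matcher(text);
        return m.find() ? Optional.of(m.group(1)) : Optional.empty();
    }
}
```

With this, `FieldExtractor.extractField(parsedText, "Author")` would yield the author value when such a labelled line exists, and an empty `Optional` otherwise. If the PDF has no labels at all, this approach would not work and a position-based extraction would be needed instead.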