I'm automating a test scenario that validates a PDF document. The document opens in a new browser tab when its link (an anchor tag) is clicked. I want to validate a few important pieces of content in the document, for which I'm using Apache PDFBox. However, the document URL has the prefix 'blob', and because of that the java.net.URL class throws a MalformedURLException for unknown protocol: blob. How should I define or add that protocol in Java?
Please let me know how to get rid of this error so that I can use PDFBox to parse my PDF file.
Java version - 1.8
This is a screenshot of the PDF document after it opens in the browser.
This is the HTML source of the document. However, since it is a PDF view, I cannot perform operations such as fetching the text or the window title.
The following is a sample code snippet -
public void readPdfContents() throws IOException {
    // URL copied from the browser tab that displays the PDF
    String url = "blob:https://cpswebqa.testcbidata.com/f9ad63bc-700e-4f49-a4fb-807ad1a44b01";
    // This constructor throws MalformedURLException: unknown protocol: blob
    URL pdfUrl = new URL(url);
    InputStream ips = pdfUrl.openStream();
    BufferedInputStream bis = new BufferedInputStream(ips);
    // Parse the stream and extract the document text with PDFBox
    PDFParser pdfParser = new PDFParser(bis);
    pdfParser.parse();
    String pdfData = new PDFTextStripper().getText(pdfParser.getPDDocument());
    System.out.println("PDF Data is - " + pdfData);
}
Error stack trace -
Exception in thread "main" java.net.MalformedURLException: unknown protocol: blob
    at java.net.URL.<init>(URL.java:600)
    at java.net.URL.<init>(URL.java:490)
    at java.net.URL.<init>(URL.java:439)
    at com.cbsh.automation.file.testrunner.WEB.Sample.main(Sample.java:11)
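The error can be reproduced without PDFBox or a browser. The following is a minimal sketch that uses only the JDK; the class name BlobUrlCheck and its two helper methods are hypothetical names introduced for illustration. It checks whether java.net.URL accepts a given string, and shows that java.net.URI, which only parses the string and needs no protocol handler, reads the scheme as 'blob'.

```java
import java.net.MalformedURLException;
import java.net.URI;
import java.net.URL;

public class BlobUrlCheck {

    // Returns true if java.net.URL can be constructed from the string,
    // which requires a stream handler to exist for its protocol.
    static boolean isSupportedByUrlClass(String spec) {
        try {
            new URL(spec);
            return true;
        } catch (MalformedURLException e) {
            // For the blob URL the message is "unknown protocol: blob"
            return false;
        }
    }

    // java.net.URI only parses the string, so no protocol handler is needed.
    static String schemeOf(String spec) {
        return URI.create(spec).getScheme();
    }

    public static void main(String[] args) {
        String blobUrl = "blob:https://cpswebqa.testcbidata.com/f9ad63bc-700e-4f49-a4fb-807ad1a44b01";

        // Same failure as in the stack trace, reported as a boolean
        System.out.println("URL accepts blob URL: " + isSupportedByUrlClass(blobUrl));

        // The scheme as seen by URI
        System.out.println("Scheme seen by URI: " + schemeOf(blobUrl));

        // The same string without the 'blob:' prefix is an ordinary https URL
        String withoutPrefix = blobUrl.substring("blob:".length());
        System.out.println("URL accepts https part: " + isSupportedByUrlClass(withoutPrefix));
    }
}
```

So the exception is raised while the URL object is being constructed, before any network access or PDF parsing takes place. As far as I understand, a blob: URL refers to an object held in the memory of the browser that created it, so I am not sure that stripping the prefix or registering a handler for the protocol would by itself make the PDF bytes available to Java.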