
Using Protractor, I am trying to read the contents of a PDF that opens in a new browser tab, but I could not find any npm library that provides an openConnection() method like Java does.

Below is the Java code that works in Selenium. Does anyone know of similar methods I can use to read PDF contents in Protractor?

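import java.io.BufferedInputStream;
import java.io.InputStream;
import java.net.URL;
import java.net.URLConnection;

// Imports assume Apache PDFBox 2.x (PDFTextStripper lives in org.apache.pdfbox.text there)
import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.text.PDFTextStripper;

// Downloads the PDF at pdfURL and extracts its text with PDFBox
public static String readPDFContent(String pdfURL) throws Exception {
    URL url = new URL(pdfURL);
    URLConnection conn = url.openConnection();
    conn.connect();
    InputStream input = conn.getInputStream();
    BufferedInputStream fileToParse = new BufferedInputStream(input);
    PDDocument document = null;
    String output = null;
    try {
        document = PDDocument.load(fileToParse);
        output = new PDFTextStripper().getText(document);
        System.out.println(output);
    } finally {
        if (document != null) {
            document.close();
        }
        fileToParse.close();
        input.close();
    }
    return output;
}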

Alternatively, if anyone knows how to open a URL connection from JavaScript or TypeScript, or how to use Java classes from these scripts, that would also help.
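Node has no URLConnection class, but its built-in `http` and `https` modules can play the role of `openConnection()` plus `getInputStream()` from the Java method. A minimal sketch, assuming the PDF URL is reachable from the Node process that runs Protractor, that the server answers `200` directly (redirects are not followed here), and that any session cookie is passed in `headers` (for example `{ Cookie: cookie }`); `downloadBytes` is a hypothetical helper name:

```typescript
import * as http from 'http';
import * as https from 'https';

// Rough Node counterpart of url.openConnection() + conn.getInputStream():
// resolves with the complete response body as raw bytes (a Buffer, not a string).
// Sketch only: redirects are not followed and any non-200 status is treated as an error.
export function downloadBytes(pdfUrl: string, headers: http.OutgoingHttpHeaders = {}): Promise<Buffer> {
  const url = new URL(pdfUrl);
  return new Promise<Buffer>((resolve, reject) => {
    const onResponse = (res: http.IncomingMessage): void => {
      if (res.statusCode !== 200) {
        res.resume(); // drain the body so the socket is released
        reject(new Error('HTTP ' + res.statusCode + ' for ' + pdfUrl));
        return;
      }
      const chunks: Buffer[] = [];
      res.on('data', (chunk: Buffer) => chunks.push(chunk));
      res.on('end', () => resolve(Buffer.concat(chunks)));
      res.on('error', reject);
    };
    // http.get() only accepts http: URLs, so pick the module from the scheme.
    const req = url.protocol === 'https:'
      ? https.get(url, { headers }, onResponse)
      : http.get(url, { headers }, onResponse);
    req.on('error', reject);
  });
}
```

Collecting `Buffer` chunks and joining them with `Buffer.concat` keeps the binary PDF intact; appending chunks to a string would decode them as UTF-8 and corrupt the file.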


I managed to write the code below, but it does not work, because the PDF file is actually an attachment to the response. I found a similar thread, "NodeJS Read file attachment from HTTP response", but it is unanswered.

import * as http from 'http';

static readPDFContent(URLValue: string, cookie: string) {
    let header = {
        'Content-Type': 'application/pdf',
        'Accept': '*/*',
        'Connection': 'keep-alive',
        'Cookie': cookie
    };

    // http.request() has no `url` option, so the URL is passed as the first argument
    let request = http.request(URLValue, { method: 'GET', headers: header }, function (response) {
        console.log("Status is => " + response.statusCode);
        // The PDF body is binary: collect Buffers instead of concatenating strings
        let chunks: Buffer[] = [];
        response.on('data', function (chunk: Buffer) {
            chunks.push(chunk);
        });
        response.on('end', function () {
            let output = Buffer.concat(chunks);
            console.log("Output is => " + output.length + " bytes");
        });
    });
    request.on('error', function (err) {
        console.log(err.message);
    });
    request.end();
}
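Two details of the request above can be sketched on their own. In HTTP terms, an "attachment" is normally just a `Content-Disposition: attachment; filename="..."` response header; the body itself is still the PDF bytes and can be read like any other body. The `cookie` argument has to carry the browser session, which Protractor exposes through `browser.manage().getCookies()` (objects with `name` and `value` fields). A minimal sketch under those assumptions; `parseContentDisposition`, `toCookieHeader`, `Disposition` and `BrowserCookie` are hypothetical names, only the simple `filename="..."` form is handled (not the RFC 6266 `filename*=` form), and the Protractor call is left out so the snippet runs standalone:

```typescript
export interface Disposition {
  type: string;      // e.g. "attachment" or "inline", lower-cased
  filename?: string; // undefined when there is no plain filename parameter
}

// Parses simple values such as: attachment; filename="report.pdf"
// Sketch only: splits on ";" (so a quoted filename containing ";" is cut short)
// and ignores the RFC 6266 filename* (percent-encoded) parameter.
export function parseContentDisposition(header: string | undefined): Disposition | undefined {
  if (!header) return undefined;
  const [rawType, ...params] = header.split(';');
  const result: Disposition = { type: rawType.trim().toLowerCase() };
  for (const param of params) {
    const eq = param.indexOf('=');
    if (eq === -1) continue;
    const key = param.slice(0, eq).trim().toLowerCase();
    let value = param.slice(eq + 1).trim();
    if (value.length >= 2 && value.startsWith('"') && value.endsWith('"')) {
      value = value.slice(1, -1); // strip surrounding quotes
    }
    if (key === 'filename') result.filename = value;
  }
  return result;
}

// Minimal shape of a WebDriver cookie (real cookie objects also carry domain, path, ...).
export interface BrowserCookie {
  name: string;
  value: string;
}

// Joins cookies into the "name=value; name2=value2" form used by the Cookie request header.
export function toCookieHeader(cookies: BrowserCookie[]): string {
  return cookies.map((c) => c.name + '=' + c.value).join('; ');
}
```

Inside the response callback the header would be read as `parseContentDisposition(response.headers['content-disposition'])`; in a Protractor spec the cookie string might be built with `toCookieHeader(await browser.manage().getCookies())` before calling readPDFContent.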
Vishal R
  • Maybe https://www.npmjs.com/package/pdf-parse? – Sergey Pleshakov Apr 24 '20 at 15:49
  • Thank you @SergeyPleshakov for the response, but this package won't work: pdf-parse needs an actual PDF file to parse into JSON or text, whereas in my case I have a URL (without a .pdf extension) that loads the PDF online in a new browser tab. Hence neither pdf-parse nor pdf2json works for me, because methods such as "pdfParser.loadPDF(URLValue);" do not recognize it as a PDF URL and throw an error. – Vishal R Apr 24 '20 at 15:59
  • Unfortunately I can't give an answer because I have never worked with PDFs. But if I were you, I would try to download the PDF from the provided URL and then work with it; otherwise it sounds more like browser-plugin work, which adds another layer of complexity, if it is possible at all. – Sergey Pleshakov Apr 24 '20 at 18:08
  • Check this out: https://stackoverflow.com/a/50260022/134120 – AsGoodAsItGets May 06 '20 at 15:20
  • Hi @AsGoodAsItGets, thank you for this, but what I'm looking for is an answer to https://stackoverflow.com/questions/29985509/nodejs-read-file-attachment-from-http-response, which is unanswered. – Vishal R May 11 '20 at 11:48
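Following the download-first suggestion: pdf-parse works on an in-memory Buffer (its README reads the file into a Buffer before parsing), so the URL's missing .pdf extension stops mattering once the bytes have been fetched. What can be checked instead is the PDF header, since PDF files normally begin with `%PDF-`. A minimal sketch, assuming some function that downloads the URL's bytes (passed in as `fetchBytes`, e.g. one that sends the session cookie) and a parser (passed in as `parse`); with pdf-parse, `parse` would be its default export, which resolves to an object with a `text` field. `readPdfText`, `looksLikePdf`, `PdfParser` and `ByteFetcher` are hypothetical names:

```typescript
// Parser shape compatible with pdf-parse's default export for this sketch:
// takes a Buffer and resolves to an object with a `text` field.
export type PdfParser = (data: Buffer) => Promise<{ text: string }>;
// Any function that downloads a URL and resolves with its body bytes.
export type ByteFetcher = (url: string) => Promise<Buffer>;

// PDF files normally begin with the "%PDF-" header, whatever the URL looks like.
export function looksLikePdf(bytes: Buffer): boolean {
  return bytes.subarray(0, 5).toString('latin1') === '%PDF-';
}

// Download first, check the bytes, then hand the in-memory Buffer to the parser,
// so neither a local file nor a ".pdf" extension is needed.
export async function readPdfText(url: string, fetchBytes: ByteFetcher, parse: PdfParser): Promise<string> {
  const bytes = await fetchBytes(url);
  if (!looksLikePdf(bytes)) {
    // Possibly an HTML login or error page, e.g. when the session cookie is missing.
    throw new Error('Response from ' + url + ' is not a PDF (' + bytes.length + ' bytes)');
  }
  const parsed = await parse(bytes);
  return parsed.text;
}
```

Injecting the fetcher and parser keeps the flow testable without a network or the pdf-parse package; in a real spec both would be the actual download helper and pdf-parse.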

0 Answers