The following code downloads the PDF without problems. Now I want to convert the PDF's content to a text file. I have tried many code samples found by googling, but none of them worked. Please help.

import org.openqa.selenium.By;
import org.openqa.selenium.WebDriver;
import org.openqa.selenium.firefox.FirefoxDriver;
import org.openqa.selenium.firefox.FirefoxProfile;
import org.testng.annotations.AfterTest;
import org.testng.annotations.BeforeTest;
import org.testng.annotations.Test;

 @Test

 public class PDF_Download_without_popup {
 WebDriver driver;

 @BeforeTest
 public void StartBrowser() {

  //Create a FirefoxProfile object (built-in class) to set browser preferences.

  FirefoxProfile fprofile = new FirefoxProfile();

   //Set Location to store files after downloading.

  fprofile.setPreference("browser.download.dir", "c:\\WebDriverdownloads");

  fprofile.setPreference("browser.download.folderList", 2);

  //Suppress the download confirmation dialog for the MIME types listed below.

  fprofile.setPreference("browser.helperApps.neverAsk.saveToDisk", 
    "application/vnd.openxmlformats-officedocument.spreadsheetml.sheet;" //MIME type of MS Excel (.xlsx) files.
    + "application/pdf;" //MIME type of PDF files.
    + "application/vnd.openxmlformats-officedocument.wordprocessingml.document;" //MIME type of MS Word (.docx) files.
    + "text/plain;" //MIME type of plain text files.
    + "text/csv"); //MIME type of CSV files.
  fprofile.setPreference( "browser.download.manager.showWhenStarting", false );

  fprofile.setPreference( "pdfjs.disabled", true );

  //Pass the profile to the driver so the download preferences take effect.

  driver = new FirefoxDriver(fprofile);  

 }  

  public void OpenURL() throws InterruptedException{

     driver.get("http://www.bell.ca/");
     driver.manage().window().maximize();
     Thread.sleep(30000);
     driver.findElement(By.xpath(".//*[@id='demoLoginLinkJs']/span[1]")).click();
     driver.findElement(By.xpath(".//*[@id='USER']")).sendKeys("bell_56789");
     driver.findElement(By.xpath(".//*[@id='PASSWORD']")).sendKeys("sunday21");
     driver.findElement(By.xpath(".//*[@id='demoLoginJs']")).click();
     driver.findElement(By.xpath("//span[contains(text(),'View current bill')]")).click();

     Thread.sleep(5000);


     driver.findElement(By.xpath(".//*[@id='btnDownloadBill']")).click();
     String tmp= driver.getCurrentUrl().toString();
     System.out.println(tmp);
     Thread.sleep(50000);


 }

 @AfterTest
 public void CloseBrowser() {  
  driver.quit();   
 }
}
MERose
Geetanjali C

2 Answers

Try the Apache PDFBox API.

Download the library and add it to your project's build path.

In your case you are downloading the PDF. Don't download it; instead, take the URL you would pass to navigate.to() to open the PDF in the browser, e.g. http://www.bell.ca/xyz.pdf, and read the document from that URL. Your code will then look something like this:

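import java.io.BufferedInputStream;
import java.net.URL;
import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.util.PDFTextStripper;

// PDFBox 1.8.x API: loadNonSeq and the util.PDFTextStripper package are not available in PDFBox 2.x.
URL xyzUrl = new URL("http://www.bell.ca/xyz.pdf");
BufferedInputStream TestFile = new BufferedInputStream(xyzUrl.openStream());
PDDocument xyzPDF = PDDocument.loadNonSeq(TestFile, null); // parse the PDF from the stream
String testText = new PDFTextStripper().getText(xyzPDF);   // extract the text of all pages
xyzPDF.close();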

You now have all the text from the PDF file and can write it to an external XLS file, or any other suitable file type, using a third-party API such as Apache POI.
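Since the question asks for a plain text file, no third-party API is needed for the writing step; the JDK is enough. Here is a minimal sketch, assuming the extracted text is already in a String. The class name ExtractedTextWriter, the method writeText and the output name bill.txt are made up for illustration and are not part of PDFBox or Selenium.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical helper for illustration only.
final class ExtractedTextWriter {

    // Writes the text to a UTF-8 file, creating missing parent directories.
    // An existing file at the target path is overwritten.
    static Path writeText(String text, Path target) throws IOException {
        Path parent = target.toAbsolutePath().getParent();
        if (parent != null) {
            Files.createDirectories(parent);
        }
        return Files.write(target, text.getBytes(StandardCharsets.UTF_8));
    }

    public static void main(String[] args) throws IOException {
        // Placeholder for the String returned by PDFTextStripper.getText(...).
        String testText = "Sample bill text";
        Path out = writeText(testText, Paths.get("bill.txt"));
        System.out.println("Wrote " + out.toAbsolutePath());
    }
}
```

In the test class you would call writeText with the text returned by the stripper instead of the placeholder string.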

Necreaux
Raavan
  • Hi Pritam, I tried the code you shared before. The problem is that in my application the URL is the page the PDF is downloaded from, not the PDF itself: https://mybell.bell.ca/Mobility/Billing/CurrentMobilityBill?AcctNo=506540566207DA7D21817764E2831250CF419BC141B457EAE761A07E862087A1B6D26EF365972B6F – Geetanjali C May 28 '15 at 12:29
  • The PDF opens as a native Windows application or through a browser plugin, and Selenium can't handle that. If that is the case, then once it is downloaded the PDF is no longer inside the browser, so Selenium can't automate it. :( This may help: http://stackoverflow.com/questions/6668141/interacting-with-a-pdf-popup-in-selenium – Raavan May 28 '15 at 12:45
  • I edited your answer (which is OK) so that it uses the current PDFBox API. – Tilman Hausherr May 28 '15 at 17:48

@Geetanjali, I can suggest a workaround. Several websites provide an online PDF-to-text conversion service. You just upload your file and click "convert", and your PDF is converted to text.

My point is that you can automate this too, every time you download a PDF. After downloading the PDF, open one of those websites, upload your file using a third-party tool such as AutoIt (add its API to your build path), and download the text file after conversion.
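Whichever converter is used, the "after downloading the PDF" step needs the path of the file Firefox has just saved. Here is a rough sketch using only the JDK (Java 8 or later), assuming the download directory from the question (c:\WebDriverdownloads) and assuming Firefox keeps a temporary *.part file next to the download until it finishes. The class name DownloadedPdfFinder and its methods are invented for illustration.

```java
import java.io.File;

// Hypothetical helper for illustration only.
final class DownloadedPdfFinder {

    // Returns the most recently modified *.pdf in dir, or null if there is none.
    static File newestPdf(File dir) {
        File[] pdfs = dir.listFiles((d, name) -> name.toLowerCase().endsWith(".pdf"));
        if (pdfs == null || pdfs.length == 0) {
            return null;
        }
        File newest = pdfs[0];
        for (File f : pdfs) {
            if (f.lastModified() > newest.lastModified()) {
                newest = f;
            }
        }
        return newest;
    }

    // Polls until a PDF exists and no *.part file remains, or the timeout expires.
    // Returns null on timeout.
    static File waitForPdf(File dir, long timeoutMillis) throws InterruptedException {
        long deadline = System.currentTimeMillis() + timeoutMillis;
        while (System.currentTimeMillis() < deadline) {
            File[] partial = dir.listFiles((d, name) -> name.endsWith(".part"));
            File pdf = newestPdf(dir);
            if (pdf != null && (partial == null || partial.length == 0)) {
                return pdf;
            }
            Thread.sleep(500);
        }
        return null;
    }

    public static void main(String[] args) throws InterruptedException {
        // Directory taken from the browser.download.dir preference in the question.
        File pdf = waitForPdf(new File("c:\\WebDriverdownloads"), 60_000);
        System.out.println(pdf == null ? "No PDF found" : "Downloaded: " + pdf.getAbsolutePath());
    }
}
```

Polling like this could also replace the fixed Thread.sleep(50000) after the download click. The returned File can then be passed to the upload step or to a local PDF library.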

Raavan