I am scraping a web page and can navigate to the correct location, but as I am new to the whole C# world, I am stuck on downloading a PDF file.

The link is hidden behind this element:

var reportDownloadButton = driver.FindElementById("company_report_link");

It is something like: www.link.com/key/489498-654gjgh6-6g5h4jh/link.pdf

How can I download the file to C:\temp\?

Here is my code:

using System.Linq;
using OpenQA.Selenium.Chrome;

namespace WebDriverTest
{
    class Program
    {
        static void Main(string[] args)
        {

            var chromeOptions = new ChromeOptions();
            chromeOptions.AddArguments("headless");

            // Initialize the Chrome Driver // chromeOptions
            using (var driver = new ChromeDriver(chromeOptions))
            {
                // Go to the home page
                driver.Navigate().GoToUrl("www.link.com");
                driver.Manage().Timeouts().ImplicitWait = System.TimeSpan.FromSeconds(15);
                // Get the page elements
                var userNameField = driver.FindElementById("loginForm:username");
                var userPasswordField = driver.FindElementById("loginForm:password");
                var loginButton = driver.FindElementById("loginForm:loginButton");

                // Type user name and password
                userNameField.SendKeys("username");
                userPasswordField.SendKeys("password");

                // and click the login button
                loginButton.Click();

                driver.Navigate().GoToUrl("www.link2.com");
                driver.Manage().Timeouts().ImplicitWait = System.TimeSpan.FromSeconds(15);

                var reportSearchField = driver.FindElementByClassName("form-control");

                reportSearchField.SendKeys("Company");

                var reportSearchButton = driver.FindElementById("search_filter_button");
                reportSearchButton.Click();

                var reportDownloadButton = driver.FindElementById("company_report_link");
                reportDownloadButton.Click();
            }
        }
    }
}

EDIT 2:

I am not the sharpest pencil in the Stack Overflow community yet, and I don't understand how to do this with Selenium. I tried it with WebClient instead:

        var reportDownloadButton = driver.FindElementById("company_report_link");
        var text = reportDownloadButton.GetAttribute("href");
        // driver.Manage().Timeouts().ImplicitWait = System.TimeSpan.FromSeconds(15);

        WebClient client = new WebClient();
        // Save the file to desktop for debugging
        var desktop = System.Environment.GetFolderPath(System.Environment.SpecialFolder.Desktop);
        string fileName = desktop + "\\myfile.pdf";
        client.DownloadFile(text, fileName);

However, the web page seems to be a little tricky. I am getting:

System.Net.WebException: 'The remote server returned an error: (401) Unauthorized.'

The debugger points at:

client.DownloadFile(text, fileName);

I think the script really needs to simulate a right click and "Save Link As", otherwise this download will not work. Also, if I just click the button, the PDF opens in a new Chrome tab instead of downloading.


EDIT 3:

Should it be like this?

using System.Collections.Generic;
using System.Linq;
using OpenQA.Selenium.Chrome;

namespace WebDriverTest
{
    class Program
    {
        static void Main(string[] args)
        {

            // declare chrome options with prefs
            var options = new ChromeOptionsWithPrefs();
            options.AddArguments("headless"); // we add headless here

            // declare prefs
            options.prefs = new Dictionary<string, object>
            {
                { "download.default_directory", downloadFilePath }
            };

            // declare driver with these options
            //driver = new ChromeDriver(options); we don't need this because we already declare driver below.

            // Initialize the Chrome Driver // chromeOptions
            using (var driver = new ChromeDriver(options))
            {
                // Go to the home page
                driver.Navigate().GoToUrl("www.link.com");
                driver.Manage().Timeouts().ImplicitWait = System.TimeSpan.FromSeconds(15);
                // Get the page elements
                var userNameField = driver.FindElementById("loginForm:username");
                var userPasswordField = driver.FindElementById("loginForm:password");
                var loginButton = driver.FindElementById("loginForm:loginButton");

                // Type user name and password
                userNameField.SendKeys("username");
                userPasswordField.SendKeys("password");

                // and click the login button
                loginButton.Click();

                driver.Navigate().GoToUrl("www.link.com");
                driver.Manage().Timeouts().ImplicitWait = System.TimeSpan.FromSeconds(15);

                var reportSearchField = driver.FindElementByClassName("form-control");

                reportSearchField.SendKeys("company");

                var reportSearchButton = driver.FindElementById("search_filter_button");
                reportSearchButton.Click();

                driver.Manage().Timeouts().ImplicitWait = System.TimeSpan.FromSeconds(15);
                driver.Navigate().GoToUrl("www.link.com");

                // click the link to download
                var reportDownloadButton = driver.FindElementById("company_report_link");
                reportDownloadButton.Click();

                // if clicking does not work, get href attribute and call GoToUrl() -- this may trigger download
                var href = reportDownloadButton.GetAttribute("href");
                driver.Navigate().GoToUrl(href);

            }
        }
    }
}

Asked by 10101, edited by CEH
  • There are multiple ways, but if you use the code I provided in my previous answer, write your own path instead of ".\" (meaning the current folder), so that the download path is "C:\temp\" & FileName (just make sure the directory exists before downloading). – CruleD Nov 28 '19 at 19:36
  • Given the error that is thrown (`System.Net.WebException`), it seems you need credentials to send a request that downloads the file, which might not be possible if the web service does not have an API or an easy way to get credentials. If you just use Selenium to either click or navigate to the download link and set the `download.default_directory` ChromeDriver setting, this error should not be thrown. – CEH Nov 30 '19 at 16:47
  • Yes, I am testing it right now, and it seems to be as you say. The WebClient solution does not work here: I get the link and it is fine, but `client.DownloadFile(text, fileName);` fails. I tested my script on another PDF file and it works, so I have to get your solution working. – 10101 Nov 30 '19 at 17:02
  • `run your code here` just means running whatever steps of your script are needed to get to the download button. I think those steps are already in your question, but I wanted to keep the answer concise. I updated your code sample under `EDIT 3` in your question to fix a few issues: `ChromeDriver` was declared twice, and the `headless` option can be combined with `download.default_directory` in `ChromeOptionsWithPrefs`. You originally had two sets of options declared, along with two ChromeDrivers. Your code sample should be testable now. – CEH Nov 30 '19 at 17:34
  • I tried something similar before and got the same error I am getting now. The debugger points at "options" in `using (var driver = new ChromeDriver(options))`, saying `Severity Code Description Project File Line Suppression State Error CS1503 Argument 1: cannot convert from 'WebDriverTest.ChromeOptionsWithPrefs' to 'OpenQA.Selenium.Chrome.ChromeOptions' Scraper C:\Users\PC\source\repos\Scraper\Program.cs 28 Active` – 10101 Nov 30 '19 at 17:42

2 Answers

You can use WebClient.DownloadFile for that.
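
The 401 reported in the question would be consistent with the `WebClient` request not carrying the browser's login session. A minimal sketch, assuming the site authenticates with cookies (the question does not confirm this): pass the session cookies in the `Cookie` header before calling `DownloadFile`. `CookieDownload`, `BuildCookieHeader`, and `DownloadWithCookies` are hypothetical helper names, not part of any library.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Net;

public static class CookieDownload
{
    // Joins name/value pairs into a single Cookie request-header value,
    // e.g. "a=1; b=2".
    public static string BuildCookieHeader(IEnumerable<KeyValuePair<string, string>> cookies)
    {
        return string.Join("; ", cookies.Select(c => c.Key + "=" + c.Value));
    }

    // Downloads url to targetPath, sending the given cookies with the request.
    // Assumption: the server accepts the browser's cookies as proof of login.
    public static void DownloadWithCookies(
        string url,
        string targetPath,
        IEnumerable<KeyValuePair<string, string>> cookies)
    {
        using (var client = new WebClient())
        {
            client.Headers.Add(HttpRequestHeader.Cookie, BuildCookieHeader(cookies));
            client.DownloadFile(url, targetPath);
        }
    }
}
```

With Selenium, the pairs could come from `driver.Manage().Cookies.AllCookies.Select(c => new KeyValuePair<string, string>(c.Name, c.Value))` and the URL from `reportDownloadButton.GetAttribute("href")`. If the site uses another authentication scheme, this will probably still return 401.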

Answered by Longoon12000
  • How to get link as string out of this? `var reportDownloadButton = driver.FindElementById("company_report_link");` – 10101 Nov 28 '19 at 14:34
  • I don't know what "company_report_link" is, but usually you can just read the attribute or inner text where the URL is located. – Longoon12000 Nov 28 '19 at 15:32

You could try setting the download.default_directory Chrome driver preference:

// declare chrome options with prefs
var options = new ChromeOptionsWithPrefs();

// declare prefs
options.prefs = new Dictionary<string, object>
{
    { "download.default_directory", downloadFilePath }
};

// declare driver with these options
driver = new ChromeDriver(options);


// ... run your code here ...

// click the link to download
var reportDownloadButton = driver.FindElementById("company_report_link");
reportDownloadButton.Click();

// if clicking does not work, get href attribute and call GoToUrl() -- this may trigger download
var href = reportDownloadButton.GetAttribute("href");
driver.Navigate().GoToUrl(href);

If reportDownloadButton is a link that triggers a download, then the file should be saved to the directory you have set in download.default_directory.
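
Because the click returns before the download finishes, the script may need to wait for the file before the driver is disposed. A rough sketch using only the file system: it polls the download directory until a matching file exists and no Chrome partial downloads (`*.crdownload`) remain. `DownloadWaiter` and `WaitForFile` are hypothetical names, and the sketch assumes the directory holds no older files matching the pattern.

```csharp
using System;
using System.IO;
using System.Linq;
using System.Threading;

public static class DownloadWaiter
{
    // Polls directory until a file matching pattern exists and no Chrome
    // partial downloads (*.crdownload) remain, or until timeout elapses.
    // Returns the full path of the newest match, or null on timeout.
    public static string WaitForFile(string directory, string pattern, TimeSpan timeout)
    {
        var deadline = DateTime.UtcNow + timeout;
        while (DateTime.UtcNow < deadline)
        {
            bool inProgress = Directory.EnumerateFiles(directory, "*.crdownload").Any();
            var newest = new DirectoryInfo(directory)
                .GetFiles(pattern)
                .OrderByDescending(f => f.LastWriteTimeUtc)
                .FirstOrDefault();

            if (!inProgress && newest != null)
            {
                return newest.FullName;
            }

            Thread.Sleep(250); // short pause between checks
        }
        return null;
    }
}
```

After `reportDownloadButton.Click()`, a call such as `DownloadWaiter.WaitForFile(downloadFilePath, "*.pdf", TimeSpan.FromSeconds(30))` would block until the PDF appears or the timeout passes.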

Neither of these threads is about C#, but both discuss a similar issue:

How to control the download of files with Selenium + Python bindings in Chrome

How to use chrome webdriver in selenium to download files in python?

Answered by CEH
  • Thank you for this one! I will test it tomorrow. – 10101 Nov 28 '19 at 16:03
  • Thank you for trying to help me. I just don't understand where the part before `// ... run your code here ...` should go. I already have one option for headless. How can I add another option for the download? – 10101 Nov 30 '19 at 17:13
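
On combining the two options: one possible sketch uses the stock `ChromeOptions` class from the Selenium .NET bindings, whose `AddUserProfilePreference` method sets Chrome preferences, so no custom `ChromeOptionsWithPrefs` type is needed and the CS1503 conversion error should not come up. The `C:\temp` folder comes from the question. The two extra preferences are additions not mentioned in the thread, included on the assumption that the PDF otherwise opens in Chrome's viewer.

```csharp
using OpenQA.Selenium.Chrome;

class Program
{
    static void Main()
    {
        // Target folder from the question; it should exist before the download starts.
        var downloadFilePath = @"C:\temp";

        // A single options object carries both the headless flag and the prefs.
        var options = new ChromeOptions();
        options.AddArgument("headless");

        // Save downloads here instead of the default Downloads folder.
        options.AddUserProfilePreference("download.default_directory", downloadFilePath);
        // Do not show a "Save As" dialog.
        options.AddUserProfilePreference("download.prompt_for_download", false);
        // Download PDFs instead of opening them in Chrome's built-in viewer.
        options.AddUserProfilePreference("plugins.always_open_pdf_externally", true);

        using (var driver = new ChromeDriver(options))
        {
            // ... log in, search, and click the report link as in the question ...
        }
    }
}
```

Depending on the Chrome version, headless mode may block downloads by default. If no file appears, running once without the `headless` argument is a quick way to check whether that is the cause.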