
I have a URL that I want to download from regularly. It can only be accessed from a logged-in account, and logging in requires JavaScript support, so I'm forced to use Selenium with PhantomJS (trust me). Otherwise, I would just use urllib for this, but it gives me a sign-in error.

It's a CSV file with at most 1000 rows and about 6 columns.

I eventually want to get this CSV into a list. Does anyone know how to download a CSV with Selenium WebDriver?
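Once the CSV text is in hand (however it was downloaded), turning it into a list is a job for Python's standard `csv` module. A minimal sketch, assuming the file has already been read into a string; the function name `csv_text_to_rows` and the sample data are made up for illustration:

```python
import csv
import io


def csv_text_to_rows(text):
    """Parse CSV text into a list of rows, each row a list of strings."""
    # csv.reader expects an iterable of lines; StringIO turns the string into one
    return list(csv.reader(io.StringIO(text)))


# Made-up sample, not the real file's contents
sample = "Year,City,Sport\n1924,Chamonix,Skating\n1924,Chamonix,Curling\n"
rows = csv_text_to_rows(sample)
header, data = rows[0], rows[1:]
print(header)     # ['Year', 'City', 'Sport']
print(len(data))  # 2
```

Using `csv.reader` rather than `text.split(",")` keeps quoted fields such as `"Lake Placid, NY"` in one piece.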

Thanks so much.

Edit: I'm just looking to download a CSV from a URL in Selenium. Nothing else.

User
  • Can you provide the URL, and some dummy username+password, and then tell where the download button/link is located? – barak manos Feb 18 '14 at 20:28
  • BTW, downloading the CSV is pretty easy with Selenium. The main problem is how to handle the 'Save As' popup that the browser usually generates. – barak manos Feb 18 '14 at 20:30
  • The platform I'm using has nothing to do with the question. I just need a generic way to download a CSV from a URL in Selenium. Yes, after opening the URL, a Save As dialog appears. If you want a URL to test on, here's one: http://winterolympicsmedals.com/medals.csv – User Feb 18 '14 at 20:32
  • Not a duplicate; that one is in Java, and I can't really understand it. – User Feb 18 '14 at 20:55
  • @rvraghav93 The accepted answer for that post doesn't really answer the question, and the linked blog post isn't very helpful either. – Uyghur Lives Matter Feb 18 '14 at 20:57
  • What is the problem with `webdriver.get(url)`? The "Save As" dialogue popup, or some other issue? – GVH Feb 18 '14 at 21:09
  • @cpburnz Yeah, that's true, sorry for the mistake. That code doesn't work either! – Raghav RV Feb 18 '14 at 21:09
  • Is it possible to get the session ID and cookies and then pass them to urllib / requests for downloading the file? For instance, `browser.session_id` and `browser.get_cookies()` yield the session ID and cookies; these can be passed to requests. Is it possible to do it that way? – Raghav RV Feb 18 '14 at 21:51
  • That sounds like it may work. I know nothing about cookies. Does anyone else know? – User Feb 18 '14 at 23:51
  • I would like to get help with this as well – user1357015 May 06 '14 at 02:20
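Raghav RV's cookie idea can be sketched with the standard library alone. Note that `browser.session_id` is the WebDriver session, not the website's login session; the site's login normally lives in the cookies returned by `browser.get_cookies()`, a list of dicts with `name` and `value` keys. The helper names below are hypothetical, and whether the site accepts the copied cookies (it may also check the User-Agent or other headers) is an assumption you would need to verify:

```python
import urllib.request


def cookie_header(selenium_cookies):
    """Build a Cookie header value from Selenium's get_cookies() output."""
    return "; ".join(f"{c['name']}={c['value']}" for c in selenium_cookies)


def download_with_cookies(url, selenium_cookies, user_agent=None):
    """Fetch url with the browser's cookies attached and return the body as text."""
    headers = {"Cookie": cookie_header(selenium_cookies)}
    if user_agent:
        # Some sites tie the login session to the browser's User-Agent string
        headers["User-Agent"] = user_agent
    request = urllib.request.Request(url, headers=headers)
    with urllib.request.urlopen(request) as response:
        # Assumes UTF-8; check response.headers.get_content_charset() if unsure
        return response.read().decode("utf-8")


# Usage after logging in with Selenium (browser is your WebDriver instance):
# csv_text = download_with_cookies(
#     "http://winterolympicsmedals.com/medals.csv",
#     browser.get_cookies(),
#     user_agent=browser.execute_script("return navigator.userAgent"),
# )
```

The browser only does the JavaScript login; the actual download then goes through urllib, so no Save As dialog is involved.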

2 Answers


It's actually pretty simple, using another answer I gave on Stack Overflow:

https://stackoverflow.com/a/21871600/2423379

EDIT: Running Firefox in headless mode

Requirements:

  • sudo apt-get install xvfb (or the equivalent command in your distro)
  • pip install --user xvfbwrapper

And the code:

from xvfbwrapper import Xvfb

vdisplay = Xvfb()
vdisplay.start()

# launch stuff inside virtual display here

vdisplay.stop()

Ref: Firefox-selenium in headless mode
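The Xvfb trick hides the browser window, but Firefox can still stop at the Save As dialog that barak manos mentioned. A common workaround is to set Firefox preferences so CSV files are saved straight to a folder. This is a sketch assuming Selenium 4 (`webdriver.FirefoxOptions` with `set_preference`) and xvfbwrapper; the preference names are standard Firefox ones, but the MIME type the server sends (assumed here to be `text/csv` or `application/csv`) must match, exact behaviour varies across Firefox/geckodriver versions, and the helper names are hypothetical:

```python
import glob
import os
import time


def csv_download_prefs(download_dir, mime_types=("text/csv", "application/csv")):
    """Firefox preferences that save the given MIME types without a Save As dialog."""
    return {
        "browser.download.folderList": 2,  # 2 = save to browser.download.dir
        "browser.download.dir": os.path.abspath(download_dir),
        "browser.download.manager.showWhenStarting": False,
        "browser.helperApps.neverAsk.saveToDisk": ",".join(mime_types),
    }


def wait_for_csv(download_dir, timeout=30):
    """Poll download_dir until a .csv exists and no Firefox *.part file remains."""
    deadline = time.time() + timeout
    while time.time() < deadline:
        finished = glob.glob(os.path.join(download_dir, "*.csv"))
        in_progress = glob.glob(os.path.join(download_dir, "*.part"))
        if finished and not in_progress:
            return finished[0]
        time.sleep(0.5)
    raise TimeoutError(f"no finished CSV in {download_dir} after {timeout}s")


def download_csv_headless(url, download_dir):
    """Open url in Firefox inside a virtual display and return the saved file's path."""
    # Imported here so the helpers above work without these packages installed
    from selenium import webdriver
    from xvfbwrapper import Xvfb

    os.makedirs(download_dir, exist_ok=True)
    vdisplay = Xvfb()
    vdisplay.start()
    try:
        options = webdriver.FirefoxOptions()
        for name, value in csv_download_prefs(download_dir).items():
            options.set_preference(name, value)
        browser = webdriver.Firefox(options=options)
        try:
            # Log in first if the file needs it, then navigate (or click the link)
            browser.get(url)
            return wait_for_csv(download_dir)
        finally:
            browser.quit()
    finally:
        vdisplay.stop()
```

Point `download_dir` at an empty folder; otherwise `wait_for_csv` may pick up a stale file from an earlier run.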

goofd
  • Thanks, however I'm using PhantomJS instead of Firefox, as I need it to be headless. – User Feb 19 '14 at 17:00
  • I had a similar requirement. However, I was not able to do the CSV download with PhantomJS. The bright side is that you can use Firefox in headless mode; that's what I am doing right now. I have updated my answer to reflect that. – goofd Feb 20 '14 at 03:58

How about the `page_source` attribute?

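from selenium import webdriver

browser = webdriver.PhantomJS()  # or your own driver; PhantomJS needs Selenium 3.x (removed in 4)
browser.get("http://winterolympicsmedals.com/medals.csv")
csv_file = browser.page_source  # the page as the browser renders it, here the CSV text
print(csv_file)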

Try this, my friend. I use Selenium + Python + HtmlUnit, and it works like a breeze.

Hope it works with your PhantomJS.
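One caveat with `page_source`: it returns the DOM the browser built, not the raw response. A real browser (Chrome, for example) typically wraps a plain-text file in `<html>...<pre>...</pre>...</html>` and escapes characters such as `&`. Below is a small, hypothetical helper that strips such a wrapper if present before handing the text to the `csv` module; the regex is an assumption about the wrapper's shape, so check what your driver actually returns:

```python
import csv
import html
import io
import re


def csv_from_page_source(page_source):
    """Return the CSV rows contained in a WebDriver page_source string."""
    # If the browser wrapped the file in <pre>...</pre>, keep only its contents
    match = re.search(r"<pre[^>]*>(.*?)</pre>", page_source, re.DOTALL | re.IGNORECASE)
    # Wrapped text is HTML-escaped (&amp;, &lt;); raw text is used as-is
    text = html.unescape(match.group(1)) if match else page_source
    return list(csv.reader(io.StringIO(text)))


wrapped = ('<html><head></head><body><pre style="word-wrap: break-word;">'
           "Year,Note\n1924,snow &amp; ice\n</pre></body></html>")
print(csv_from_page_source(wrapped))  # [['Year', 'Note'], ['1924', 'snow & ice']]
```

With HtmlUnit or PhantomJS returning the raw text, no `<pre>` is found and the string is parsed unchanged.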