Download CSV from url in Selenium?

Question

I have a URL that I want to download from regularly. It can only be accessed from a logged-in account, and logging in requires JavaScript support, so I'm forced to use Selenium with PhantomJS, trust me. Otherwise I would just use urllib for this, but that gives me a sign-in error.

It's a CSV file, with a maximum of 1000 rows, and about 6 columns.

I want to eventually get this CSV into a list. Does anyone know how to download a CSV with Selenium Webdriver?

Thanks so much.

Edit: I'm just looking to download a CSV from a URL in Selenium. Nothing else.

Can you provide the URL, and some dummy username+password, and then tell where the download button/link is located? — barak manos, Feb 18 '14 at 20:28
BTW, downloading the CSV is pretty easy with Selenium. The main problem is how to handle the 'Save As' popup that the browser usually generates. — barak manos, Feb 18 '14 at 20:30
The platform I'm using has nothing to do with the question. I just need a generic way to download a CSV from a URL in Selenium. Yes, after opening the URL, a Save As dialog appears. If you want a URL to test on, here's one: http://winterolympicsmedals.com/medals.csv — User, Feb 18 '14 at 20:32
Not a duplicate: that one is in Java, and I can't really understand it. — User, Feb 18 '14 at 20:55
@rvraghav93 The accepted answer for that post doesn't really answer the question and the linked blog post isn't very helpful either. — Uyghur Lives Matter, Feb 18 '14 at 20:57
What is the problem with `webdriver.get(url)`? The "Save As" dialogue popup, or some other issue? — GVH, Feb 18 '14 at 21:09
@cpburnz Yes, that's true, sorry for the mistake. That code doesn't work either! — Raghav RV, Feb 18 '14 at 21:09
Is it possible to get the session ID and cookies and then pass them to urllib / requests for downloading the file? For instance, `browser.session_id` and `browser.get_cookies()` yield the session ID and cookies, and these can be passed to requests. Is it possible to do it that way? — Raghav RV, Feb 18 '14 at 21:51
That sounds like it may work. I know nothing about cookies. Does anyone else know? — User, Feb 18 '14 at 23:51
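
The cookie idea raised in the comments above can be sketched roughly as follows. This is only a minimal illustration using the standard library, not something tested against the asker's site: it assumes the site authenticates purely through cookies, and the helper names (`cookie_header`, `build_request`, `download_text`) are made up for this example. Note that `browser.session_id` identifies the WebDriver session rather than the website login, so only the output of `browser.get_cookies()` is used here.

```python
import urllib.request


def cookie_header(cookies):
    """Build a Cookie header value from Selenium-style cookie dicts.

    `cookies` is the list returned by browser.get_cookies(); each item
    is a dict with at least 'name' and 'value' keys.
    """
    return "; ".join(f"{c['name']}={c['value']}" for c in cookies)


def build_request(url, cookies):
    """Create a urllib request that carries the browser's cookies."""
    return urllib.request.Request(url, headers={"Cookie": cookie_header(cookies)})


def download_text(url, cookies, encoding="utf-8"):
    """Fetch the URL with the given cookies and return the body as text.

    The encoding is an assumption; adjust it to whatever the server sends.
    """
    with urllib.request.urlopen(build_request(url, cookies)) as response:
        return response.read().decode(encoding)


# Usage, assuming `browser` is a WebDriver instance that has already logged in:
# csv_text = download_text(url, browser.get_cookies())
```

This sends every cookie the browser holds, regardless of domain or path, which is fine for a single-site session but too blunt for anything broader. If the server also checks headers such as User-Agent, those would have to be copied over as well.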

score 1 · Accepted Answer · edited May 23 '17 at 12:23

It's actually pretty simple, using another answer I gave on Stack Overflow:

https://stackoverflow.com/a/21871600/2423379

EDIT: Running Firefox in Headless mode

Requirements:

sudo apt-get install xvfb (or the equivalent command in your distro)
pip install --user xvfbwrapper

And the code:

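from xvfbwrapper import Xvfb

# Start a virtual X display so Firefox can run without a real screen
vdisplay = Xvfb()
vdisplay.start()

try:
    # launch stuff inside virtual display here
    pass
finally:
    # always shut the virtual display down, even if the browser code fails
    vdisplay.stop()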

Ref: Firefox-selenium in headless mode
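
The linked answer is not reproduced in this post, so the following is only a sketch of one common way to stop Firefox from showing the "Save As" dialog: setting its download preferences before the browser starts. The preference names are standard Firefox settings, but the helper names (`firefox_download_prefs`, `make_firefox`), the download directory, and the `text/csv` MIME type are assumptions for illustration; the MIME type has to match the Content-Type the server actually sends.

```python
def firefox_download_prefs(download_dir, mime_types="text/csv"):
    """Return Firefox preferences that save matching downloads silently."""
    return {
        # 2 means "use the directory given in browser.download.dir"
        "browser.download.folderList": 2,
        "browser.download.dir": download_dir,
        "browser.download.manager.showWhenStarting": False,
        # Comma-separated MIME types to save without asking
        "browser.helperApps.neverAsk.saveToDisk": mime_types,
    }


def make_firefox(download_dir):
    """Start Firefox with the download preferences applied.

    Selenium is imported here so the helper above can be used without it.
    """
    from selenium import webdriver

    options = webdriver.FirefoxOptions()
    for name, value in firefox_download_prefs(download_dir).items():
        options.set_preference(name, value)
    return webdriver.Firefox(options=options)


# Usage inside the virtual display (the directory is a placeholder):
# browser = make_firefox("/tmp/csv_downloads")
# browser.get("http://winterolympicsmedals.com/medals.csv")
# browser.quit()
```

With these preferences the file should land in the chosen directory instead of triggering a dialog, after which it can be read from disk like any other CSV.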

answered Feb 19 '14 at 05:12

goofd

Thanks, however I'm using PhantomJS instead of Firefox, as I need it to be headless. – User Feb 19 '14 at 17:00

I had a similar requirement. However, I was not able to do the CSV download with PhantomJS. The bright side is that you can use Firefox in headless mode, which is what I am doing right now. I have updated my answer to reflect that. – goofd Feb 20 '14 at 03:58

score 1 · Answer 2 · answered Jul 02 '14 at 00:19

How about the page_source attribute?

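# `browser` is an already-created WebDriver instance
browser.get("http://winterolympicsmedals.com/medals.csv")
# page_source holds whatever the browser rendered for the URL, here the CSV text
csv_file = browser.page_source
print(csv_file)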

Try this, my friend. I use Selenium + Python + HTMLUnit, and it works like a breeze.

Hope it works with your PhantomJS setup too.
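
Since the question asks for the CSV as a list, here is a small sketch of turning the `page_source` string into rows with the standard `csv` module. It assumes the driver returns either the raw CSV text or, as some browsers do for plain-text responses, the text wrapped in a `<pre>` element; the function name `page_source_to_rows` is made up for this example.

```python
import csv
import html
import io
import re


def page_source_to_rows(page_source):
    """Convert a page_source string holding CSV data into a list of rows."""
    text = page_source
    # Some browsers wrap plain-text responses in <pre>...</pre>; unwrap if so
    match = re.search(r"<pre[^>]*>(.*?)</pre>", text, flags=re.DOTALL | re.IGNORECASE)
    if match:
        # Undo HTML escaping such as &amp; inside the wrapper
        text = html.unescape(match.group(1))
    # newline="" lets the csv module handle line endings itself
    return list(csv.reader(io.StringIO(text, newline="")))


# Usage, assuming `browser` has already loaded the CSV URL:
# rows = page_source_to_rows(browser.page_source)
# header, data = rows[0], rows[1:]
```

Each row comes back as a list of strings, so a file of 1000 rows and 6 columns becomes a list of up to 1001 six-item lists including the header.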
