
I am trying to get all cookies from a website using Python, including the cookies that are set by JavaScript. At the moment I am doing this with the requests module.

The HTTP cookies are not the problem. I can get them with:

import requests

response = requests.get("http://example.com")
http_cookies = response.cookies

The cookies that are set by JavaScript are a bit trickier. I use regex to collect the content of all <script ..>...</script> tags on the website, as well as the external JavaScript sources referenced by <script ... src="...">.

Now I have a large list of strings that contains all the JavaScript on the page. I believe that all cookies set by JavaScript have the form document.cookie = "...";. Is that right?

If so, I think I can search my list of JavaScript code with regex for substrings like document.cookie = "...";.

Am I correct? I am only interested in the cookie name, for example _ga for Google Analytics.

Thanks for your help!

Basti G.
  • "I believe that all cookies that set by javascript have the form `document.cookie = "...";`. Is that right?" Not necessarily. You're going to have to actually _execute_ the JavaScript to do this reliably, and `requests` can't do that. – ChrisGPT was on strike Feb 16 '20 at 14:37
  • Does this answer your question? [Web-scraping JavaScript page with Python](https://stackoverflow.com/questions/8049520/web-scraping-javascript-page-with-python) – ChrisGPT was on strike Feb 16 '20 at 14:38
  • Not really. I have tried that with Selenium in the past, but with Selenium it is necessary to use a proxy and analyse the HTTP headers through that proxy to catch all cookies. This is convoluted and takes a lot of time for large websites, and it makes scanning websites in parallel difficult. That is why I'm looking for an easier way. If my idea can't work, is there a way to execute my saved JavaScript after I have walked through the whole website, and then read document.cookie to get the cookie names with Python? – Basti G. Feb 16 '20 at 15:11

1 Answer


As Chris said, you cannot reliably find cookies with that regex, because JavaScript can set cookies in many ways that do not look like a literal document.cookie = "..."; assignment.
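To illustrate the limitation, here is a minimal sketch of the regex approach from the question. The sample scripts and the helper name `find_literal_cookie_names` are made up for illustration. The pattern finds the name only when the cookie string is a literal directly after the assignment, and it misses a cookie whose string is built at runtime:

```python
import re

# Matches document.cookie = "name=..." (or single quotes) and captures the name.
LITERAL_COOKIE = re.compile(
    r"""document\.cookie\s*=\s*(["'])\s*([^=;"'\s]+)\s*="""
)

def find_literal_cookie_names(js_source):
    """Return cookie names that are assigned as string literals in js_source."""
    return [m.group(2) for m in LITERAL_COOKIE.finditer(js_source)]

# Hypothetical sample scripts, not taken from any real site.
literal_js = 'document.cookie = "_ga=GA1.2.123; path=/";'
dynamic_js = 'var n = "_g" + "id"; document.cookie = n + "=abc; path=/";'

print(find_literal_cookie_names(literal_js))  # ['_ga']
print(find_literal_cookie_names(dynamic_js))  # []
```

The second script does set a cookie when it runs, but its name never appears as a literal next to document.cookie, so a static search cannot see it.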

I would suggest using Selenium, which drives a real browser so that the page's scripts actually run. You can then get the cookies with the following:

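import pickle
import selenium.webdriver

driver = selenium.webdriver.Firefox()
driver.get("http://www.google.com")
# Save the cookies the browser holds for this page to a file
with open("cookies.pkl", "wb") as f:
    pickle.dump(driver.get_cookies(), f)
driver.quit()  # close the browser when done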
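Since the question only needs the cookie names, they can be pulled out of the list that `get_cookies()` returns. This is a minimal sketch, assuming each entry is a dict with a `name` key, which is how Selenium's Python bindings represent cookies. The helper `cookie_names` and the sample data are hypothetical:

```python
def cookie_names(cookies):
    """Return the sorted, de-duplicated names from a list of cookie dicts."""
    return sorted({c["name"] for c in cookies})

# Hypothetical stand-in for the result of driver.get_cookies().
sample = [
    {"name": "_ga", "value": "GA1.2.123", "domain": ".example.com"},
    {"name": "sessionid", "value": "abc", "domain": "example.com"},
    {"name": "_ga", "value": "GA1.2.456", "domain": ".example.com"},
]

print(cookie_names(sample))  # ['_ga', 'sessionid']
```

With a live driver, the call would be `cookie_names(driver.get_cookies())`.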
isopach