I'm stuck on a web scraping project. I would like to scrape the following website, including the date of each review. However, I get 'January 1970' for all of the dates. https://fairygodboss.com/company-reviews/ebay-inc

Here is my code:

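import requests
from bs4 import BeautifulSoup

page_link = 'https://fairygodboss.com/company-reviews/ebay-inc'  # work/life balance reviews for eBay
# randomUserAgents() is a helper defined elsewhere (not shown here)
page_response = requests.get(page_link, verify=False, headers={'User-Agent': randomUserAgents()})
soup = BeautifulSoup(page_response.content, 'html.parser')
soup.find_all(class_='textColor6 w-700 p-b-10')  # class of the review date elements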

Many thanks!

sammtt

1 Answer

I believe your problem is that, when you make your request, you are not logged in. When a user is not logged in, all the dates appear as January 1970, until you are redirected to a login page. You will first have to log in.
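As a side note on why that particular value shows up: January 1970 is the month of the Unix epoch, so a timestamp that is missing or zero formats to exactly that string. Whether this site really falls back to a zero timestamp is an assumption on my part, and `format_review_date` is a hypothetical helper written only to illustrate the point:

```python
from datetime import datetime, timezone


def format_review_date(timestamp_seconds):
    """Format a Unix timestamp (in seconds) as 'Month Year' in UTC."""
    moment = datetime.fromtimestamp(timestamp_seconds, tz=timezone.utc)
    return moment.strftime("%B %Y")


# A zero (or defaulted) timestamp lands on the Unix epoch.
print(format_review_date(0))           # January 1970
# A real timestamp from November 2018, for comparison.
print(format_review_date(1542326400))  # November 2018
```

So if every review shows January 1970, the real date is probably not in the HTML you received, rather than being parsed incorrectly.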

This can be a tricky problem, but there is a Python library called twill that may work for you: http://twill.idyll.org

Alternatively, you could use something like the Mechanize library, which twill is based on.

This StackOverflow question should help you out: How to scrape a website that requires login first with Python
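If you would rather not use twill or Mechanize, the same log-in-first idea can be sketched with `requests.Session`, which keeps the cookies from the login request and sends them with later requests. This is only a rough sketch: `LOGIN_URL`, the form field names and `fetch_logged_in` are hypothetical, and the real values have to be read from the site's login form.

```python
import requests

# Hypothetical values: the real login endpoint and form field names are not
# given in the question and must be taken from the site's login form.
LOGIN_URL = "https://fairygodboss.com/login"
REVIEWS_URL = "https://fairygodboss.com/company-reviews/ebay-inc"


def fetch_logged_in(session, login_url, page_url, credentials):
    """Post the credentials, then fetch page_url with the same session so
    that any cookies set at login are sent with the second request."""
    login_response = session.post(login_url, data=credentials)
    login_response.raise_for_status()
    page_response = session.get(page_url)
    page_response.raise_for_status()
    return page_response.text


# Example call (not executed here; the field names are assumptions):
# with requests.Session() as session:
#     html = fetch_logged_in(
#         session, LOGIN_URL, REVIEWS_URL,
#         {"email": "you@example.com", "password": "your-password"},
#     )
```

The returned HTML can then be passed to BeautifulSoup as in your code. If the dates are still January 1970 after a successful login, the cause is probably something other than the login.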

Caleb H.
  • I've found that requests + sessions is the right tool for this job. Python mechanize is abandoned and I've never heard of twill. – pguardiario Nov 16 '18 at 23:45
  • I've logged in using requests + session but it still only shows me January 1970 – sammtt Nov 17 '18 at 22:28
  • To help with that I'd have to see your code again, as well as what the website looks like while logged in. – Caleb H. Nov 19 '18 at 15:28