3

Yahoo Finance's robots.txt says:

User-agent: *
Sitemap: https://finance.yahoo.com/sitemap_en-us_desktop_index.xml
Sitemap: https://finance.yahoo.com/sitemaps/finance-sitemap_index_US_en-US.xml.gz
Disallow: /r/
Disallow: /__rapidworker-1.2.js
Disallow: /__blank
Disallow: /_td_api
Disallow: /_remote

Does Yahoo Finance ban web scraping or not?
What is disallowed by the Yahoo Finance website?
What can we infer from Yahoo's robots.txt file?

Derek Brown
showkey

2 Answers

2

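Nothing in the robots.txt file expressly prevents you from scraping Yahoo Finance: it only disallows a handful of internal paths (/r/, /_td_api, /_remote and so on) for all user agents. However, Yahoo Finance is also governed by Yahoo's Terms of Service.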
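As a rough check of that reading, here is a minimal sketch that feeds the rules quoted in the question to Python's standard `urllib.robotparser`. The robots.txt text is copied from the question rather than fetched live, and the two test URLs (`/quote/AAPL` and `/r/example`) are only illustrative assumptions, not paths taken from Yahoo's documentation.

```python
from urllib.robotparser import RobotFileParser

# Rules copied from the question; the live file may have changed since.
ROBOTS_TXT = """\
User-agent: *
Sitemap: https://finance.yahoo.com/sitemap_en-us_desktop_index.xml
Sitemap: https://finance.yahoo.com/sitemaps/finance-sitemap_index_US_en-US.xml.gz
Disallow: /r/
Disallow: /__rapidworker-1.2.js
Disallow: /__blank
Disallow: /_td_api
Disallow: /_remote
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# Hypothetical URLs, chosen only to exercise the rules above.
quote_allowed = parser.can_fetch("*", "https://finance.yahoo.com/quote/AAPL")
redirect_allowed = parser.can_fetch("*", "https://finance.yahoo.com/r/example")

print("quote page allowed:", quote_allowed)
print("/r/ path allowed:", redirect_allowed)
```

Keep in mind that this only tells you what robots.txt says; it does not tell you what the Terms of Service permit.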

The most pertinent part of this document says basically that you should not do anything which would interfere with their services. Realistically, this means that if you are planning on scraping Yahoo Finance for data, you should do so responsibly (not many thousands of requests, as this will quickly get you banned).
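One way to keep the request rate modest is to enforce a minimum delay between calls. The sketch below is only an illustration: `fetch_politely`, its `fetch` callback, and the 2-second default are all assumptions of mine, not limits published by Yahoo. You would pass in whatever function actually downloads a page.

```python
import time
from typing import Callable, Iterable, List


def fetch_politely(
    urls: Iterable[str],
    fetch: Callable[[str], str],
    min_interval: float = 2.0,  # assumed value, not an official limit
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> List[str]:
    """Call fetch(url) for each URL, leaving at least min_interval
    seconds between the start of consecutive calls."""
    results: List[str] = []
    last_call = None
    for url in urls:
        if last_call is not None:
            # Sleep only for whatever part of the interval is still left.
            wait = min_interval - (clock() - last_call)
            if wait > 0:
                sleep(wait)
        last_call = clock()
        results.append(fetch(url))
    return results
```

The `sleep` and `clock` parameters exist only so the pacing can be tested without real waiting; in normal use you would leave them at their defaults.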

That said, web scraping is generally inefficient (you are reloading an entire HTML page just to collect data programmatically). I would look into using an API instead (like those discussed here), as this would be a) more reliable, b) faster, and c) definitely legal.

Derek Brown
1

They don't disallow it, but my scraper fetches hundreds of companies every 30 seconds, and ever since I started, their website has kept changing formats. I also noticed something new: they will in fact block your router's IP for a little while by replacing some of the values with N/A, which feeds your program bad data. So they don't state that they disallow it, but they definitely don't like you doing it. All I'm saying is be sneaky.