I'm not new to programming—but am (very) new to web-scraping. I'd like to get data from this website in this manner:
- Get the team-data from the given URL and store it in some text file.
- "Click" the links of each of the team members and store that data in some other text file.
- Click various other specific links and store data in its own separate text file.
Again, I'm quite new to this. I have tried opening the specified website with urllib2 (in hopes of being able to parse it with BeautifulSoup), but opening it resulted in a timeout.
Ultimately, I'd like to do something like specify a team's URL to a script, and have said script update associated text files of the team, its players, and various other things in different links.
Considering what I want to do, would it be better to learn how to create a web-crawler, or directly do things via urllib2? I'm under the impression that a spider is faster, but will basically click on links at random unless told to do otherwise (I do not know whether or not this impression is accurate).