I need to create a tool that can log in to a website, read the HTML, perhaps navigate to another page, and ultimately pull data from the page (exporting it to a file, keeping it in memory for further processing, etc.). I will be doing this on Mac OS. Any suggestions on how best to do this these days? In the past I have done this in .NET or Java using their web libraries. I need to be able to log in to a site that uses HTTPS.
- It's not clear what you are asking for. Are you looking for a general-purpose API that essentially performs the same functions as Mint.com? That's non-trivial -- why don't you use an existing off-the-shelf product (such as the aforementioned Mint)? – Kirk Woll May 22 '12 at 17:14
- I'm going to have to second Kirk's comment. There's no "tool" out there that does this, except Mint (only accessing, which it does great) or possibly the QuickBooks derivations (which do have Excel export). Intuit makes both. – Nick Martin May 22 '12 at 17:19
- There are plenty of tools that can do this -- screen-scraper tools, web-crawler tools, Microsoft .NET, etc. I have done this in the past for financial institutions that wanted to log into websites and grab real-time quotes, etc. (before APIs existed that could do this). My knowledge of tools is old (about 7 years old), so I'm looking for current recommendations for automating something to log into a secure website, process a page, and save part of it to a file. – BestPractices May 22 '12 at 19:12
2 Answers
Take a look at HttpUnit. It's an easy way to emulate a web browser from Java code; you'll be up and running fairly quickly if you're already familiar with Java, and it does support HTTPS.
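A minimal sketch of the login-and-scrape flow with HttpUnit might look like the following. The URL, form field names (`username`, `password`), and link text are all placeholders you would need to replace after inspecting the target site's login form:

```java
// Sketch: logging in over HTTPS and scraping a page with HttpUnit.
// The URL, field names, and link text below are assumptions, not real values.
import com.meterware.httpunit.WebConversation;
import com.meterware.httpunit.WebForm;
import com.meterware.httpunit.WebLink;
import com.meterware.httpunit.WebResponse;

public class LoginScrape {
    public static void main(String[] args) throws Exception {
        WebConversation wc = new WebConversation();

        // Fetch the login page; HTTPS and cookies are handled by the conversation.
        WebResponse page = wc.getResponse("https://example.com/login");

        // Fill in and submit the first form on the page.
        WebForm form = page.getForms()[0];
        form.setParameter("username", "myuser");   // assumed field name
        form.setParameter("password", "mypass");   // assumed field name
        WebResponse afterLogin = form.submit();

        // Optionally navigate to another page, then grab the HTML.
        WebLink link = afterLogin.getLinkWith("Account");  // assumed link text
        WebResponse data = link.click();
        String html = data.getText();  // keep in memory, or write to a file
        System.out.println(html);
    }
}
```

The `WebConversation` object maintains session cookies across requests, which is what makes the post-login navigation work.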

Brad
I did some pretty heavy OS X screen scraping with .NET/Mono and the Html Agility Pack. Both work well, IMO.

kenny