
Is it possible to write a program that grabs online search results?

Specifically, I want the data from http://portal.uspto.gov/external/portal/pair

The sample data are application numbers, for example 9078871 and 10595401.

Yes, the site has CAPTCHAs, and I'm willing to type those in by hand. The problem is that I have more than 500 application numbers. What should I do? Is there an easier way to do this?

Thanks in advance! Also, the search engine seems to be written in JavaScript, but I am not sure.

Leslie G
    This is known as "scraping". You may search for "python scraping" or refer to something like: http://stackoverflow.com/questions/2081586/web-scraping-with-python – Alec Smart Dec 26 '11 at 10:06

1 Answer


Sure, it is possible; there is no reason why it shouldn't be.

I don't know where the gap in your knowledge is for this task, since you didn't point that out.

Step by step:

  1. Analyze the website's code to see how links and content are generated.
  2. Download the page source programmatically.
  3. Generate the hyperlinks to your search results.
  4. Parse the related data (I have always done this with some ugly regular expressions).
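Steps 2 and 3 can be sketched in Python with only the standard library. This is a minimal sketch under loud assumptions: `BASE_URL` and the `applicationNumber` query parameter are hypothetical placeholders, because the real request is assembled by the site's JavaScript, so you would first have to find the actual URL and field names (for example in your browser's network inspector). It does not deal with the CAPTCHA at all.

```python
import time
import urllib.parse
import urllib.request

# HYPOTHETICAL endpoint and parameter name: replace them with the real request
# the site's JavaScript sends (inspect it in your browser's network tab).
BASE_URL = "http://example.invalid/pair/search"
PARAM_NAME = "applicationNumber"


def build_search_url(app_number: str) -> str:
    """Step 3: build the (hypothetical) result URL for one application number."""
    return BASE_URL + "?" + urllib.parse.urlencode({PARAM_NAME: app_number})


def download(url: str, timeout: float = 30.0) -> str:
    """Step 2: fetch the page source programmatically and decode it to text."""
    request = urllib.request.Request(url, headers={"User-Agent": "Mozilla/5.0"})
    with urllib.request.urlopen(request, timeout=timeout) as response:
        charset = response.headers.get_content_charset() or "utf-8"
        return response.read().decode(charset, errors="replace")


def download_all(app_numbers, delay: float = 2.0) -> dict:
    """Fetch every result page, pausing between requests to go easy on the server."""
    pages = {}
    for number in app_numbers:
        pages[number] = download(build_search_url(number))
        time.sleep(delay)
    return pages
```

Usage would be something like `pages = download_all(["9078871", "10595401"])`, after which each page's HTML goes to the parsing step. With the placeholder `example.invalid` host the download itself will fail; only the URL-building and looping logic is meant to carry over.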

I have dug a little into the site you mentioned, and what I can say is that this won't be a one-hour job, as it's written in Java (JSP, JavaServer Pages).

What I have found so far is that, to get the search results, you first have to either write an equivalent of the JavaScript function getDossier or use a web browser control that lets you call JavaScript manually. Then you can put together some regular expressions to parse the data out of the results table.
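Parsing a results table with regular expressions might look like the following minimal sketch. It assumes the data sit in ordinary `<tr>`/`<td>` (or `<th>`) markup; the sample HTML is made up for illustration and is not real PAIR output. Regexes like these break easily on nested tables or unusual markup, so an HTML parser (for example Python's `html.parser`) is the sturdier choice if this gets fragile.

```python
import html
import re

# Non-greedy patterns for rows, header/data cells, and any leftover inline tags.
ROW_RE = re.compile(r"<tr(?:\s[^>]*)?>(.*?)</tr\s*>", re.IGNORECASE | re.DOTALL)
CELL_RE = re.compile(r"<t[dh](?:\s[^>]*)?>(.*?)</t[dh]\s*>", re.IGNORECASE | re.DOTALL)
TAG_RE = re.compile(r"<[^>]+>")


def parse_table(page: str) -> list[list[str]]:
    """Return each table row as a list of plain-text cell values."""
    rows = []
    for row_html in ROW_RE.findall(page):
        cells = [
            # Strip inner tags such as <b>, then decode entities like &amp;.
            html.unescape(TAG_RE.sub("", cell)).strip()
            for cell in CELL_RE.findall(row_html)
        ]
        if cells:  # skip rows without any <td>/<th> cells
            rows.append(cells)
    return rows


# Made-up sample markup, not real PAIR output.
sample = """
<table>
  <tr><th>Application Number</th><th>Status</th></tr>
  <tr class="row"><td>10595401</td><td><b>Some status</b> &amp; note</td></tr>
</table>
"""
print(parse_table(sample))
# [['Application Number', 'Status'], ['10595401', 'Some status & note']]
```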

Mythli