
I'm working on an ASP.NET Web Forms application.

My application sends a POST request to an external domain with search criteria from the user (imitating the submission of a search form), then runs a regular expression on the response HTML to find and extract the number of results. I've done some research into better alternatives, but since there's no API and no way to do this with GET from the client, a server-side request seems like my only option.

The problem is that because the response contains the entire web page, it's bandwidth-intensive, and across many searches this adds up quickly.

Given that I'm only looking for a small sliver of data and basically everything else is garbage, are there any ways to reduce the amount of information received in a web response?
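Two generic HTTP-level mitigations apply here: ask the server to gzip the response (`Accept-Encoding: gzip`), and read the response as a stream, aborting the connection as soon as the result count has been matched rather than buffering the whole page. Below is a minimal Python sketch of the early-abort idea; the regex and the fake page are hypothetical stand-ins for the real site's markup, and the stream is simulated with `io.BytesIO` rather than a live connection.

```python
import gzip
import io
import re

# Hypothetical pattern for the result count; the real site's markup will differ.
RESULT_RE = re.compile(rb"Results:\s*(\d+)")

def extract_count(stream, chunk_size=4096, max_bytes=256 * 1024):
    """Read the response incrementally and stop as soon as the count is found,
    instead of downloading and buffering the entire page."""
    buf = b""
    read = 0
    while read < max_bytes:
        chunk = stream.read(chunk_size)
        if not chunk:
            break
        read += len(chunk)
        buf += chunk
        m = RESULT_RE.search(buf)
        if m:
            return int(m.group(1).decode("ascii"))
        # Keep only a short tail so a match straddling a chunk boundary
        # is still found, without growing the buffer unboundedly.
        buf = buf[-128:]
    return None

# Demo: a large fake page where the count appears near the top, so only
# the first chunk ever needs to be read.
page = b'<html><body><p>Results: 1234</p>' + b"x" * 100_000 + b"</body></html>"
count = extract_count(io.BytesIO(page))

# Separately, gzip typically shrinks repetitive HTML dramatically, so even a
# full-page download costs far less bandwidth when compression is negotiated.
compressed_size = len(gzip.compress(page))
```

If the match lands in the first few kilobytes of the page, closing the connection at that point means most of the page is never transferred at all.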

UpQuark
  • You shouldn't be using regular expressions to parse HTML, which is not a "regular language". See [RegEx match open tags except XHTML self-contained tags](http://stackoverflow.com/questions/1732348/regex-match-open-tags-except-xhtml-self-contained-tags). – John Saunders Aug 16 '13 at 19:41
  • That part seemed ugly but generally works fine. I had tried searching by XPath but couldn't get that functioning properly for unclear reasons. Is searching by XPath a generally acceptable method? – UpQuark Aug 16 '13 at 19:47
  • If we could get what we wanted with simple GET/POST requests, there would be no need for domains to write APIs for their data. The only way to do this is for the domain to write such an API for what you want. Otherwise the postback is giving you exactly what it's meant to. – jamesSampica Aug 16 '13 at 20:05
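On the regex-vs-parser point raised in the comments: the stdlib `html.parser` module is one parser-based alternative to regex scraping. The sketch below pulls the text of a hypothetical `<span class="result-count">` element; the tag and class name are assumptions, since the real site's markup is not shown in the question.

```python
from html.parser import HTMLParser

class ResultCountParser(HTMLParser):
    """Extract the text of a hypothetical <span class="result-count"> element."""

    def __init__(self):
        super().__init__()
        self._in_count = False
        self.count = None

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs.
        if tag == "span" and ("class", "result-count") in attrs:
            self._in_count = True

    def handle_data(self, data):
        if self._in_count and self.count is None:
            self.count = int(data.strip())
            self._in_count = False

html = '<html><body><span class="result-count">42</span></body></html>'
parser = ResultCountParser()
parser.feed(html)
```

Unlike a regex, the parser tolerates attribute reordering, whitespace changes, and nesting, which is why it tends to break less often when the target page is tweaked.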

1 Answer


Converted to answer for visibility.

If we could get what we wanted with simple GET/POST requests, there would be no need for domains to write web APIs. The only way to do this is for the domain to write such an API for what you want. Otherwise the postback is giving you exactly what it's meant to.

jamesSampica