35

When I call the site www.livescore.com with the HttpClient class, I always get a "500" error. The server is probably blocking requests from HttpClient.

1) Is there any other method to get the HTML from a web page?

2) How can I set the headers to get the HTML content?

When I set the headers like a browser does, I always get strangely encoded content.

    http_client.DefaultRequestHeaders.TryAddWithoutValidation("Accept", "text/html,application/xhtml+xml,application/xml");
    http_client.DefaultRequestHeaders.TryAddWithoutValidation("Accept-Encoding", "gzip, deflate");
    http_client.DefaultRequestHeaders.TryAddWithoutValidation("User-Agent", "Mozilla/5.0 (Windows NT 6.2; WOW64; rv:19.0) Gecko/20100101 Firefox/19.0");
    http_client.DefaultRequestHeaders.TryAddWithoutValidation("Accept-Charset", "ISO-8859-1");

3) How can I solve this problem? Any suggestions?

I am using a Windows 8 Metro Style app in C# and the HttpClient class.

Norbert Pisz

4 Answers

69

Here you go. Note that you have to decompress the gzip-encoded result you get back, as per mleroy:

private static readonly HttpClient _HttpClient = new HttpClient();

private static async Task<string> GetResponse(string url)
{
    using (var request = new HttpRequestMessage(HttpMethod.Get, new Uri(url)))
    {
        request.Headers.TryAddWithoutValidation("Accept", "text/html,application/xhtml+xml,application/xml");
        request.Headers.TryAddWithoutValidation("Accept-Encoding", "gzip, deflate");
        request.Headers.TryAddWithoutValidation("User-Agent", "Mozilla/5.0 (Windows NT 6.2; WOW64; rv:19.0) Gecko/20100101 Firefox/19.0");
        request.Headers.TryAddWithoutValidation("Accept-Charset", "ISO-8859-1");

        using (var response = await _HttpClient.SendAsync(request).ConfigureAwait(false))
        {
            response.EnsureSuccessStatusCode();
            using (var responseStream = await response.Content.ReadAsStreamAsync().ConfigureAwait(false))
            using (var decompressedStream = new GZipStream(responseStream, CompressionMode.Decompress))
            using (var streamReader = new StreamReader(decompressedStream))
            {
                return await streamReader.ReadToEndAsync().ConfigureAwait(false);
            }
        }
    }
}

Call it like this:

var response = await GetResponse("http://www.livescore.com/").ConfigureAwait(false); // or var response = GetResponse("http://www.livescore.com/").Result;
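
A small variation (my own sketch, not part of the original answer): instead of always wrapping the body in a GZipStream, you can check the Content-Encoding header and only decompress when the server actually compressed the response. DecompressBody is a hypothetical helper name, and it assumes only gzip and deflate are in play.

    // Sketch only: wrap the response stream based on Content-Encoding.
    // Requires System.IO, System.IO.Compression and System.Net.Http.
    private static Stream DecompressBody(HttpResponseMessage response, Stream responseStream)
    {
        var encodings = response.Content.Headers.ContentEncoding;

        if (encodings.Contains("gzip"))
            return new GZipStream(responseStream, CompressionMode.Decompress);

        if (encodings.Contains("deflate"))
            return new DeflateStream(responseStream, CompressionMode.Decompress);

        return responseStream; // no (known) compression applied
    }

The rest of GetResponse stays the same; only the line that creates the GZipStream would call DecompressBody instead.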
Jesse C. Slicer
  • Is it possible to accomplish the same effect without the "Accept-Encoding" header? – pim Mar 24 '16 at 01:49
26

You could try this as well to add automatic decompression support:

var compressclient = new HttpClient(new HttpClientHandler()
{
    AutomaticDecompression = DecompressionMethods.Deflate | DecompressionMethods.GZip
});

This adds the Accept-Encoding header for you too.

According to the same thread, support is now in the Windows Store framework: http://social.msdn.microsoft.com/Forums/windowsapps/en-US/429bb65c-5f6b-42e0-840b-1f1ea3626a42/httpclient-data-compression-and-caching?prof=required
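
For completeness, a minimal usage sketch (my addition, not from the answer): with AutomaticDecompression enabled, the handler both sends the Accept-Encoding header and decompresses the body, so the string you read back is plain HTML. The User-Agent value is the one from the question; the rest is an assumption about how you would call it.

    using System;
    using System.Net;
    using System.Net.Http;
    using System.Threading.Tasks;

    class Program
    {
        static async Task Main()
        {
            var handler = new HttpClientHandler
            {
                AutomaticDecompression = DecompressionMethods.GZip | DecompressionMethods.Deflate
            };

            using (var client = new HttpClient(handler))
            {
                // A browser-like User-Agent, since the site rejects requests without one.
                client.DefaultRequestHeaders.TryAddWithoutValidation(
                    "User-Agent",
                    "Mozilla/5.0 (Windows NT 6.2; WOW64; rv:19.0) Gecko/20100101 Firefox/19.0");

                // Already decompressed by the handler - no GZipStream needed.
                string html = await client.GetStringAsync("http://www.livescore.com/");
                Console.WriteLine(html.Length);
            }
        }
    }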

user3285954
5

Several things to take note of.

  1. That site requires you to provide a User-Agent header, or it returns an HTTP 500 error.

  2. A GET request to livescore.com responds with a 302 redirect to livescore.us. You need to handle the redirection or request livescore.us directly.

  3. You need to decompress the gzip-compressed response.

This code works using the .NET 4 Client Profile; I'll let you figure out whether it fits a Windows Store app.

var request = (HttpWebRequest)WebRequest.Create("http://www.livescore.com");
request.AllowAutoRedirect = true;
request.UserAgent = "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.17 (KHTML, like Gecko) Chrome/24.0.1312.57 Safari/537.17";

string content;

using (var response = (HttpWebResponse)request.GetResponse())
using (var decompressedStream = new GZipStream(response.GetResponseStream(), CompressionMode.Decompress))
using (var streamReader = new StreamReader(decompressedStream))
{
    content = streamReader.ReadToEnd();
}
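
A variation on the same idea (my sketch, not part of the original answer): HttpWebRequest also has an AutomaticDecompression property, which removes the manual GZipStream step and handles deflate responses as well.

    var request = (HttpWebRequest)WebRequest.Create("http://www.livescore.com");
    request.AllowAutoRedirect = true;
    request.AutomaticDecompression = DecompressionMethods.GZip | DecompressionMethods.Deflate;
    request.UserAgent = "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.17 (KHTML, like Gecko) Chrome/24.0.1312.57 Safari/537.17";

    string content;

    // The framework decompresses the body, so a plain StreamReader is enough.
    using (var response = (HttpWebResponse)request.GetResponse())
    using (var streamReader = new StreamReader(response.GetResponseStream()))
    {
        content = streamReader.ReadToEnd();
    }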
siger
1

I think you can be pretty certain that they have done everything to stop developers from screen-scraping.

If I try from a standard C# project using this code:

  var request = WebRequest.Create("http://www.livescore.com");
  var response = request.GetResponse();

I get this response:

The remote server returned an error: (403) Forbidden.
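
If the 403 comes from the missing browser headers rather than a hard block (the earlier answers report a 500 for the same reason), adding a User-Agent may already change the outcome. A hedged sketch to test that assumption:

    // Sketch: retry the same request with a browser-like User-Agent.
    // Whether this avoids the 403 is an assumption, not a guarantee.
    var request = (HttpWebRequest)WebRequest.Create("http://www.livescore.com");
    request.UserAgent = "Mozilla/5.0 (Windows NT 6.2; WOW64; rv:19.0) Gecko/20100101 Firefox/19.0";
    request.AutomaticDecompression = DecompressionMethods.GZip | DecompressionMethods.Deflate;

    using (var response = (HttpWebResponse)request.GetResponse())
    {
        Console.WriteLine((int)response.StatusCode); // 200 if the guess holds
    }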
markoo
    Yes I know :) But we are developers and we need to solve problems like this :) – Norbert Pisz Feb 22 '13 at 15:15
  • There are paid services out there. This is illegal hacking. Maybe you should find another site. – markoo Feb 22 '13 at 15:17
    Illegal? Why? When you call this site from a browser, is that illegal too? – Norbert Pisz Feb 22 '13 at 15:18
  • Livescores is big business, most sites have syndication deals or xml feeds that you can pay for. – markoo Feb 22 '13 at 15:23
  • Duplicating someone else's data without their permission is often deemed illegal, though most sites also cover this in their terms of service or a usage policy, neither of which I can find on livescore.com. I would still suggest that you contact the site and ask for permission for your project, and ask if they have any APIs/feeds of their own you can use. – Karl-Johan Sjögren Feb 22 '13 at 16:02