The following code:

var text = (new WebClient()).DownloadString("http://export.arxiv.org/api/query?search_query=au:Freidel_L*&start=0&max_results=20");

results in a variable text that contains, among many other things, the string

"$Îº$-Minkowski space, scalar field, and the issue of Lorentz invariance"

However, when I visit that URL in Firefox, I get

$κ$-Minkowski space, scalar field, and the issue of Lorentz invariance

which is actually correct. I also tried

var data = (new WebClient()).DownloadData("http://export.arxiv.org/api/query?search_query=au:Freidel_L*&start=0&max_results=20");
var text = System.Text.UTF8Encoding.Default.GetString(data);

but this gave the same problem.

I'm not sure where the fault lies here. Is the feed lying about being UTF8-encoded, and the browser is smart enough to figure that out, but not WebClient? Is the feed properly UTF8-encoded, but WebClient is failing in some other way? What can I do to mitigate this?

Domenic
  • `UTF8Encoding.Default` is actually `Encoding.Default`, which is the ANSI encoding determined by the OS language settings. – svick Aug 21 '11 at 13:24
  • Possible duplicate of [WebClient.DownloadString() returns string with peculiar characters](https://stackoverflow.com/questions/4716470/webclient-downloadstring-returns-string-with-peculiar-characters) – Brian Sep 19 '18 at 21:40

1 Answer

It's not lying. You should set the `WebClient`'s `Encoding` property before calling `DownloadString`.

using (WebClient webClient = new WebClient())
{
    // Decode the response body as UTF-8 rather than the system default encoding.
    webClient.Encoding = Encoding.UTF8;
    string s = webClient.DownloadString("http://export.arxiv.org/api/query?search_query=au:Freidel_L*&start=0&max_results=20");
}

As for why your alternative isn't working, it's because the usage is incorrect. It should be:

var text = System.Text.Encoding.UTF8.GetString(data);
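
If you want to see the mismatch without hitting the network, here is a minimal sketch that decodes the same UTF-8 bytes two ways. The class and method names (`EncodingDemo`, `DecodeBothWays`) are made up for illustration, and ISO-8859-1 is only a stand-in for an ANSI code page, since what `Encoding.Default` returns depends on the runtime and OS settings.

```csharp
using System;
using System.Text;

public static class EncodingDemo
{
    // Encodes the input as UTF-8, then decodes those bytes twice:
    // once as UTF-8 (correct) and once as a single-byte encoding (garbled).
    public static (string Correct, string Garbled) DecodeBothWays(string original)
    {
        byte[] data = Encoding.UTF8.GetBytes(original);

        // Assumption: ISO-8859-1 stands in for a typical ANSI code page here.
        Encoding singleByte = Encoding.GetEncoding("ISO-8859-1");

        return (Encoding.UTF8.GetString(data), singleByte.GetString(data));
    }

    public static void Main()
    {
        var (correct, garbled) = DecodeBothWays("$κ$-Minkowski");
        Console.WriteLine(correct);  // the original text, round-tripped intact
        Console.WriteLine(garbled);  // each UTF-8 byte of κ shown as its own character
    }
}
```

The two UTF-8 bytes of κ (0xCE, 0xBA) come back as two separate characters under the single-byte decoding, which is the kind of garbling described in the question. What the console actually displays also depends on the console's own output encoding.
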
Konamiman
LostInComputer
  • Excellent, thank you! Strange that the `WebClient` doesn't use the headers to detect this, but this works perfectly, and between you and @svick, I understand why the other thing I tried was failing miserably as well. – Domenic Aug 21 '11 at 20:17
  • Works for `UploadString` as well – irfandar Oct 25 '17 at 13:57