3

Is there a way to auto-detect the encoding of a resource when loading it with stringWithContentsOfURL? The current (non-deprecated) method, + (id)stringWithContentsOfURL:(NSURL *)url encoding:(NSStringEncoding)enc error:(NSError **)error;, requires a string encoding. I've noticed that getting it wrong does make a difference for what I want to do. Is there a way to check the encoding and always get it right? (Right now I'm using UTF-8.)

Moshe

2 Answers

4

I'd try this method from the docs:

Returns a string created by reading data from a given URL and returns by reference the encoding used to interpret the data.

+ (id)stringWithContentsOfURL:(NSURL *)url usedEncoding:(NSStringEncoding *)enc error:(NSError **)error

This seems to guess the encoding and then return it to you through the enc parameter.
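As a rough illustration of how that method might be called, here is a minimal sketch. The helper name LoadStringDetectingEncoding is hypothetical and not from the docs; it only logs which encoding Foundation reported, and it assumes the caller deals with a nil result.

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper (not part of Foundation): loads text from a URL and
// logs the encoding that Foundation reports having used.
static NSString *LoadStringDetectingEncoding(NSURL *url) {
    NSStringEncoding usedEncoding = 0;
    NSError *error = nil;
    NSString *contents = [NSString stringWithContentsOfURL:url
                                              usedEncoding:&usedEncoding
                                                     error:&error];
    if (contents == nil) {
        // Loading or decoding failed; error describes why.
        NSLog(@"Could not load %@: %@", url, error);
        return nil;
    }
    NSLog(@"Decoded using %@",
          [NSString localizedNameOfStringEncoding:usedEncoding]);
    return contents;
}
```

If the call returns nil, falling back to an explicit encoding is still an option.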

ThomasW
Dave.B
2

What I normally do when converting data (a string of bytes with no declared encoding) to a string is try to initialize the string with several different encodings in turn. I would suggest trying the most restrictive encodings (charset-wise), such as ASCII and UTF-8, first, and then UTF-16. If none of those succeeds, decode the data with a fallback encoding such as NSWindowsCP1252StringEncoding, which will almost always work. To do this, download the page's contents as NSData first so that you don't have to re-download them for every encoding attempt. Your code might look like this:

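// Download once so every decoding attempt reuses the same bytes.
NSData *urlData = [NSData dataWithContentsOfURL:aURL];
// Try the strictest encodings first; each attempt yields nil if decoding fails.
NSString *theString = [[NSString alloc] initWithData:urlData encoding:NSASCIIStringEncoding];
if (!theString) {
    theString = [[NSString alloc] initWithData:urlData encoding:NSUTF8StringEncoding];
}
if (!theString) {
    theString = [[NSString alloc] initWithData:urlData encoding:NSUTF16StringEncoding];
}
if (!theString) {
    // Permissive fallback that accepts almost any byte sequence.
    theString = [[NSString alloc] initWithData:urlData encoding:NSWindowsCP1252StringEncoding];
}
// ...
// use theString here...
// ...
[theString release]; // manual reference counting only; omit this line under ARC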
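The same fallback chain could also be written as a loop that reports which encoding succeeded. This is only a sketch: the function name DecodeDataTryingEncodings and its out-parameter are hypothetical, the candidate order simply mirrors the one above, and the code assumes it is compiled with ARC.

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper: tries each candidate encoding in order and reports,
// through outEncoding, the first one that decodes the data.
// Assumes ARC; under manual reference counting, autorelease the result.
static NSString *DecodeDataTryingEncodings(NSData *data,
                                           NSStringEncoding *outEncoding) {
    const NSStringEncoding candidates[] = {
        NSASCIIStringEncoding,         // most restrictive
        NSUTF8StringEncoding,
        NSUTF16StringEncoding,
        NSWindowsCP1252StringEncoding  // permissive fallback
    };
    const size_t count = sizeof(candidates) / sizeof(candidates[0]);
    for (size_t i = 0; i < count; i++) {
        NSString *decoded = [[NSString alloc] initWithData:data
                                                  encoding:candidates[i]];
        if (decoded != nil) {
            if (outEncoding != NULL) {
                *outEncoding = candidates[i];
            }
            return decoded;
        }
    }
    return nil; // none of the candidates could decode the data
}
```

A caller would pass the downloaded NSData and the address of an NSStringEncoding variable, then check the returned string for nil before using it.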
Alex Nichol