How can I convert Plain Text (.txt) files to a string if the encoding type is unknown?

I'm working on a feature that would allow users to import txt files into my app. This means the file could have been created in any number of apps, using any of the encodings that are valid for a plain text file. My understanding is that this could include ASCII, UTF-8, UTF-16, UTF-16BE, UTF-16LE, UTF-32, UTF-32BE, UTF-32LE, or even EBCDIC?!

Things had been going well using the following:

NSString *txtFileAsString = [NSString stringWithContentsOfFile:path encoding:NSUTF8StringEncoding error:&errorReading];

Then a user supplied a file that resulted in empty content when imported. I stepped through the import in the Xcode debugger and saw Cocoa error 261, NSStringEncoding=4.

What I know:

  • The user supplied file was created with an app called knowtes
  • The file opens with TextEdit, TextWrangler, etc. on Mac OS X
  • The file contains "special characters" such as umlauts (rant: why doesn't the "u" on umlaut have an umlaut?!)
  • Finder Info displays:

Kind: text

text/plain; charset=utf-16le

I am guessing that the utf-16le encoding of the file is the key, since my code assumes the file is UTF-8 (NSUTF8StringEncoding). I attempted to use ASCII as a lowest common denominator. It didn't crash, but it introduced some characters that weren't present in the original file.

NSString *txtFileAsString = [NSString stringWithContentsOfFile:path encoding:NSASCIIStringEncoding error:&errorReading];

So I attempted to convert the file to NSData first, hoping it might negate the need to recognize the encoding. It did not work.

    NSData *txtFileData = [NSData dataWithContentsOfFile:path];
    NSString *txtFileAsString = [[NSString alloc]initWithData:txtFileData encoding:NSUTF8StringEncoding];

This leads me to a few questions:

  1. Is there not a universal way to convert Plain Text file contents, regardless of encoding, to a string (i.e. lowest common denominator)? I believe that used to be the purpose of initWithContentsOfFile:, which unfortunately is now deprecated. NSASCIIStringEncoding didn't work.
  2. Is there anything about converting an NSUTF16 encoded file to a string that I would need to handle differently than if it were NSUTF8?
  3. Assuming the file is in fact UTF-16LE, why does the following suggestion not work either?

    NSString *txtFileAsString = nil;
    if (path != nil) {
        NSData *txtFileData = [NSData dataWithContentsOfFile:path];
        NSString *txtFileAsString = [[NSString alloc] initWithData:txtFileData encoding:NSASCIIStringEncoding];
        if (!txtFileAsString) {
            txtFileAsString = [[NSString alloc] initWithData:txtFileData encoding:NSUTF8StringEncoding];
        }
        if (!txtFileAsString) {
            txtFileAsString = [[NSString alloc] initWithData:txtFileData encoding:NSUTF16StringEncoding];
        }
        if (!txtFileAsString) {
            txtFileAsString = [[NSString alloc] initWithData:txtFileData encoding:NSUTF16LittleEndianStringEncoding];
        }
        if (!txtFileAsString) {
            txtFileAsString = [[NSString alloc] initWithData:txtFileData encoding:NSUTF16BigEndianStringEncoding];
        }
        if (!txtFileAsString) {
            txtFileAsString = [[NSString alloc] initWithData:txtFileData encoding:NSUTF32StringEncoding];
        }
        if (!txtFileAsString) {
            txtFileAsString = [[NSString alloc] initWithData:txtFileData encoding:NSUTF32LittleEndianStringEncoding];
        }
        if (!txtFileAsString) {
            txtFileAsString = [[NSString alloc] initWithData:txtFileData encoding:NSUTF32BigEndianStringEncoding];
        }
    }
    
DenVog

1 Answer

Sometimes stringWithContentsOfFile:usedEncoding:error: can do the job (especially if the file has a byte order mark):

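NSError *error = nil;
NSStringEncoding encoding;
// On success, `encoding` holds the encoding that was used to interpret the file.
NSString *string = [NSString stringWithContentsOfFile:path usedEncoding:&encoding error:&error];
if (!string) NSLog(@"Could not read %@: %@", path, error);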

Note that this variant, which takes a usedEncoding: out-parameter, should not be confused with the similarly named method that takes an encoding: parameter.
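
If usedEncoding: cannot work out the encoding on its own, one possible fallback is to hand the raw bytes to NSString's stringEncodingForData:encodingOptions:convertedString:usedLossyConversion:, which guesses heuristically. The following is only a rough sketch under a few assumptions: it targets OS X 10.10 / iOS 8 or later (where that method is available), it uses ARC, and the helper name DVReadTextFile is made up for illustration. The guess can still be wrong, and binary data will not decode at all.

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper (name invented for this sketch): read a text file
// whose encoding is not known in advance.
static NSString *DVReadTextFile(NSString *path,
                                NSStringEncoding *outEncoding,
                                NSError **outError) {
    // 1. Let Foundation try to determine the encoding itself.
    NSStringEncoding used = 0;
    NSString *text = [NSString stringWithContentsOfFile:path
                                           usedEncoding:&used
                                                  error:outError];

    // 2. If that fails, fall back to heuristic detection on the raw bytes.
    if (!text) {
        NSData *data = [NSData dataWithContentsOfFile:path options:0 error:outError];
        if (!data) return nil;

        NSString *converted = nil;
        BOOL lossy = NO;
        // Disallow lossy conversion so characters are not silently replaced.
        used = [NSString stringEncodingForData:data
                               encodingOptions:@{NSStringEncodingDetectionAllowLossyKey: @NO}
                               convertedString:&converted
                           usedLossyConversion:&lossy];
        // A return value of 0 means no encoding could be determined.
        text = (used != 0) ? converted : nil;
        // Clear the error left over from step 1 if the fallback succeeded.
        if (text && outError) *outError = nil;
    }

    if (text && outEncoding) *outEncoding = used;
    return text;
}
```

A call site would look much like the snippet above: pass the path plus the addresses of an NSStringEncoding and an NSError, and treat a nil result as "could not decode" rather than as an empty file.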

Rob
  • Stellar! It seems so simple when you put it that way. :P Why in the heck doesn't Xcode reference this method in the deprecation statement for initWithContentsOfFile?! – DenVog Jul 15 '15 at 23:14
  • It *is* in the `NSString` documentation: `+ (instancetype nullable)stringWithContentsOfFile:(NSString * nonnull)path usedEncoding:(NSStringEncoding * nullable)enc error:(NSError * nullable * nullable)error` "Upon return, if the file is read successfully, contains the encoding used to interpret the file at path." If you are having trouble finding things in the Apple documentation consider using the Dash app, it is the same Apple documentation, more usable. – zaph Jul 16 '15 at 01:04
  • @Rob I got an error when opening a file as per above code - "The file couldn’t be opened because the text encoding of its contents can’t be determined" In Terminal - the encoding of the file is - "application/octet-stream; charset=binary" – KamyFC Sep 09 '20 at 05:02