
In my "ViewController.swift", I have a localized string:

TheOutLabel.text = NSLocalizedString("hello", comment: "The \"hello\" word")

In the Terminal, to generate the "Localizable.strings" file, I typed:

cd Base.lproj/; genstrings ../*.swift; cat Localizable.strings

and got the following result:

??/* The \"hello\" word */
"hello" = "hello";

When typing od -c Localizable.strings, I get:

0000000  377 376   /  \0   *  \0      \0   T  \0   h  \0   e  \0      \0
0000020    \  \0   "  \0   h  \0   e  \0   l  \0   l  \0   o  \0   \  \0
0000040    "  \0      \0   w  \0   o  \0   r  \0   d  \0      \0   *  \0
0000060    /  \0  \n  \0   "  \0   h  \0   e  \0   l  \0   l  \0   o  \0
0000100    "  \0      \0   =  \0      \0   "  \0   h  \0   e  \0   l  \0
0000120    l  \0   o  \0   "  \0   ;  \0  \n  \0  \n  \0                

When I type file Localizable.strings, it says:

Localizable.strings: Little-endian UTF-16 Unicode c program text

When I open the file with "emacs", it does not display the leading "??" characters, and when I type M-x describe-current-coding-system RET, it says:

Coding system for saving this buffer:
  U -- utf-16le-with-signature-unix (alias: utf-16-le-unix)

So the octal bytes \377 and \376 at the beginning of the file appear to be a UTF-16LE byte order mark (BOM), and each character is followed by a \0 because UTF-16 uses two bytes per character here (twice the size of UTF-8 for this ASCII-only content).
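The BOM and the size difference can be checked with a minimal sketch like the one below. It assumes a POSIX shell with iconv and od available; the sample file and the variable names are hypothetical and only stand in for a genstrings output.

```shell
# Hypothetical sample: write the BOM bytes (octal 377 376), then UTF-16LE text.
workdir=$(mktemp -d)
printf '\377\376' > "$workdir/sample.strings"
printf '"hello" = "hello";\n' | iconv -f UTF-8 -t UTF-16LE >> "$workdir/sample.strings"

# Show the first two bytes in hex; "fffe" is the UTF-16LE BOM.
bom=$(head -c 2 "$workdir/sample.strings" | od -An -tx1 | tr -d ' \n')
echo "BOM bytes: $bom"

# Convert to UTF-8 (iconv reads the BOM to pick the byte order) and compare sizes.
iconv -f UTF-16 -t UTF-8 "$workdir/sample.strings" > "$workdir/sample.utf8.strings"
utf16_size=$(wc -c < "$workdir/sample.strings")
utf8_size=$(wc -c < "$workdir/sample.utf8.strings")
echo "UTF-16: $utf16_size bytes, UTF-8: $utf8_size bytes"
```

For this ASCII-only line, the UTF-16 file is twice the UTF-8 size plus two bytes for the BOM.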

Is this normal/useful/harmful?

Also, the standard *nix tools (grep, sed, awk) don't handle UTF-16 files well:

grep '=' Localizable.strings 
Binary file Localizable.strings matches

grep -a '=' Localizable.strings | sed -e 's/ = //'
"hello" = "hello";
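One possible workaround, sketched here under the assumption that iconv is installed, is to decode the file to UTF-8 on the fly and pipe the result into the usual tools. The sample file below is hypothetical and is built by hand so the sketch does not depend on genstrings.

```shell
# Hypothetical sample file: BOM (FF FE) followed by UTF-16LE text.
sample=$(mktemp)
printf '\377\376' > "$sample"
printf '"hello" = "hello";\n' | iconv -f UTF-8 -t UTF-16LE >> "$sample"

# Decode to UTF-8 first, then use grep and sed as usual.
result=$(iconv -f UTF-16 -t UTF-8 "$sample" | grep '=' | sed -e 's/ = //')
echo "$result"   # prints "hello""hello";
```

Once the \0 bytes are gone, grep no longer reports a binary file and the sed pattern ' = ' matches.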

Also, I edited Localizable.strings to replace "hello"; with "Hello";. After that, "SourceTree" (my "git" client) cannot display the difference unless I apply the fix proposed in "Can I make git recognize a UTF-16 file as text?":

echo '*.strings diff=localizablestrings' > .../.git/../.gitattributes
echo '[diff "localizablestrings"]' >> .../.git/config
echo '  textconv = "iconv -f utf-16 -t utf-8"' >> .../.git/config
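As a sketch of the same setup, the textconv driver can also be registered with `git config` rather than by appending lines to the config file by hand. The throwaway repository created with mktemp is an assumption for illustration only; in a real project the commands would run at the repository root.

```shell
# Hypothetical throwaway repository, used only to demonstrate the settings.
repo=$(mktemp -d)
cd "$repo" && git init -q

# Route *.strings files through the "localizablestrings" diff driver.
echo '*.strings diff=localizablestrings' >> .gitattributes

# Register the driver: git diffs the UTF-8 output of iconv instead of raw bytes.
git config diff.localizablestrings.textconv 'iconv -f utf-16 -t utf-8'

# Show the stored value.
git config --get diff.localizablestrings.textconv
```

Note that the driver lives in the local git config, so each clone needs the `git config` step again, while `.gitattributes` can be committed.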

Apple's Internationalization and Localization Guide says:

Note: If Xcode warns you that the Localizable.strings file appears to be Unicode (UTF-16), you can convert it to Unicode (UTF-8) using the File inspector.

So, should I remove / ignore the BOM?

It seems there is no genstrings option to generate a UTF-8 file.

Should I convert the file?

duthen
    Yes, that is a UTF-16 encoded file with BOM, and according to http://www.stevestreeting.com/2010/05/18/os-x-localisation-incremental-genstrings-and-utf-8-files/ and other resources, genstrings has no option to alter this. Xcode can handle that without problems. If you need to work with other tools then you can convert it to UTF-8 (e.g. in the File Inspector in Xcode). – Martin R May 31 '16 at 09:58

1 Answer

The genstrings tool is hard-coded to output strings files in the encoding "UTF-16LE with BOM". I prefer to keep my strings files in UTF-8 and I use the following shell script to generate them:

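#!/bin/zsh
# Convert each given file from UTF-16 to UTF-8 in place.
function convert {
    for file in $@; do
        print "Converting $file to UTF-8"
        # Replace the original only if the conversion succeeded.
        iconv -f utf-16 -t utf-8 $file > temp && mv temp $file
    done
}
# Generate the strings files (UTF-16LE with BOM), then convert them.
genstrings -o en.lproj *.m
convert en.lproj/*.strings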
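Running `iconv -f utf-16` on a file that is already UTF-8 would not give a useful result, so a variant that checks for the BOM first may be worth considering. The following is only a sketch for a POSIX shell (where the variables do need quoting), and the function name `to_utf8` is hypothetical:

```shell
# Convert only files that start with the UTF-16LE BOM (bytes FF FE);
# files without it are assumed to be UTF-8 already and are left alone.
to_utf8() {
    for f in "$@"; do
        bom=$(head -c 2 "$f" | od -An -tx1 | tr -d ' \n')
        if [ "$bom" = "fffe" ]; then
            # Replace the original only if iconv succeeded.
            iconv -f UTF-16 -t UTF-8 "$f" > "$f.tmp" && mv "$f.tmp" "$f"
        fi
    done
}

# Usage, after genstrings has run:
# to_utf8 en.lproj/*.strings
```

With this check the function can be run repeatedly on the same directory without touching files that were converted earlier.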
Nick Moore
    Hello! Thanks for the useful answer! In case a file name contains special characters (space, tab, newline), it's safer to _always_ enclose the variables in quotes: `for file in "$@"; do iconv -f utf-16 -t utf-8 "$file" > temp; rm "$file"; mv temp "$file"; done` – duthen Apr 25 '22 at 17:20
    That would be the case if this were run as a bash script. However, since this is zsh, the paths do not need quoting, because zsh variables do not split on whitespace. I appreciate the comment though. – Nick Moore Apr 26 '22 at 20:05