3

I'd like to use the Unicode version of '<' in an NSString but the compiler produces the error:

"Character '<' cannot be specified by a universal character name"

when I use:

NSString *text = @"Some Text: \u003C"; 

'<' seems to be a special character, as are '=' and a few others. So how can I insert a literal '<' into the string without typing '<' itself, as in "some string <"?

I don't have control over the string value itself; the inline value above is only for demonstration purposes.

Willam Hill
  • Well, `NSString *text = @"Some Text: <";`? –  Jun 17 '13 at 19:25
  • @H2CO3 Maybe Will needs a framework for converting Unicode Codepoints into regular characters – Rey Gonzales Jun 17 '13 at 19:27
  • I don't have control over the text value - so < won't work. – Willam Hill Jun 17 '13 at 19:27
  • @Will What do you mean by "not having control over the text value"? –  Jun 17 '13 at 19:28
  • The value is coming in from a text file that will have unicode values embedded with standard English text. Ex: Hello \u003C – Willam Hill Jun 17 '13 at 19:30
  • Weird. I just tried and a similar error appears for many different values such as `\u0020` (space) or `\u0041` (A). – rmaddy Jun 17 '13 at 19:31
  • @rmaddy - Yes, it is weird, but it is also defined that way in the Standard: "A universal character name shall not specify a character whose short identifier is less than 00A0 other than 0024 ($), 0040 (@), or 0060 (`), nor one in the range D800 through DFFF inclusive." The Standard doesn't say *why* they came up with this restriction. – CRD Jun 17 '13 at 20:20
  • @CRD, is there a webpage you can refer me to, to read more about this? This concept is still going over my head – nr5 Jun 05 '14 at 04:47
  • @114100웃 - I quoted the C11 ISO Standard, section 6.4.3 Universal character names, paragraph 2 Constraints. If you search for N1570 you should find PDF copies of the (draft) Standard on the web - or you can buy a PDF of the final version from ISO. The paragraph has the footnote: "The disallowed characters are the characters in the basic character set and the code positions reserved by ISO/IEC 10646 for control characters, the character DELETE, and the S-zone (reserved for use by UTF−16)." but that doesn't explain the *reasoning* for disallowing hex forms of characters that can be typed. HTH. – CRD Jun 05 '14 at 08:38

3 Answers

12

I don't believe that the compiler error was addressed.

In response to the errors:

"Character '<' cannot be specified by a universal character name"
"Universal character name refers to a control character"

it appears that you cannot use the \U000000xx literal syntax for most characters in the range \U00000000 to \U000000FF (which includes all of ASCII), with the following exceptions:

  • \U00000024
  • \U00000040
  • \U00000060
  • \U000000A0 to \U000000FF
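To see the exceptions in action, here is a small sketch in C, whose string-literal escape rules Objective-C `@"..."` literals share. It assumes a UTF-8 execution character set (the default for clang and GCC); the names `allowed_ucns` and `ucns_match` are illustrative only.

```c
#include <string.h>

/* Each of these universal character names is accepted:
 * $, @ and ` are the three explicit exceptions below U+00A0,
 * and U+00E9 lies above the U+00A0 threshold.
 * Assumes a UTF-8 execution character set. */
const char *const allowed_ucns[][2] = {
    { "\u0024", "$" },
    { "\u0040", "@" },
    { "\u0060", "`" },
    { "\u00E9", "\xC3\xA9" },  /* é, stored as two UTF-8 bytes */
};

/* Returns 1 if every UCN literal equals its expected byte sequence. */
int ucns_match(void) {
    size_t n = sizeof allowed_ucns / sizeof allowed_ucns[0];
    for (size_t i = 0; i < n; i++) {
        if (strcmp(allowed_ucns[i][0], allowed_ucns[i][1]) != 0)
            return 0;
    }
    return 1;
}
```

Adding `"\u003C"` to the table would stop the file from compiling, which is the error in the question.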

A simple workaround is to use [NSString stringWithFormat:@"%C", 0x000000xx]

Example with the '<' character:

NSString *text = [NSString stringWithFormat:@"Hello %C", 0x003C];

See Xcode UTF-8 literals for more options.
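The same workaround can be sketched in plain C: format the code point in at run time, or use a hex escape, which is not a universal character name and therefore should not fall under the restriction. This is a hedged sketch that assumes the code point is below 0x80 (true for every ASCII character the rule rejects); `make_greeting` and `greeting_literal` are hypothetical names.

```c
#include <stdio.h>
#include <string.h>

/* Writes "Hello <" (for code_point 0x3C) into buf at run time,
 * mirroring [NSString stringWithFormat:@"Hello %C", 0x003C].
 * %c writes a single byte, so this is only correct for code
 * points below 0x80. Returns the length snprintf reports. */
int make_greeting(char *buf, size_t size, int code_point) {
    return snprintf(buf, size, "Hello %c", code_point);
}

/* Compile-time alternative: \x3C is a hex escape sequence, not a
 * universal character name, so the 6.4.3p2 constraint does not apply. */
const char greeting_literal[] = "Hello \x3C";
```

One caveat with the hex form: a hex escape consumes every hex digit that follows it, so `"\x3Cab"` is not `"<ab"`. Splitting the literal as `"\x3C" "ab"` avoids that.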

Iulian Onofrei
SwiftArchitect
4

If the string has been read from a text file containing "Hello \u003C" with a literal backslash, then it is equivalent to

NSString *text = @"Hello \\u003C";

If the text file contains only ASCII characters then you can use the fact that NSNonLossyASCIIStringEncoding decodes "\uNNNN" to the corresponding Unicode character:

NSData *data = [text dataUsingEncoding:NSASCIIStringEncoding];
NSString *converted = [[NSString alloc] initWithData:data encoding:NSNonLossyASCIIStringEncoding];

Added: You can probably create the string directly from the file with

NSString *text = [NSString stringWithContentsOfFile:pathToFile encoding:NSNonLossyASCIIStringEncoding error:NULL];

and all the Unicode escape sequences are already properly converted.
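For readers outside Foundation, the decoding step can be approximated in C. This is a rough sketch of the idea, not Apple's implementation of NSNonLossyASCIIStringEncoding: it only handles four-digit `\uNNNN` escapes, emits UTF-8, and does not combine surrogate pairs (it copies such escapes through unchanged). `decode_u_escapes` and `hex_value` are hypothetical helpers.

```c
/* Hypothetical helper: returns the value of hex digit c, or -1. */
static int hex_value(char c) {
    if (c >= '0' && c <= '9') return c - '0';
    if (c >= 'a' && c <= 'f') return c - 'a' + 10;
    if (c >= 'A' && c <= 'F') return c - 'A' + 10;
    return -1;
}

/* Hypothetical helper: copies src to dst, replacing each "\uNNNN"
 * escape with the UTF-8 encoding of U+NNNN. Surrogates (D800-DFFF)
 * and malformed escapes are copied unchanged. dst needs room for
 * strlen(src) + 1 bytes, since 6 escape bytes become at most 3. */
void decode_u_escapes(const char *src, char *dst) {
    while (*src) {
        if (src[0] == '\\' && src[1] == 'u') {
            long cp = 0;
            int i;
            /* Stops at the first non-hex byte, including '\0'. */
            for (i = 2; i < 6; i++) {
                int v = hex_value(src[i]);
                if (v < 0) break;
                cp = cp * 16 + v;
            }
            if (i == 6 && (cp < 0xD800 || cp > 0xDFFF)) {
                if (cp < 0x80) {
                    *dst++ = (char)cp;
                } else if (cp < 0x800) {
                    *dst++ = (char)(0xC0 | (cp >> 6));
                    *dst++ = (char)(0x80 | (cp & 0x3F));
                } else {
                    *dst++ = (char)(0xE0 | (cp >> 12));
                    *dst++ = (char)(0x80 | ((cp >> 6) & 0x3F));
                    *dst++ = (char)(0x80 | (cp & 0x3F));
                }
                src += 6;
                continue;
            }
        }
        *dst++ = *src++;
    }
    *dst = '\0';
}
```

Called on the six-character file text `Hello \u003C` (written `"Hello \\u003C"` in source), it produces `Hello <`.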

Martin R
  • I'll try this - any explanation on why I can't put the string directly in an .m file, as demonstrated in the sample code? – Willam Hill Jun 17 '13 at 20:19
  • @Will: As I understand your comments to the question, the text file contains a backslash character. To put a backslash character into a literal string or NSString, it has to be escaped as "\\". – Martin R Jun 17 '13 at 20:23
  • I guess I don't understand why I can use some unicode characters but not all inside an NSString without escaping it. This is accepted by clang: @"Hello \u02C4 \u0502 \u0024" whereas this is not: @"Hello \u02C4 \u0502 \u0024 \u003C"; – Willam Hill Jun 19 '13 at 16:51
  • @Will: See CRD's comment to the question. - But you said that you read the string from a text file, so you don't have a literal string anyway. – Martin R Jun 19 '13 at 17:20
  • I did, but because the inline version produces errors, it piqued my curiosity; if I ever want to define the string inline, I'd like to understand the problem better. – Willam Hill Jun 19 '13 at 17:23
  • @Will: `@"\u0024"` is *identical* to `@"$"` (one character). As CRD said, `"\uNNNN"` is allowed by the standard only for certain characters, and we don't know why. - But if you have a file containing the characters "\u0024" and read that into an NSString, then the NSString would be `@"\\u0024"` (6 characters; the backslash is escaped in the literal). Let me know if you need more information. – Martin R Jun 19 '13 at 18:03
  • thank you for the NSNonLossyASCIIStringEncoding suggestion! Saved me a huge headache :) – user2734823 Aug 02 '14 at 16:37
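The difference Martin R describes between an escape in source code and an escape in file data can be checked with plain C literals, whose escape rules Objective-C string literals share. The names below are illustrative only.

```c
#include <string.h>

/* In source code, \u0024 is a universal character name: the
 * compiler replaces it with '$', so the literal holds 1 character. */
const char ucn_literal[] = "\u0024";

/* Text read from a file keeps the escape verbatim: a backslash
 * followed by "u0024". Writing that in source needs "\\". */
const char escaped_literal[] = "\\u0024";

size_t ucn_length(void)     { return strlen(ucn_literal); }     /* 1 */
size_t escaped_length(void) { return strlen(escaped_literal); } /* 6 */
```

The compile-time restriction only ever applies to the first form, so text loaded at run time never triggers it.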
4

clang follows the C standard here, which for some reason disallows this:

C99 6.4.3p2: A universal character name shall not specify a character whose short identifier is less than 00A0 other than 0024 ($), 0040 (@), or 0060 (`), nor one in the range D800 through DFFF inclusive.
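The quoted constraint can be encoded as a simple predicate. This is a sketch of the rule as worded in the standard, not clang's actual check; `ucn_permitted` is a hypothetical name.

```c
#include <stdbool.h>

/* Mirrors C99/C11 6.4.3p2: a universal character name may not name
 * a code point below U+00A0 other than U+0024 ($), U+0040 (@) or
 * U+0060 (`), nor a UTF-16 surrogate in the range D800-DFFF. */
bool ucn_permitted(unsigned long cp) {
    if (cp < 0xA0)
        return cp == 0x24 || cp == 0x40 || cp == 0x60;
    return !(cp >= 0xD800 && cp <= 0xDFFF);
}
```

For example, `ucn_permitted(0x3C)` is false, matching the '<' error in the question, while `ucn_permitted(0x02C4)` is true, matching the `\u02C4` that clang accepted in Willam Hill's comment.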

thakis