What does stringWithUTF8String: do and how does it work?

Question

I am new to Objective-C and came across the stringWithUTF8String: class method. I looked the method up in the Apple developer library and found this sentence:

Returns a string created by copying the data from a given C array of UTF8-encoded bytes.

After reading it, I do not have a single clue what the sentence is saying.

I find the Apple developer library difficult to understand. Can someone please provide some simple sample code showing how the method is used?

There is a huge functional difference between a C "array" (by which they *hopefully* mean "a common C string") and a UTF8 string. Are you familiar with UTF8 encoding? – Jongware Dec 20 '13 at 09:29
@Jongware No, I am really new to all this stuff and I'm really confused by this method now. – user3090658 Dec 20 '13 at 09:37
If you're unfamiliar with Unicode then read http://www.joelonsoftware.com/articles/Unicode.html – Benedict Cohen Dec 20 '13 at 09:42
@Jongware No, there is not. C strings are precisely arrays of type char. UTF8 is designed to be used in C strings because it provides some degree of robustness against missing parts: it is possible to determine which unichars are encoded within the remainder and which parts are indecipherable, because it is a variable-width encoding and has sentinel values to indicate the width of an encoded unichar. – uchuugaka Dec 20 '13 at 09:46
@uchuugaka: "C strings" are *zero-terminated* arrays of type char, by definition (see the recent [how to make a not null-terminated c string?](http://stackoverflow.com/questions/20683552/how-to-make-a-not-null-terminated-c-string/20684342) for heated discussions on this). Their definition is what makes the `strXXX` functions work. My point was that UTF8 is *conceptually* different. – Jongware Dec 20 '13 at 10:08
Zero or NULL. Let's be pedantic. C strings are C arrays. A C string literal will get you a NULL sentinel value indicating the end. Nothing else will, other than being careful. It's C. – uchuugaka Dec 20 '13 at 10:19

score 4 · Accepted Answer · answered Dec 20 '13 at 09:39

It simply creates a Cocoa NSString from a UTF-8-encoded (http://en.wikipedia.org/wiki/UTF8) C string (a char*).

const char* cstr = "I am a c string";
NSString* str = [NSString stringWithUTF8String:cstr];

Kaiserludi


So basically the method is just converting a C string into an Objective-C string? – user3090658 Dec 20 '13 at 09:41
Yes. The UTF8 in the method name just means that if, during this conversion, the method encounters a special character that is not part of ASCII (for example Russian, Chinese, Japanese or Korean characters, or special characters of European languages such as the German ä, ö and ü), it will assume that the character is encoded in UTF-8 in the C string. For the ASCII characters (first table here http://www.asciitable.com/) it will even work if the C string is not encoded in UTF-8 but in the encoding of the current locale. – Kaiserludi Dec 20 '13 at 09:49
Do you think I need to know the basics of Unicode encoding and the UTF-8 stuff? – user3090658 Dec 20 '13 at 09:52
In this context, only enough to be aware that when your C string may contain special characters, they may not be converted correctly if you don't know the encoding of the C string. Aside from that, it's the task of that method to know the specifics, not your task. – Kaiserludi Dec 20 '13 at 09:55

score 2 · Answer 2 · answered Dec 20 '13 at 09:29

NSString *s = [NSString stringWithUTF8String:"Long dash symbol \xe2\x80\x94"];
NSLog(@"%@",s);

UTF8-encoded strings are treated as C arrays of bytes, so you can use this method to convert them into an NSString.
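To make "a C array of UTF8-encoded bytes" more concrete, here is a minimal sketch in plain C (which also compiles as Objective-C). The helper name `hex_dump` is hypothetical and not part of any framework. It only shows which bytes the literal above actually contains.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: writes the bytes of a NUL-terminated C string into
   out as space-separated hex, e.g. "e2 80 94". Returns the number of bytes
   in s, or 0 if out is too small to hold the dump. */
size_t hex_dump(const char *s, char *out, size_t out_size) {
    size_t n = strlen(s);               /* strlen counts bytes, not characters */
    if (out_size < n * 3 + 1) {
        return 0;
    }
    out[0] = '\0';
    for (size_t i = 0; i < n; i++) {
        /* Cast to unsigned char so bytes above 0x7f are not sign-extended. */
        sprintf(out + i * 3, i + 1 < n ? "%02x " : "%02x", (unsigned char)s[i]);
    }
    return n;
}
```

Dumping "Long" gives four bytes, one per ASCII letter, while dumping "\xe2\x80\x94" gives three bytes for a single character, the long dash (U+2014). The method reads exactly such a byte sequence and decodes it into characters.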

βhargavḯ


Does it mean that with this method I'm converting C strings into Objective-C string objects? – user3090658 Dec 20 '13 at 09:37
Yup. You can also look at `stringWithCString:encoding:`, which additionally accepts an encoding. – βhargavḯ Dec 20 '13 at 09:45
Yes. But only if the C string is a valid UTF8 encoded string. – uchuugaka Dec 20 '13 at 09:48

score 2 · Answer 3 · answered Dec 20 '13 at 10:10

So it sounds like you're encountering the richness and complexity that NSString supports. It's a deep subject.

In reality, all strings are stored as some sort of array and have an encoding that maps the values of the array elements to characters.

NSString hides this complexity well most of the time because it is hard to learn and do well manually and few people really can. It's also way more interesting to just not have to think about it most of the time and do other stuff.

However, NSString conceptually holds an array of unichars (UTF-16 code units). When you create a string from an external source that is not already an NSString, you need to know the encoding. Otherwise it is garbage in, garbage out.

UTF8 happens to be really robust and not sensitive to things like byte order, so it's gained wide adoption on the web and in XML. It's not ideal for all situations so platform native frameworks tend to use UTF16 or 32 for certain optimizations.
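One concrete reason for that robustness is that every UTF-8 continuation byte has the bit pattern 10xxxxxx, so the start of each character can be found without knowing anything about byte order. A minimal sketch in plain C follows. The function name `utf8_code_points` is hypothetical, and the code assumes the input is already valid UTF-8.

```c
#include <stddef.h>

/* Counts the code points in a NUL-terminated string that is assumed to be
   valid UTF-8. Continuation bytes look like 10xxxxxx; every other byte
   starts a new code point. */
size_t utf8_code_points(const char *s) {
    size_t count = 0;
    for (; *s != '\0'; s++) {
        if (((unsigned char)*s & 0xC0) != 0x80) {
            count++;
        }
    }
    return count;
}
```

For "Long dash symbol \xe2\x80\x94" this returns 18, although the string occupies 20 bytes, because the three bytes of the long dash count as one code point.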

You might say UTF8 is the encoding of files (a horrible generalization) and that UTF16 and 32, with their byte order concerns, are the encodings of hardware-specific processing power (another bad generalization).

Wikipedia has a great entry on Unicode encodings and on UTF8 in particular.

It's a good place to start your adventure.

score 1 · Answer 4 · answered Dec 20 '13 at 09:31

The documentation is clear. You can create an Objective-C object from a C type, i.e. you can transform a C array into an NSString (an Objective-C object). You can do it, for example, with an NSData object like this:

NSData *data = ....

// stringWithUTF8String: reads up to the first 0 byte, so the bytes in data must end with one.
NSString *myString = [NSString stringWithUTF8String:[data bytes]];
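One caveat with raw bytes: the method expects a zero-terminated C string, and a length-delimited buffer such as the contents of an NSData is not guaranteed to end with a 0 byte. The sketch below, in plain C, shows one way to add the terminator before handing the bytes to a C-string API. The helper name `copy_with_terminator` is hypothetical. In Cocoa itself, `initWithData:encoding:` with `NSUTF8StringEncoding` avoids the problem because it takes the length from the NSData object.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: copies len bytes from a buffer that has no terminator
   and appends the 0 byte that C-string APIs rely on.
   Returns NULL if allocation fails; the caller frees the result. */
char *copy_with_terminator(const void *bytes, size_t len) {
    char *out = malloc(len + 1);
    if (out == NULL) {
        return NULL;
    }
    memcpy(out, bytes, len);
    out[len] = '\0';
    return out;
}
```

The returned pointer can then be treated as an ordinary C string, under the assumption that the original bytes contain no embedded 0 byte, since a C-string reader would stop at the first one.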

samir


Sorry, but I'm learning Objective-C now. Can you please give a simpler explanation of this? I know the documentation is clear, but it's hard to understand with all the programming jargon in it. – user3090658 Dec 20 '13 at 09:33
Sorry, it's part of the territory. Objective-C is a superset of C. It also includes a huge set of things. You will feel lost at times. There are lots of deep things there with a history decades long. – uchuugaka Dec 20 '13 at 13:19
