We know we can print each character of a string as UTF-8 code units. If we have the code units of those characters, how can we create a String from them?
10 Answers
With Swift 5, you can choose one of the following ways in order to convert a collection of UTF-8 code units into a string.
#1. Using String's init(_:) initializer
If you have a String.UTF8View instance (i.e. a collection of UTF-8 code units) and want to convert it to a string, you can use the init(_:) initializer. init(_:) has the following declaration:
init(_ utf8: String.UTF8View)
Creates a string corresponding to the given sequence of UTF-8 code units.
The Playground sample code below shows how to use init(_:):
let string = "Café "
let utf8View: String.UTF8View = string.utf8
let newString = String(utf8View)
print(newString) // prints: Café
#2. Using String's init(decoding:as:) initializer
init(decoding:as:) creates a string from the given collection of Unicode code units in the specified encoding:
let string = "Café "
let codeUnits: [Unicode.UTF8.CodeUnit] = Array(string.utf8)
let newString = String(decoding: codeUnits, as: UTF8.self)
print(newString) // prints: Café
Note that init(decoding:as:) also works with a String.UTF8View parameter:
let string = "Café "
let utf8View: String.UTF8View = string.utf8
let newString = String(decoding: utf8View, as: UTF8.self)
print(newString) // prints: Café
#3. Using the transcode(_:from:to:stoppingOnError:into:) function
The following example transcodes the UTF-8 representation of an initial string into Unicode scalar values (UTF-32 code units) that can be used to build a new string:
let string = "Café "
let bytes = Array(string.utf8)
var newString = ""
_ = transcode(bytes.makeIterator(), from: UTF8.self, to: UTF32.self, stoppingOnError: true, into: {
    newString.append(String(Unicode.Scalar($0)!))
})
print(newString) // prints: Café
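As a side note, here is a sketch of the more tolerant mode (with a made-up malformed byte sequence, not from the original example): passing stoppingOnError: false makes transcode substitute the Unicode replacement character (U+FFFD) for ill-formed sequences instead of stopping early:
let malformedBytes: [UInt8] = [67, 97, 102, 0xC3] // "Caf" followed by a truncated "é"
var repairedString = ""
_ = transcode(malformedBytes.makeIterator(), from: UTF8.self, to: UTF32.self, stoppingOnError: false, into: {
    // Every UTF-32 code unit produced here is a valid Unicode scalar value.
    repairedString.append(String(Unicode.Scalar($0)!))
})
print(repairedString) // prints: Caf� (the truncated byte becomes U+FFFD)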
#4. Using Array's withUnsafeBufferPointer(_:) method and String's init(cString:) initializer
init(cString:) has the following declaration:
init(cString: UnsafePointer<CChar>)
Creates a new string by copying the null-terminated UTF-8 data referenced by the given pointer.
The following example shows how to use init(cString:) with a pointer to the content of a CChar array (i.e. a well-formed UTF-8 code unit sequence) in order to create a string from it:
let bytes: [CChar] = [67, 97, 102, -61, -87, 32, -16, -97, -121, -85, -16, -97, -121, -73, 0]
let newString = bytes.withUnsafeBufferPointer({ (bufferPointer: UnsafeBufferPointer<CChar>) in
    return String(cString: bufferPointer.baseAddress!)
})
print(newString) // prints: Café
#5. Using Unicode.UTF8's decode(_:) method
To decode a code unit sequence, call decode(_:) repeatedly until it returns UnicodeDecodingResult.emptyInput:
let string = "Café "
let codeUnits = Array(string.utf8)
var codeUnitIterator = codeUnits.makeIterator()
var utf8Decoder = Unicode.UTF8()
var newString = ""
Decode: while true {
    switch utf8Decoder.decode(&codeUnitIterator) {
    case .scalarValue(let value):
        newString.append(Character(Unicode.Scalar(value)))
    case .emptyInput:
        break Decode
    case .error:
        print("Decoding error")
        break Decode
    }
}
print(newString) // prints: Café
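A possible variation (my own sketch, reusing the codeUnits array from above): instead of stopping on .error, append the replacement character U+FFFD and keep decoding, which mirrors what String(decoding:as:) does:
var tolerantDecoder = Unicode.UTF8()
var tolerantIterator = codeUnits.makeIterator()
var tolerantString = ""
DecodeAll: while true {
    switch tolerantDecoder.decode(&tolerantIterator) {
    case .scalarValue(let value):
        tolerantString.append(Character(Unicode.Scalar(value)))
    case .emptyInput:
        break DecodeAll
    case .error:
        // Substitute U+FFFD for the ill-formed sequence and keep going.
        tolerantString.append("\u{FFFD}")
    }
}
print(tolerantString) // prints: Café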
#6. Using String's init(bytes:encoding:) initializer
Foundation gives String an init(bytes:encoding:) initializer that you can use as indicated in the Playground sample code below:
import Foundation
let string = "Café "
let bytes: [Unicode.UTF8.CodeUnit] = Array(string.utf8)
let newString = String(bytes: bytes, encoding: String.Encoding.utf8)
print(String(describing: newString)) // prints: Optional("Café ")
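One point worth noting (my own addition, with a made-up invalid byte sequence): init(bytes:encoding:) is failable and returns nil for ill-formed UTF-8, whereas init(decoding:as:) from #2 never fails and repairs bad input with U+FFFD:
import Foundation

let invalidBytes: [UInt8] = [67, 97, 102, 0xFF] // 0xFF never occurs in valid UTF-8
print(String(describing: String(bytes: invalidBytes, encoding: .utf8))) // prints: nil
print(String(decoding: invalidBytes, as: UTF8.self)) // prints: Caf�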

- #2 above is the general, simple, safe, and efficient answer. – Dave Abrahams Aug 27 '23 at 20:53
It's possible to convert UTF-8 code units to a Swift String idiomatically using the UTF8 type from the standard library, although it's much easier to convert from String to UTF-8!
import Foundation
public class UTF8Encoding {
    public static func encode(bytes: Array<UInt8>) -> String {
        var encodedString = ""
        var decoder = UTF8()
        var generator = bytes.generate()
        var finished: Bool = false
        do {
            let decodingResult = decoder.decode(&generator)
            switch decodingResult {
            case .Result(let char):
                encodedString.append(char)
            case .EmptyInput:
                finished = true
            /* ignore errors and unexpected values */
            case .Error:
                finished = true
            default:
                finished = true
            }
        } while (!finished)
        return encodedString
    }
    public static func decode(str: String) -> Array<UInt8> {
        var decodedBytes = Array<UInt8>()
        for b in str.utf8 {
            decodedBytes.append(b)
        }
        return decodedBytes
    }
}
func testUTF8Encoding() {
    let testString = "A UTF8 String With Special Characters: "
    let decodedArray = UTF8Encoding.decode(testString)
    let encodedString = UTF8Encoding.encode(decodedArray)
    XCTAssert(encodedString == testString, "UTF8Encoding is lossless: \(encodedString) != \(testString)")
}
Of the other alternatives suggested:
- Using NSString invokes the Objective-C bridge;
- Using UnicodeScalar is error-prone because it converts UnicodeScalars directly to Characters, ignoring complex grapheme clusters (see the sketch after this list); and
- Using String.fromCString is potentially unsafe as it uses pointers.
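A small sketch of the grapheme-cluster point (my own illustration, not from the answer): "é" can be encoded as "e" (U+0065) followed by a combining acute accent (U+0301), and treating each scalar as its own Character splits what should be a single user-perceived character:
// "é" as a combining sequence: "e" (U+0065) + COMBINING ACUTE ACCENT (U+0301)
let scalars: [Unicode.Scalar] = ["e", "\u{0301}"]

// One Character per scalar: two elements, even though this is one user-perceived character.
let perScalar = scalars.map { Character($0) }
print(perScalar.count) // prints: 2

// Appending the scalars to a String keeps them as a single grapheme cluster.
var combined = ""
combined.unicodeScalars.append(contentsOf: scalars)
print(combined, combined.count) // prints: é 1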

- Thank you for decoding UTF8 encoding! You can remove `import Foundation` from the top; that's the whole reason I want to use this. – ephemer Sep 28 '15 at 23:25
- Thanks! Very helpful. Here is a link to the Sandbox with this working, with a couple of updates that make decode a bit easier. http://swiftlang.ng.bluemix.net/#/repl/2dde62756a95d6d1c7bb88068cb35ebfe4b13ffc3ec891856992166caa8a291d – Pat Apr 13 '16 at 22:45
- Your use of the words "encode" and "decode" is the opposite of how I think about the conversions between strings and UTF-8 data. – RenniePet Jan 10 '17 at 13:20
To improve on Martin R's answer:
import AppKit
let utf8: [CChar] = [65, 66, 67, 0]
let str = NSString(bytes: utf8, length: utf8.count, encoding: NSUTF8StringEncoding)
println(str) // Output: ABC
import AppKit
let utf8: [UInt8] = [0xE2, 0x82, 0xAC, 0]
let str = NSString(bytes: utf8, length: utf8.count, encoding: NSUTF8StringEncoding)
println(str) // Output: €
What happens is that the Array can be automatically converted to CConstVoidPointer, which can be used to create a string with NSString(bytes: CConstVoidPointer, length len: Int, encoding: UInt).

- Note that your code converts the 0 byte as well, to a NUL character in the created NSString. – Martin R Jun 28 '14 at 13:23
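For comparison, a modern-Swift sketch of the same idea (my own, assuming Swift 3 or later): dropping the trailing 0 before decoding avoids the embedded NUL that Martin R points out:
import Foundation

let utf8Bytes: [UInt8] = [0xE2, 0x82, 0xAC, 0]
// dropLast() removes the terminating 0 so it doesn't become a NUL character in the result.
let str = String(bytes: utf8Bytes.dropLast(), encoding: .utf8)
print(str ?? "invalid UTF-8") // prints: €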
I've been looking for a comprehensive answer regarding string manipulation in Swift myself. Relying on casts to and from NSString and other unsafe pointer magic just wasn't doing it for me. Here's a safe alternative:
First, we'll want to extend UInt8. This is the primitive type behind CodeUnit.
extension UInt8 {
    var character: Character {
        return Character(UnicodeScalar(self))
    }
}
This will allow us to do something like this:
let codeUnits: [UInt8] = [
    72, 69, 76, 76, 79
]
let characters = codeUnits.map { $0.character }
let string = String(characters)
// string prints "HELLO"
Equipped with this extension, we can now begin modifying strings.
let string = "ABCDEFGHIJKLMONP"
var modifiedCharacters = [Character]()
for (index, utf8unit) in string.utf8.enumerate() {
    // Insert a "-" every 4 characters
    if index > 0 && index % 4 == 0 {
        let separator: UInt8 = 45 // "-" in ASCII
        modifiedCharacters.append(separator.character)
    }
    modifiedCharacters.append(utf8unit.character)
}
let modifiedString = String(modifiedCharacters)
// modified string == "ABCD-EFGH-IJKL-MONP"

- Am I correct in assuming that this will only work with ASCII character strings? I.e., it will mess things up if there are Danish letters Æ Ø Å æ ø å in the string? Or accented letters? Not to mention other alphabets like Russian Cyrillic, the Greek alphabet, Chinese, and ... – RenniePet Dec 14 '16 at 06:55
- Yes, that assumption is correct. This solution will only work for single-byte (ASCII) characters and will quickly break on anything like emoji or international characters. – dbart Dec 19 '16 at 19:51
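To make the limitation concrete (my own sketch, not from the answer): with a multi-byte character, mapping each UTF-8 code unit to a Character produces mojibake, whereas decoding the bytes as UTF-8 does not:
let units = Array("é".utf8) // [0xC3, 0xA9] – two code units for one character
let mangled = String(units.map { Character(UnicodeScalar($0)) })
print(mangled) // prints: Ã© (each byte became its own character)

let decoded = String(decoding: units, as: UTF8.self)
print(decoded) // prints: é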
// Swift4
var units = [UTF8.CodeUnit]()
//
// update units
//
let str = String(decoding: units, as: UTF8.self)
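For instance (a hypothetical way of filling units, not from the original snippet), you can collect the code units of one string and decode them back; String(decoding:as:) never fails and substitutes U+FFFD for any ill-formed bytes:
// Swift 4+
var units = [UTF8.CodeUnit]()
units.append(contentsOf: "Café".utf8) // fill the buffer with UTF-8 code units
let roundTripped = String(decoding: units, as: UTF8.self)
print(roundTripped) // prints: Café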

- While this code snippet may be the solution, [including an explanation](https://meta.stackexchange.com/questions/114762/explaining-entirely-%E2%80%8C%E2%80%8Bcode-based-answers) really helps to improve the quality of your post. Remember that you are answering the question for readers in the future, and those people might not know the reasons for your code suggestion. – Narendra Jadhav Jun 17 '18 at 07:40
This is a possible solution (now updated for Swift 2):
let utf8 : [CChar] = [65, 66, 67, 0]
if let str = utf8.withUnsafeBufferPointer( { String.fromCString($0.baseAddress) }) {
    print(str) // Output: ABC
} else {
    print("Not a valid UTF-8 string")
}
Within the closure, $0 is a UnsafeBufferPointer<CChar> pointing to the array's contiguous storage. From that, a Swift String can be created.
Alternatively, if you prefer the input as unsigned bytes:
let utf8 : [UInt8] = [0xE2, 0x82, 0xAC, 0]
if let str = utf8.withUnsafeBufferPointer( { String.fromCString(UnsafePointer($0.baseAddress)) }) {
    print(str) // Output: €
} else {
    print("Not a valid UTF-8 string")
}

- C code written in Swift syntax... and more ugly than C (which may be a good thing so people want to avoid them) – Bryan Chen Jun 28 '14 at 12:51
- @BryanChen: I have just tried to present a Swift-only solution that does not use Foundation and Objective-C classes... – Martin R Jun 28 '14 at 12:54
- I think the true Swift way must use `Character` and `UTF8` somewhere – Bryan Chen Jun 28 '14 at 12:56
I would do something like this. It may not be as elegant as working with 'pointers', but it does the job well. It is pretty much a bunch of new += operators for String, like:
@infix func += (inout lhs: String, rhs: (unit1: UInt8)) {
    lhs += Character(UnicodeScalar(UInt32(rhs.unit1)))
}
@infix func += (inout lhs: String, rhs: (unit1: UInt8, unit2: UInt8)) {
    lhs += Character(UnicodeScalar(UInt32(rhs.unit1) << 8 | UInt32(rhs.unit2)))
}
@infix func += (inout lhs: String, rhs: (unit1: UInt8, unit2: UInt8, unit3: UInt8, unit4: UInt8)) {
    lhs += Character(UnicodeScalar(UInt32(rhs.unit1) << 24 | UInt32(rhs.unit2) << 16 | UInt32(rhs.unit3) << 8 | UInt32(rhs.unit4)))
}
NOTE: you can extend the list of supported operators by overriding the + operator as well, defining a set of fully commutative operators for String.
And now you are able to append a String with a Unicode (UTF-8, UTF-16 or UTF-32) character, e.g.:
var string: String = "signs of the Zodiac: "
string += (0x0, 0x0, 0x26, 0x4b)
string += (38)
string += (0x26, 76)

- Just a remark: your code creates a String from UTF-32 input (if I understand it correctly) and mine from UTF-8 input. Reading the question again, I am not 100% sure what is requested here. OP mentions both "UTF-8" and "Code point"... – Martin R Jun 28 '14 at 11:19
- @MartinR, you are right; to be fair, I'm not sure about the real question either, for just the reason you said... – holex Jun 28 '14 at 11:40
- Note that the UTF-8 sequence for a Unicode code point has 1, 2, 3, or 4 bytes. – Martin R Jun 28 '14 at 12:39
If you're starting with a raw buffer, such as from the Data object returned from a file handle (in this case, taken from a Pipe object):
let data = pipe.fileHandleForReading.readDataToEndOfFile()
// Allocate one extra byte so the buffer can be NUL-terminated, as String(cString:) requires.
let buffer = UnsafeMutablePointer<UInt8>.allocate(capacity: data.count + 1)
data.copyBytes(to: buffer, count: data.count)
buffer[data.count] = 0
let output = String(cString: buffer)
buffer.deallocate()
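A hedged alternative for the same situation (assuming the same pipe object): Data is already a collection of UTF-8 bytes, so it can be decoded without any manual pointer allocation:
import Foundation

let data = pipe.fileHandleForReading.readDataToEndOfFile()
let output = String(decoding: data, as: UTF8.self) // never fails; repairs ill-formed bytes with U+FFFD
let strict = String(data: data, encoding: .utf8)   // returns nil instead for ill-formed UTF-8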

Here is a Swift 3.0 version of Martin R's answer:
public class UTF8Encoding {
    public static func encode(bytes: Array<UInt8>) -> String {
        var encodedString = ""
        var decoder = UTF8()
        var generator = bytes.makeIterator()
        var finished: Bool = false
        repeat {
            let decodingResult = decoder.decode(&generator)
            switch decodingResult {
            case .scalarValue(let char):
                encodedString += "\(char)"
            case .emptyInput:
                finished = true
            case .error:
                finished = true
            }
        } while (!finished)
        return encodedString
    }
    public static func decode(str: String) -> Array<UInt8> {
        var decodedBytes = Array<UInt8>()
        for b in str.utf8 {
            decodedBytes.append(b)
        }
        return decodedBytes
    }
}
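A quick usage sketch (my own), assuming the class above is in scope:
let roundTripBytes = UTF8Encoding.decode(str: "Café")       // [67, 97, 102, 195, 169]
let roundTripString = UTF8Encoding.encode(bytes: roundTripBytes)
print(roundTripString) // prints: Café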
If you want to show emoji from a UTF-8 string, just use the convertEmojiCodesToString method below. It works properly for strings like "U+1F52B" (emoji) or "U+1F1E6 U+1F1F1" (country flag emoji):
class EmojiConverter {
    static func convertEmojiCodesToString(_ emojiCodesString: String) -> String {
        let emojies = emojiCodesString.components(separatedBy: " ")
        var resultString = ""
        for emoji in emojies {
            // Strip the "U+" prefix and parse the remaining hex digits.
            let formattedCode = String(emoji.dropFirst(2)).lowercased()
            if let charCode = UInt32(formattedCode, radix: 16),
               let unicode = UnicodeScalar(charCode) {
                let str = String(unicode)
                resultString += "\(str)"
            }
        }
        return resultString
    }
}
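For example (my own usage sketch), the two regional-indicator code points U+1F1E6 and U+1F1F1 combine into a flag:
let flag = EmojiConverter.convertEmojiCodesToString("U+1F1E6 U+1F1F1")
print(flag) // prints: 🇦🇱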
