25

We know we can print each character of a string as UTF-8 code units. Then, if we have the code units of those characters, how can we create a String from them?
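
For context, printing a string's UTF-8 code units is straightforward; a minimal sketch:

```swift
let string = "Café"
for codeUnit in string.utf8 {
    print(codeUnit, terminator: " ")
}
// prints: 67 97 102 195 169
```

The question is how to go the other way, from those code units back to a String.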

Imanou Petit
jxwho

10 Answers

18

With Swift 5, you can choose one of the following ways in order to convert a collection of UTF-8 code units into a string.


#1. Using String's init(_:) initializer

If you have a String.UTF8View instance (i.e. a collection of UTF-8 code units) and want to convert it to a string, you can use the init(_:) initializer, which has the following declaration:

init(_ utf8: String.UTF8View)

Creates a string corresponding to the given sequence of UTF-8 code units.

The Playground sample code below shows how to use init(_:):

let string = "Café "
let utf8View: String.UTF8View = string.utf8

let newString = String(utf8View)
print(newString) // prints: Café 

#2. Using Swift's init(decoding:as:) initializer

init(decoding:as:) creates a string from the given Unicode code units collection in the specified encoding:

let string = "Café "
let codeUnits: [Unicode.UTF8.CodeUnit] = Array(string.utf8)

let newString = String(decoding: codeUnits, as: UTF8.self)
print(newString) // prints: Café 

Note that init(decoding:as:) also works with a String.UTF8View parameter:

let string = "Café "
let utf8View: String.UTF8View = string.utf8

let newString = String(decoding: utf8View, as: UTF8.self)
print(newString) // prints: Café 
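
Note that init(decoding:as:) never fails: ill-formed input is repaired by substituting the Unicode replacement character (U+FFFD). A small sketch (the truncated byte sequence is a made-up example):

```swift
let illFormed: [UInt8] = [67, 97, 102, 0xC3] // "Caf" plus a truncated "é"
let repaired = String(decoding: illFormed, as: UTF8.self)
print(repaired) // prints: Caf� ("Caf" followed by U+FFFD)
```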

#3. Using transcode(_:from:to:stoppingOnError:into:) function

The following example transcodes the UTF-8 representation of an initial string into Unicode scalar values (UTF-32 code units) that can be used to build a new string:

let string = "Café "
let bytes = Array(string.utf8)

var newString = ""
_ = transcode(bytes.makeIterator(), from: UTF8.self, to: UTF32.self, stoppingOnError: true, into: {
    newString.append(String(Unicode.Scalar($0)!))
})
print(newString) // prints: Café 
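
Rather than discarding it with _, you can also use the Bool that transcode(_:from:to:stoppingOnError:into:) returns, which indicates whether an encoding error was detected. A sketch with a deliberately ill-formed input:

```swift
let illFormed: [UInt8] = [67, 97, 102, 0xC3] // "Caf" plus a truncated "é"
var result = ""
let hadError = transcode(illFormed.makeIterator(), from: UTF8.self, to: UTF32.self,
                         stoppingOnError: true, into: {
    result.append(String(Unicode.Scalar($0)!))
})
print(hadError) // prints: true
print(result)   // prints: Caf
```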

#4. Using Array's withUnsafeBufferPointer(_:) method and String's init(cString:) initializer

init(cString:) has the following declaration:

init(cString: UnsafePointer<CChar>)

Creates a new string by copying the null-terminated UTF-8 data referenced by the given pointer.

The following example shows how to use init(cString:) with a pointer to the content of a CChar array (i.e. a well-formed UTF-8 code unit sequence) in order to create a string from it:

let bytes: [CChar] = [67, 97, 102, -61, -87, 32, -16, -97, -121, -85, -16, -97, -121, -73, 0]

let newString = bytes.withUnsafeBufferPointer({ (bufferPointer: UnsafeBufferPointer<CChar>) in
    return String(cString: bufferPointer.baseAddress!)
})
print(newString) // prints: Café 

#5. Using Unicode.UTF8's decode(_:) method

To decode a code unit sequence, call decode(_:) repeatedly until it returns UnicodeDecodingResult.emptyInput:

let string = "Café "
let codeUnits = Array(string.utf8)

var codeUnitIterator = codeUnits.makeIterator()
var utf8Decoder = Unicode.UTF8()
var newString = ""

Decode: while true {
    switch utf8Decoder.decode(&codeUnitIterator) {
    case .scalarValue(let value):
        newString.append(Character(Unicode.Scalar(value)))
    case .emptyInput:
        break Decode
    case .error:
        print("Decoding error")
        break Decode
    }
}

print(newString) // prints: Café 

#6. Using String's init(bytes:encoding:) initializer

Foundation gives String an init(bytes:encoding:) initializer that you can use as indicated in the Playground sample code below:

import Foundation

let string = "Café "
let bytes: [Unicode.UTF8.CodeUnit] = Array(string.utf8)

let newString = String(bytes: bytes, encoding: String.Encoding.utf8)
print(String(describing: newString)) // prints: Optional("Café ")
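
Unlike init(decoding:as:), which repairs ill-formed input with U+FFFD, init(bytes:encoding:) validates its input and returns nil when the bytes are not valid in the given encoding, so it is handy when you need to detect bad data. A sketch with a made-up ill-formed sequence:

```swift
import Foundation

let illFormed: [UInt8] = [67, 97, 102, 0xC3] // truncated UTF-8
let maybeString = String(bytes: illFormed, encoding: .utf8)
print(maybeString as Any) // prints: nil
```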
Imanou Petit
15

It's possible to convert UTF-8 code units to a Swift String idiomatically using Swift's UTF8 codec type. (Going the other way, from String to UTF-8, is much easier!)

import Foundation

public class UTF8Encoding {
  public static func encode(bytes: Array<UInt8>) -> String {
    var encodedString = ""
    var decoder = UTF8()
    var generator = bytes.generate()
    var finished: Bool = false
    do {
      let decodingResult = decoder.decode(&generator)
      switch decodingResult {
      case .Result(let char):
        encodedString.append(char)
      case .EmptyInput:
        finished = true
      /* ignore errors and unexpected values */
      case .Error:
        finished = true
      default:
        finished = true
      }
    } while (!finished)
    return encodedString
  }

  public static func decode(str: String) -> Array<UInt8> {
    var decodedBytes = Array<UInt8>()
    for b in str.utf8 {
      decodedBytes.append(b)
    }
    return decodedBytes
  }
}

func testUTF8Encoding() {
  let testString = "A UTF8 String With Special Characters: "
  let decodedArray = UTF8Encoding.decode(testString)
  let encodedString = UTF8Encoding.encode(decodedArray)
  XCTAssert(encodedString == testString, "UTF8Encoding is lossless: \(encodedString) != \(testString)")
}

Of the other alternatives suggested:

  • Using NSString invokes the Objective-C bridge;

  • Using UnicodeScalar is error-prone because it converts UnicodeScalars directly to Characters, ignoring complex grapheme clusters; and

  • Using String.fromCString is potentially unsafe as it uses pointers.

Tim WB
  • Thank you for decoding UTF8 encoding! You can remove `import Foundation` from the top, that's the whole reason I want to use this.. – ephemer Sep 28 '15 at 23:25
  • Thanks! Very helpful. Here is a link to the Sandbox with this working with a couple updates and made decode a bit easier. http://swiftlang.ng.bluemix.net/#/repl/2dde62756a95d6d1c7bb88068cb35ebfe4b13ffc3ec891856992166caa8a291d – Pat Apr 13 '16 at 22:45
  • Your use of the words "encode" and "decode" are the opposite of how I think about the conversions between strings and UTF-8 data. – RenniePet Jan 10 '17 at 13:20
5

To improve on Martin R's answer:

import AppKit

let utf8 : CChar[] = [65, 66, 67, 0]
let str = NSString(bytes: utf8, length: utf8.count, encoding: NSUTF8StringEncoding)
println(str) // Output: ABC

import AppKit

let utf8 : UInt8[] = [0xE2, 0x82, 0xAC, 0]
let str = NSString(bytes: utf8, length: utf8.count, encoding: NSUTF8StringEncoding)
println(str) // Output: €

What happens is that an Array can be automatically converted to a CConstVoidPointer, which can be used to create a string with NSString(bytes: CConstVoidPointer, length len: Int, encoding: UInt).

Bryan Chen
4

Swift 3

let s = String(bytes: arr, encoding: .utf8)
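
This assumes an existing byte array arr; a complete sketch, with a made-up byte array, including the optional this initializer returns:

```swift
import Foundation

let arr: [UInt8] = [67, 97, 102, 195, 169] // "Café" in UTF-8
if let s = String(bytes: arr, encoding: .utf8) {
    print(s) // prints: Café
}
```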

Alex Shubin
2

I've been looking for a comprehensive answer regarding string manipulation in Swift myself. Relying on casts to and from NSString and other unsafe pointer magic just wasn't doing it for me. Here's a safe alternative:

First, we'll want to extend UInt8. This is the primitive type behind CodeUnit.

extension UInt8 {
    var character: Character {
        return Character(UnicodeScalar(self))
    }
}

This will allow us to do something like this:

let codeUnits: [UInt8] = [
    72, 69, 76, 76, 79
]

let characters = codeUnits.map { $0.character }
let string     = String(characters)

// string prints "HELLO"

Equipped with this extension, we can now begin modifying strings.

let string = "ABCDEFGHIJKLMONP"

var modifiedCharacters = [Character]()
for (index, utf8unit) in string.utf8.enumerate() {

    // Insert a "-" every 4 characters
    if index > 0 && index % 4 == 0 {
        let separator: UInt8 = 45 // "-" in ASCII
        modifiedCharacters.append(separator.character)
    }
    modifiedCharacters.append(utf8unit.character)
}

let modifiedString = String(modifiedCharacters)

// modified string == "ABCD-EFGH-IJKL-MONP"
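
As the comments below note, mapping each UTF-8 code unit to a Character only works for single-byte (ASCII) input. A variant that is safe for multi-byte characters iterates over Characters instead of bytes (a sketch in current Swift syntax):

```swift
let input = "ABCDÆFGH" // "Æ" is two bytes in UTF-8, but one Character
var chars = [Character]()
for (index, char) in input.enumerated() {
    // Insert a "-" every 4 characters
    if index > 0 && index % 4 == 0 {
        chars.append("-")
    }
    chars.append(char)
}
print(String(chars)) // prints: ABCD-ÆFGH
```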
dbart
  • Am I correct in assuming that this will only work with ASCII character strings? I.e., it will mess things up if there are Danish letters Æ Ø Å æ ø å in the string? Or accented letters? Not to mention other alphabets like Russian cyrillic and the Greek alphabet and Chinese and ... – RenniePet Dec 14 '16 at 06:55
  • Yes, that assumption is correct. This solution will only work for single byte (ASCII) characters only and will quickly break on anything like emoji or international characters. – dbart Dec 19 '16 at 19:51
2
// Swift4
var units = [UTF8.CodeUnit]()
//
// update units
//
let str = String(decoding: units, as: UTF8.self)
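
For instance, filling units with made-up example bytes:

```swift
var units = [UTF8.CodeUnit]()
units.append(contentsOf: [72, 105, 33]) // UTF-8 for "Hi!"
let str = String(decoding: units, as: UTF8.self)
print(str) // prints: Hi!
```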
Qinghua
  • While this code snippet may be the solution, [including an explanation](https://meta.stackexchange.com/questions/114762/explaining-entirely-%E2%80%8C%E2%80%8Bcode-based-answers) really helps to improve the quality of your post. Remember that you are answering the question for readers in the future, and those people might not know the reasons for your code suggestion. – Narendra Jadhav Jun 17 '18 at 07:40
1

This is a possible solution (now updated for Swift 2):

let utf8 : [CChar] = [65, 66, 67, 0]
if let str = utf8.withUnsafeBufferPointer( { String.fromCString($0.baseAddress) }) {
    print(str) // Output: ABC
} else {
    print("Not a valid UTF-8 string") 
}

Within the closure, $0 is a UnsafeBufferPointer<CChar> pointing to the array's contiguous storage. From that a Swift String can be created.

Alternatively, if you prefer the input as unsigned bytes:

let utf8 : [UInt8] = [0xE2, 0x82, 0xAC, 0]
if let str = utf8.withUnsafeBufferPointer( { String.fromCString(UnsafePointer($0.baseAddress)) }) {
    print(str) // Output: €
} else {
    print("Not a valid UTF-8 string")
}
Martin R
1

I would do something like this. It may not be as elegant as working with 'pointers', but it does the job well; it's essentially a bunch of new += operators for String, like:

@infix func += (inout lhs: String, rhs: (unit1: UInt8)) {
    lhs += Character(UnicodeScalar(UInt32(rhs.unit1)))
}

@infix func += (inout lhs: String, rhs: (unit1: UInt8, unit2: UInt8)) {
    lhs += Character(UnicodeScalar(UInt32(rhs.unit1) << 8 | UInt32(rhs.unit2)))
}

@infix func += (inout lhs: String, rhs: (unit1: UInt8, unit2: UInt8, unit3: UInt8, unit4: UInt8)) {
    lhs += Character(UnicodeScalar(UInt32(rhs.unit1) << 24 | UInt32(rhs.unit2) << 16 | UInt32(rhs.unit3) << 8 | UInt32(rhs.unit4)))
}

NOTE: you can extend the list of supported operators by overloading the + operator as well, defining a list of fully commutative operators for String.


and now you are able to append a String with a Unicode (UTF-8, UTF-16 or UTF-32) character, e.g.:

var string: String = "signs of the Zodiac: "
string += (0x0, 0x0, 0x26, 0x4b)
string += (38)
string += (0x26, 76)
holex
  • Just a remark: Your code creates a String from UTF-32 input (if I understand it correctly) and mine from UTF-8 input. Reading the question again I am not 100% sure what is requested here. OP mentions both "UTF-8" and "Code point" ... – Martin R Jun 28 '14 at 11:19
  • @MartinR, you are right, to be fair, I'm not sure about the real question either, the reason is just the same as you just said... – holex Jun 28 '14 at 11:40
  • Note that the UTF-8 sequence for a Unicode code point has 1, 2, 3, or 4 bytes. – Martin R Jun 28 '14 at 12:39
1

If you're starting with a raw buffer, such as from the Data object returned from a file handle (in this case, taken from a Pipe object):

let data = pipe.fileHandleForReading.readDataToEndOfFile()

// Allocate one extra byte so the buffer can be null-terminated,
// since init(cString:) expects a null-terminated UTF-8 sequence.
let unsafePointer = UnsafeMutablePointer<UInt8>.allocate(capacity: data.count + 1)
defer { unsafePointer.deallocate() }

data.copyBytes(to: unsafePointer, count: data.count)
unsafePointer[data.count] = 0

let output = String(cString: unsafePointer)
johnkzin
0

Here is a Swift 3.0 version of the UTF8Encoding answer above:

public class UTF8Encoding {
  public static func encode(bytes: Array<UInt8>) -> String {
    var encodedString = ""
    var decoder = UTF8()
    var generator = bytes.makeIterator()
    var finished: Bool = false
    repeat {
      let decodingResult = decoder.decode(&generator)
      switch decodingResult {
      case .scalarValue(let char):
        encodedString += "\(char)"
      case .emptyInput:
        finished = true
      case .error:
        finished = true
      }
    } while (!finished)
    return encodedString
  }
  public static func decode(str: String) -> Array<UInt8> {
    var decodedBytes = Array<UInt8>()
    for b in str.utf8 {
      decodedBytes.append(b)
    }
    return decodedBytes
  }
}

If you want to show an emoji from a UTF-8 string, just use the convertEmojiCodesToString method below. It works properly for strings like "U+1F52B" (emoji) or "U+1F1E6 U+1F1F1" (country flag emoji).

import Foundation

class EmojiConverter {
  static func convertEmojiCodesToString(_ emojiCodesString: String) -> String {
    let emojies = emojiCodesString.components(separatedBy: " ")
    var resultString = ""
    for emoji in emojies {
      var formattedCode = emoji
      // Drop the "U+" prefix, leaving the bare hexadecimal scalar value.
      formattedCode = String(formattedCode.dropFirst(2)).lowercased()
      if let charCode = UInt32(formattedCode, radix: 16),
        let unicode = UnicodeScalar(charCode) {
        let str = String(unicode)
        resultString += "\(str)"
      }
    }
    return resultString
  }
}
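
A usage sketch (assuming any custom String helpers the class relies on are defined); U+1F1E6 and U+1F1F1 are the regional indicators for "A" and "L", which combine into a flag:

```swift
let flag = EmojiConverter.convertEmojiCodesToString("U+1F1E6 U+1F1F1")
print(flag) // prints the 🇦🇱 flag emoji
```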
Community