
I have a binary file test.data containing the following data:

01 E6 B5 8B E8 AF 95 02

The first byte is, for example, a sequence number (01). The next 6 bytes are the two UTF-8 encoded Chinese characters "测试". The 8th byte is another sequence number (02).

As far as I know, UTF-8 is a variable-length encoding (1 to 4 bytes per character). Please refer to this post.

I'm using the following code to read Int32 and Byte(UInt8):

extension NSInputStream
{
    func readInt32() -> Int
    {
        var readBuffer = Array<UInt8>(count:sizeof(Int32), repeatedValue: 0)

        var numberOfBytesRead = self.read(&readBuffer, maxLength: readBuffer.count)

        return Int(readBuffer[0]) << 24 |
            Int(readBuffer[1]) << 16 |
            Int(readBuffer[2]) << 8 |
            Int(readBuffer[3])
    }

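    func readByte() -> Byte {

        var readBuffer : Byte = 0
        // read() returns the number of bytes read, not the byte itself,
        // so return the buffer that read() filled in
        self.read(&readBuffer, maxLength: sizeof(UInt8))
        return readBuffer
    }
}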

I'd like to write a method to read Strings from the stream. Here is what I'm thinking:

  • Read bytes (assume I know how many bytes to read)
  • Convert the bytes to a Character
  • Append the Character into String

But the problem is: how many bytes should I read for one Character, given that the length of a UTF-8 sequence varies? More generally, how am I supposed to read a UTF-8 String from the stream? Thanks in advance.
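For reference, the first byte of every UTF-8 sequence encodes how many bytes the sequence has in total, so the length can be worked out from that byte alone. A minimal sketch in current Swift; the function name `utf8SequenceLength` is made up for illustration and is not an existing API:

```swift
/// Returns the total number of bytes in the UTF-8 sequence that starts with
/// `leadByte`, or nil if this byte cannot start a sequence.
/// (Hypothetical helper, for illustration only.)
func utf8SequenceLength(leadByte: UInt8) -> Int? {
    switch leadByte {
    case 0x00...0x7F: return 1   // 0xxxxxxx: single-byte (ASCII)
    case 0xC2...0xDF: return 2   // 110xxxxx (0xC0 and 0xC1 would be overlong encodings)
    case 0xE0...0xEF: return 3   // 1110xxxx
    case 0xF0...0xF4: return 4   // 11110xxx, limited to code points up to U+10FFFF
    default:          return nil // continuation byte (10xxxxxx) or invalid lead byte
    }
}

// The six text bytes from test.data
let textBytes: [UInt8] = [0xE6, 0xB5, 0x8B, 0xE8, 0xAF, 0x95]
let firstLength = utf8SequenceLength(leadByte: textBytes[0])   // 3
let secondLength = utf8SequenceLength(leadByte: textBytes[3])  // 3
```

Note that a Swift Character can consist of several Unicode scalars, so this gives the length of one encoded code point, not necessarily of one Character. If the byte length of the whole string is known, it is usually simpler to decode all the bytes in one go than to split them by hand.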

Bagusflyer

2 Answers


Just read the bytes into an UnsafeMutablePointer buffer and convert the buffer to a String. String.fromCString interprets the bytes as UTF-8.

extension NSInputStream
{
    public func readString(length:Int) -> String {

        var str = ""

        if length > 0 {
            var readBuffer = UnsafeMutablePointer<UInt8>.alloc(length+1)

            var numberOfBytesRead = self.read(readBuffer, maxLength: length)
            if numberOfBytesRead == length {

                var buf = UnsafeMutablePointer<CChar>(readBuffer)
                buf[length] = 0
                // the C String must be null terminated
                if let utf8String = String.fromCString(buf) {
                    str = utf8String
                }
            }
            readBuffer.dealloc(length)
        }
        return str

    }
}
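In current Swift (3 and later, where NSInputStream is called InputStream) the same idea can be written without manual allocation or a null terminator. This is only a sketch under those assumptions; `readUTF8String` is a hypothetical name, and the stream is built from the example bytes in the question:

```swift
import Foundation

extension InputStream {
    /// Reads up to `length` bytes with a single read() call and decodes them as UTF-8.
    /// Returns nil if nothing could be read or the bytes are not valid UTF-8.
    /// (Hypothetical helper name, for illustration only.)
    func readUTF8String(length: Int) -> String? {
        guard length > 0 else { return "" }
        var buffer = [UInt8](repeating: 0, count: length)
        let bytesRead = read(&buffer, maxLength: length)
        guard bytesRead > 0 else { return nil }
        // Decode only the bytes that were actually read
        return String(bytes: buffer[0..<bytesRead], encoding: .utf8)
    }
}

// Same bytes as test.data: 01, "测试" in UTF-8, 02
let stream = InputStream(data: Data([0x01, 0xE6, 0xB5, 0x8B, 0xE8, 0xAF, 0x95, 0x02]))
stream.open()
var sequenceNumber: UInt8 = 0
_ = stream.read(&sequenceNumber, maxLength: 1)  // consume the leading 01
let text = stream.readUTF8String(length: 6)     // "测试"
stream.close()
```

Because the byte count is passed along with the buffer, this variant also copes with zero bytes inside the data, which would end a C string early.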
Bagusflyer
  • Can you explain the "if numberOfBytesRead == length" ? I would use "> 0" instead of "== length", otherwise the string gets cut at the end of the Stream? I'm quite new to Swift and NSInputStream so excuse my lack of knowledge ;) – Christoph Sonntag Mar 01 '15 at 11:08
  • Just tried it. At least in the way I used your extension, it did happen: my string got truncated at the end. I replaced "== length" with "> 0" and "buf[length]" with "buf[numberOfBytesRead]", and it works. Oh, and I replaced readBuffer.dealloc(length) with readBuffer.dealloc(length+1). I don't know if it makes a difference, but I think it's more exact to dealloc as much as was alloced before. Any thoughts? Can you edit your post, or should I post the fixed code? – Christoph Sonntag Mar 01 '15 at 11:56

Here is the fixed version I mentioned in my comment on Bagusflyer's post:

extension NSInputStream
{
  public func readString(length:Int) -> String {

    var str = ""

    if length > 0 {
        var readBuffer = UnsafeMutablePointer<UInt8>.alloc(length+1)

        var numberOfBytesRead = self.read(readBuffer, maxLength: length)
        // modified this from == length to > 0
        if numberOfBytesRead > 0 {

            var buf = UnsafeMutablePointer<CChar>(readBuffer)
            buf[numberOfBytesRead] = 0
            // the C String must be null terminated
            if let utf8String = String.fromCString(buf) {
                str = utf8String
            }
        }
        readBuffer.dealloc(length+1)
    }
    return str

  }
}
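A single read() call may return fewer bytes than requested even before the end of the stream, so another option is to keep reading until the requested length has arrived or the stream ends. A rough sketch in current Swift (InputStream instead of NSInputStream); `readBytes(upTo:)` is a hypothetical helper, not part of Foundation:

```swift
import Foundation

extension InputStream {
    /// Calls read() repeatedly until `length` bytes have been collected,
    /// the stream ends, or an error occurs. Returns whatever was collected.
    /// (Hypothetical helper name, for illustration only.)
    func readBytes(upTo length: Int) -> [UInt8] {
        guard length > 0 else { return [] }
        var result = [UInt8]()
        var chunk = [UInt8](repeating: 0, count: length)
        while result.count < length {
            let bytesRead = read(&chunk, maxLength: length - result.count)
            if bytesRead <= 0 { break }  // 0 means end of stream, a negative value means an error
            result.append(contentsOf: chunk[0..<bytesRead])
        }
        return result
    }
}

// Asking for more bytes than the stream holds returns only what is available
let input = InputStream(data: Data([0xE6, 0xB5, 0x8B, 0xE8, 0xAF, 0x95]))
input.open()
let bytes = input.readBytes(upTo: 100)
input.close()
let decoded = String(bytes: bytes, encoding: .utf8)  // "测试"
```

If the stream ends in the middle of a multi-byte character, `String(bytes:encoding:)` returns nil, so the caller still has to decide how to handle incomplete data.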
Christoph Sonntag
  • I had an NSData and tried NSString(data: decData, encoding: NSUTF8StringEncoding), but it didn't work. Using this function (adapted for NSData input), everything worked fine. – bsorrentino Apr 25 '15 at 20:27