
I'm receiving a string from the server in the following format:

118|...message...215|...message2...

Basically, it's the message length followed by a pipe and the message itself, repeated for multiple messages. The length is counted in UTF-16 code units.

I'm looking for a way to parse this in Swift. I know I could cast this as NSString and use standard indexes/ranges on that because UTF16 is what NSString uses, but I'm wondering what is the Swift way to handle this? I can't seem to find a way to pull a substring out of a String based on a UTF16 encoding.

Update

I'm not trying to initialize a String with raw UTF16 data (there are plenty of ways to do that). I already have the string, so I'm trying to take a String in the above format and parse it. The issue is that the message length given to me by the server is counted in UTF16 code units. I can't simply extract the length and advance a String Index by messageLength, because the length I've been given doesn't match the grapheme clusters that Swift advances over. As a result, I can't extract the message from the string in Swift; I have to cast it to NSString and then use a "normal" NSRange on it. My question is: how do I pull the substring out by searching for the first pipe and then taking the UTF16 length that the server provided?

This is all extremely simple to do with NSString. Not sure how it can be done in pure Swift (or if it can be done).
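To make the mismatch concrete, here is a minimal sketch in current Swift (5.x syntax, which is an assumption; the question predates it). The sample string `payload` is purely illustrative. It shows that the Character count and the UTF16 count of the same string can differ, and that offsets taken on the `utf16` view count code units, which is the unit the server uses.

```swift
// "6⚽️" written with explicit escapes: digit 6, U+26BD, variation selector U+FE0F.
let payload = "6\u{26BD}\u{FE0F}"

print(payload.count)        // 2: Characters (grapheme clusters)
print(payload.utf16.count)  // 3: UTF-16 code units, what the server counts

// Index arithmetic on the utf16 view moves by code units, not by Characters.
let units = payload.utf16
let end = units.index(units.startIndex, offsetBy: 3)
let extracted = String(decoding: units[units.startIndex..<end], as: UTF16.self)
print(extracted == payload) // true
```

Advancing a `String.Index` by 3 on `payload` itself would run past the end, because the string holds only two Characters.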

Aaron Hayman
  • Since the received information is really data bytes, convert that to a NSString with `init?(bytes bytes: UnsafePointer, length len: Int, encoding encoding: UInt)` which will be bridged to a Swift String. – zaph Dec 07 '15 at 19:55
  • Yes, I can bridge to NSString. I've already implemented that. But I want to see if it's possible to do it without relying on Objective-C types, i.e. in pure Swift. So far, it doesn't seem possible. I get that Swift is a somewhat new language, but something as basic as this parsing should be possible. – Aaron Hayman Dec 07 '15 at 20:00
  • Here is some code to create a Swift string from UTF-16 bytes in "pure Swift": http://stackoverflow.com/questions/24542170/is-there-a-way-to-create-a-string-from-utf16-array-in-swift. – Martin R Dec 07 '15 at 20:10
  • @MartinR I wasn't asking how to init a String with UTF-16 data. – Aaron Hayman Dec 07 '15 at 20:58
  • Is the input a Swift string or a C string or a byte sequence? Perhaps you can use the methods from http://stackoverflow.com/a/30404532/1187415 to convert a UTF-16 based index to a Swift String index? – Martin R Dec 07 '15 at 21:15
  • The input is a Swift string. The issue is that the length provided by the server is for UTF16 encoding. So I need to use it to extract the message, but I can't use Swift's normal indexes because it's advancing on grapheme clusters. – Aaron Hayman Dec 07 '15 at 21:31

2 Answers

3

Here is my take on parsing the messages out of the string. I had to change your lengths to work with the string.

let message = "13|...message...14|...message2..."
let utf16 = message.utf16
var startingIndex = message.utf16.startIndex
var travellingIndex = message.utf16.startIndex
var messages = [String]()
var messageLength: Int

while travellingIndex != message.utf16.endIndex {

    // Start walking through each character
    if let char = String(utf16[travellingIndex..<travellingIndex.successor()]) {

        // When we find the pipe symbol try to parse out the message length
        if char == "|" {
            if let stringNumber = Int(String(utf16[startingIndex..<travellingIndex])) {
                messageLength = stringNumber

                // We found the length, now skip the pipe character
                startingIndex = travellingIndex.successor()

                // Move travellingIndex to the last code unit of the message
                travellingIndex = travellingIndex.advancedBy(messageLength)

                // get the message and put it into an array
                if let message = String(utf16[startingIndex...travellingIndex]) {
                    messages.append(message)
                    startingIndex = travellingIndex.successor()
                }
            }
        }
    }

    travellingIndex = travellingIndex.successor()
}

print(messages)

The output I get at the end is:

["...message...", "...message2..."]
Mr Beardsley
  • Hmm... ok I see, this is a pretty good approach. I simply need to remain within the `UTF16` view. I'll give this a shot. – Aaron Hayman Dec 07 '15 at 21:32
  • Nice! Although it does crash when the given length is longer than the string, which I tried to avoid. Shouldn't happen though, so +1 – Kametrixom Dec 07 '15 at 21:49
  • Granted there is no error handling in that code, but it gives the original questioner an idea of how to sub-string the UTF16 view of a Swift string. – Mr Beardsley Dec 07 '15 at 23:12
  • +1 - simply adding `travellingIndex = travellingIndex.advancedBy(messageLength, limit: endIndex)` where `let endIndex = utf16.endIndex.advance(-1)` worked to check against overflow. – Aaron Hayman Dec 07 '15 at 23:19
  • Also, this is more efficient: `view[currentIndex] == 124` instead of creating a new string on each iteration. Note: `124` is the unicode decimal scalar for `|`. – Aaron Hayman Dec 07 '15 at 23:21
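Pulling the suggestions from these comments together, a bounds-checked version might look like the sketch below. It is written in current Swift (5.x syntax, an assumption; the answer above uses the Swift 2 API), and `parseMessages` is a hypothetical name. It stays inside the UTF16 view, compares code units against `124` instead of building a string per iteration, and uses `index(_:offsetBy:limitedBy:)` so an overlong length yields `nil` rather than a crash.

```swift
/// Splits "<len>|<payload><len>|<payload>..." where <len> counts UTF-16 code units.
/// Returns nil if a length prefix is missing, not a number, negative,
/// or longer than the remaining input. (Hypothetical helper, for illustration.)
func parseMessages(_ input: String) -> [String]? {
    let units = input.utf16
    let pipe: UInt16 = 124              // UTF-16 code unit for "|"
    var messages: [String] = []
    var cursor = units.startIndex

    while cursor != units.endIndex {
        // Find the pipe that ends the length prefix and parse the digits before it.
        guard let pipeIndex = units[cursor...].firstIndex(of: pipe),
              let length = Int(String(decoding: units[cursor..<pipeIndex], as: UTF16.self)),
              length >= 0
        else { return nil }

        let bodyStart = units.index(after: pipeIndex)

        // limitedBy makes this return nil instead of trapping on an overlong length.
        guard let bodyEnd = units.index(bodyStart, offsetBy: length, limitedBy: units.endIndex)
        else { return nil }

        messages.append(String(decoding: units[bodyStart..<bodyEnd], as: UTF16.self))
        cursor = bodyEnd
    }
    return messages
}

let parsed = parseMessages("13|...message...14|...message2...")
// parsed == ["...message...", "...message2..."]
```

Returning `nil` on any malformed input is a design choice made for this sketch; a real parser might prefer to return the messages decoded so far or to throw.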
0

The Foundation framework extends String to be initialisable from data:

import Foundation

let string = String(data: data, encoding: NSUTF16StringEncoding)

Getting rid of Foundation is not possible unless you implement the decoding yourself. Note that with Swift going open source, Foundation is being reimplemented without an Objective-C dependency.

EDIT: Thanks, Martin R, the link you provided is indeed working in pure Swift :D

EDIT:

A String has a utf16 property whose count is the string's length in UTF16 code units. Here is a simple parser for your purpose. It isn't very efficient, but it gets the job done:

func getMessages(var string: String) -> [String]? {

    func getMessage(string: String) -> (message: String, rest: String)? {
        guard let
            index = string.characters.indexOf("|"),
            length = Int(String(string.characters.prefixUpTo(index)))
        else { return nil }

        let msgRest = String(string.characters.suffixFrom(index.successor()))
        return (String(msgRest.utf16.prefix(length)), String(msgRest.utf16.dropFirst(length)))
    }

    var messages : [String] = []
    while let (message, rest) = getMessage(string) {
        string = rest
        messages.append(message)
    }
    return messages
}

func stringForMessages(messages: [String]) -> String {
    return messages.map{ "\($0.utf16.count)|\($0)" }.joinWithSeparator("")
}

let messages = [
    "123",
    "",
    "",
    "6⚽️"
]

let string = stringForMessages(messages)

let received = getMessages(string)

messages // ["123", "", "", "6⚽️"]

I actually tried making it more efficient, but Swift's String mechanics pushed against it. I challenge anyone to create a beautiful, efficient, crash-safe parser for this.
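For readers on a newer toolchain, the same prefix/dropFirst idea can be sketched in current Swift (5.x syntax, an assumption; the code above is Swift 2). The names `splitMessages` and `joinMessages` are hypothetical. Working on a `Substring.UTF16View` means each step reslices the original storage instead of copying the remainder into a new String, and `prefix`/`dropFirst` clamp to the available length, so an overlong length is truncated rather than trapping.

```swift
/// Encodes messages as "<utf16 length>|<message>" with no separator between entries.
func joinMessages(_ messages: [String]) -> String {
    return messages.map { "\($0.utf16.count)|\($0)" }.joined()
}

/// Decodes the format above. Stops quietly at the first malformed prefix.
func splitMessages(_ string: String) -> [String] {
    var rest = string.utf16[...]        // Substring.UTF16View over the whole input
    let pipe: UInt16 = 124              // UTF-16 code unit for "|"
    var messages: [String] = []

    while let pipeIndex = rest.firstIndex(of: pipe),
          let length = Int(String(decoding: rest[..<pipeIndex], as: UTF16.self)),
          length >= 0 {
        let body = rest[rest.index(after: pipeIndex)...]
        // prefix and dropFirst clamp to the collection's length, so neither can trap here.
        messages.append(String(decoding: body.prefix(length), as: UTF16.self))
        rest = body.dropFirst(length)
    }
    return messages
}

let original = ["123", "", "6\u{26BD}\u{FE0F}"]   // the last entry is "6⚽️"
let roundTripped = splitMessages(joinMessages(original))
print(roundTripped == original)                   // true
```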

Kametrixom
  • Please re-look at the question. I'm not asking how to init a String with UTF16 data. I already have the `String` and I need to parse it. – Aaron Hayman Dec 07 '15 at 20:50
  • Thanks for the answer. This does look like it would work. However, the other answer may be more efficient, since it only relies on tracking indexes and pulling the string directly out of a single UTF16View. I'm going to give both approaches a try and select the best one. – Aaron Hayman Dec 07 '15 at 21:38
  • @AaronHayman I was going for the safe approach (doesn't crash, even when the given length is longer than the string). Also consider doing this stuff yourself in the future, StackOverflow isn't there to request parsers from others – Kametrixom Dec 07 '15 at 21:51
  • I actually wasn't asking for a parser, although several people seem to want to provide it (definitely not complaining though). I'll be writing my own regardless of what people put here (I have Unit Tests to satisfy). I wasn't sure if there was a way to do what I needed. A simple answer stating: "You can manually iterate through the UTF16 view as an Array and extract the bytes using subscripting: `view[startIndex...endIndex]`" would have been a sufficient answer. – Aaron Hayman Dec 07 '15 at 22:00