Is it possible to write a Swift function that replaces only part of an extended grapheme cluster like 👩‍👩‍👧‍👦?

Question

I want to write a function that could be used like this:

let 👩‍👩‍👦‍👦 = "👩‍👩‍👧‍👦".replacingFirstOccurrence(of: "👧", with: "👦")

Given how odd both this string and Swift's String library are, is this possible in Swift?

This would indeed be a bit tricky if it needs to work with all emoji, including sequences, while also retaining the invisible characters (zero-width joiners, variation selectors and all other special cases). — xoudini, Apr 25 '17 at 19:41

score 11 · Answer 1 · edited May 23 '17 at 12:34

Based on the insights gained at Why are emoji characters like 👩‍👩‍👧‍👦 treated so strangely in Swift strings?, a sensible approach might be to replace Unicode scalars:

extension String {
    func replacingFirstOccurrence(of target: UnicodeScalar, with replacement: UnicodeScalar) -> String {

        let uc = self.unicodeScalars
        guard let idx = uc.index(of: target) else { return self }
        let prefix = uc[uc.startIndex..<idx]
        let suffix = uc[uc.index(after: idx) ..< uc.endIndex]
        return "\(prefix)\(replacement)\(suffix)"
    }
}

Example:

let family1 = "👩‍👩‍👧‍👦"
print(family1.characters.map { Array(String($0).unicodeScalars) })
// [["\u{0001F469}", "\u{200D}"], ["\u{0001F469}", "\u{200D}"], ["\u{0001F467}", "\u{200D}"], ["\u{0001F466}"]]

let family2 = family1.replacingFirstOccurrence(of: "👧", with: "👦")
print(family2) // 👩‍👩‍👦‍👦
print(family2.characters.map { Array(String($0).unicodeScalars) })
// [["\u{0001F469}", "\u{200D}"], ["\u{0001F469}", "\u{200D}"], ["\u{0001F466}", "\u{200D}"], ["\u{0001F466}"]]
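On newer toolchains the same scalar-level idea can be sketched as follows. This is only a sketch, assuming Swift 5, where `index(of:)` is spelled `firstIndex(of:)` and `String.UnicodeScalarView` can be mutated in place. The method name `replacingFirstScalar(_:with:)` is hypothetical and not part of the answer above.

```swift
extension String {
    /// Hypothetical helper: replaces the first Unicode scalar equal to `target`.
    func replacingFirstScalar(_ target: Unicode.Scalar, with replacement: Unicode.Scalar) -> String {
        var scalars = self.unicodeScalars
        // Locate the first matching scalar; return the string unchanged if there is none.
        guard let idx = scalars.firstIndex(of: target) else { return self }
        // Swap exactly one scalar, leaving joiners and selectors around it untouched.
        scalars.replaceSubrange(idx..<scalars.index(after: idx), with: CollectionOfOne(replacement))
        return String(scalars)
    }
}

// woman, ZWJ, woman, ZWJ, girl, ZWJ, boy
let family = "\u{1F469}\u{200D}\u{1F469}\u{200D}\u{1F467}\u{200D}\u{1F466}"
let changed = family.replacingFirstScalar("\u{1F467}", with: "\u{1F466}")
print(changed.unicodeScalars.map { String($0.value, radix: 16) })
// ["1f469", "200d", "1f469", "200d", "1f466", "200d", "1f466"]
```

Printing the scalar values rather than the string itself avoids depending on how a particular Swift version groups the scalars into `Character`s.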

And here is a possible version which locates and replaces the Unicode scalars of an arbitrary string:

extension String {
    func replacingFirstOccurrence(of target: String, with replacement: String) -> String {
        let uc = self.unicodeScalars
        let tuc = target.unicodeScalars

        // Target empty or too long:
        if tuc.count == 0 || tuc.count > uc.count {
            return self
        }

        // Current search position:
        var pos = uc.startIndex
        // End of the range in which a match may start (the match must fit before `uc.endIndex`):
        let end = uc.index(uc.endIndex, offsetBy: -(tuc.count - 1))

        // Locate first Unicode scalar
        while let from = uc[pos..<end].index(of: tuc.first!) {
            // Compare all Unicode scalars:
            let to = uc.index(from, offsetBy: tuc.count)
            if !zip(uc[from..<to], tuc).contains(where: { $0 != $1 }) {
                let prefix = uc[uc.startIndex..<from]
                let suffix = uc[to ..< uc.endIndex]
                return "\(prefix)\(replacement)\(suffix)"
            }
            // Next search position (skip past the candidate that did not match):
            pos = uc.index(after: from)
        }

        // Target not found.
        return self
    }
}
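The search described above (find a position where every scalar of the target matches, then splice in the replacement) can also be sketched with plain arrays, which sidesteps index arithmetic on the scalar view. This is a minimal sketch assuming Swift 5; the name `replacingFirstScalarSequence(of:with:)` is hypothetical, and copying the scalars into arrays trades some efficiency for simplicity.

```swift
extension String {
    /// Hypothetical helper: replaces the first run of Unicode scalars
    /// that equals the scalars of `target`.
    func replacingFirstScalarSequence(of target: String, with replacement: String) -> String {
        let haystack = Array(self.unicodeScalars)
        let needle = Array(target.unicodeScalars)
        // Target empty or too long: nothing to replace.
        guard !needle.isEmpty, needle.count <= haystack.count else { return self }

        // Try every start offset at which `needle` still fits.
        for start in 0...(haystack.count - needle.count) {
            let end = start + needle.count
            if haystack[start..<end].elementsEqual(needle) {
                var result = String.UnicodeScalarView()
                result.append(contentsOf: haystack[..<start])
                result.append(contentsOf: replacement.unicodeScalars)
                result.append(contentsOf: haystack[end...])
                return String(result)
            }
        }
        // Target not found.
        return self
    }
}

let family = "\u{1F469}\u{200D}\u{1F469}\u{200D}\u{1F467}\u{200D}\u{1F466}"
// Remove the girl together with the joiner that precedes her.
let smaller = family.replacingFirstScalarSequence(of: "\u{200D}\u{1F467}", with: "")
print(smaller.unicodeScalars.count) // 5
```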

answered Apr 25 '17 at 20:24

Martin R

Martin, but why does the playground print the same UnicodeScalarView after you apply the replacement? – Oleg Gordiichuk Apr 25 '17 at 20:28
@OlegGordiichuk: I inadvertently printed `family1` instead of `family2`, thanks for letting me know. – Martin R Apr 25 '17 at 20:30
How about this let b = "‍‍‍".characters.map{String($0) == "" ? "" : $0} – Oleg Gordiichuk Apr 25 '17 at 20:30
@OlegGordiichuk: That would replace *all* occurrences, not just the first (if it worked) – Martin R Apr 25 '17 at 20:31
But I have a problem: it replaces only one emoji, and I cannot understand why. – Oleg Gordiichuk Apr 25 '17 at 20:33
@OlegGordiichuk: Have a look at OP's previous question http://stackoverflow.com/questions/43618487/why-is-treated-so-strangely-in-swift-strings, which is exactly about that problem. – Martin R Apr 25 '17 at 20:34
This fails to compile with some strings, like: `""`, `"multichar"` or `""` – Ky - Apr 25 '17 at 21:35
@BenLeggiero: Because those are not Unicode scalars. – You could define the second parameter as `with replacement: String`, but I am not sure if the results always make sense. – Martin R Apr 25 '17 at 21:36
I see... Well, I'll need the final solution to replace a sequence of scalars with another one, so I'll try some stuff on my end. – Ky - Apr 25 '17 at 21:50
@BenLeggiero: I have added another variant, perhaps that is what you are looking for. – Martin R Apr 26 '17 at 05:46
That last variant passes all my tests! :D – Ky - Apr 27 '17 at 13:50

xoudini · Accepted Answer · 2017-04-27T20:54:37.537

7

Using the `range(of:options:range:locale:)` method, the solution becomes quite concise:

extension String {
    func replaceFirstOccurrence(of searchString: String, with replacementString: String) -> String {
        guard let range = self.range(of: searchString, options: .literal) else { return self }
        return self.replacingCharacters(in: range, with: replacementString)
    }
}

This works by first finding the range of `searchString` within the string. If a range is found, it is replaced with `replacementString`; otherwise the string is returned unchanged. Since `range(of:)` returns as soon as it finds a match, the returned range is guaranteed to be the first occurrence.

"221".replaceFirstOccurrence(of: "2", with: "3")                // 321
"👩‍👩‍👧‍👦".replaceFirstOccurrence(of: "\u{1f469}", with: "\u{1f468}") // 👨‍👩‍👧‍👦

*To clarify, the last test case converts woman-woman-girl-boy to man-woman-girl-boy.

edited Apr 27 '17 at 20:54

answered Apr 27 '17 at 20:46

xoudini

That is indeed elegant and far easier, using the `.literal` option. – Martin R Apr 28 '17 at 03:48
`.literal` is documented as "Exact character-by-character equivalence", but apparently "character" does not mean a Swift `Character` in this context. My guess would be that it actually means "exact Unicode scalar equivalence" or "exact UTF-16 equivalence" (since `.literal` is defined in `NSString.CompareOptions` and `NSString` is based on `unichar`). – Martin R Apr 28 '17 at 05:26
@MartinR I think you're correct, but since you can't search with malformed UTF-16 in Swift, it essentially does mean unicode scalar equivalence. – xoudini Apr 28 '17 at 05:44
Indeed, it makes no difference. – Perhaps that information about `.literal` would be a useful addition to your other answer http://stackoverflow.com/a/43619065/1187415 ? – Martin R Apr 28 '17 at 06:46
@MartinR I guess it would be good to point out, yes. I'll edit it in there later today when I get the chance! – xoudini Apr 28 '17 at 06:51
This also passes all my tests, and feels much more at-home in my library. Thanks! – Ky - May 01 '17 at 19:04
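The comments above suggest that `.literal` compares at a level below Swift's `Character`. A minimal sketch for checking this on your own toolchain, assuming Swift 5 with Foundation available. The variable names are illustrative, and no result is claimed for searches without `.literal`, since that behavior is exactly what the thread is unsure about.

```swift
import Foundation

// woman, ZWJ, woman, ZWJ, girl, ZWJ, boy
let family = "\u{1F469}\u{200D}\u{1F469}\u{200D}\u{1F467}\u{200D}\u{1F466}"
let girl = "\u{1F467}"

// Element-wise search over the string's Characters (extended grapheme clusters).
let foundAsCharacter = family.contains(Character(girl))

// Foundation search with `.literal`, as used in the accepted answer.
let literalRange = family.range(of: girl, options: .literal)

print("Characters:", family.count)
print("Unicode scalars:", family.unicodeScalars.count) // 7
print("Found as Character:", foundAsCharacter)
print("Found with .literal:", literalRange != nil)
```

If the `Character` search fails while the `.literal` search succeeds, that is consistent with the reading that `.literal` matches scalar (or UTF-16) sequences rather than whole grapheme clusters.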
