Get numbers characters from a string

Question

How can I get the numeric characters from a string? I don't want to convert the string to an Int.

var string = "string_1"
var string2 = "string_20_certified"

My result has to be formatted like this:

newString = "1"
newString2 = "20"

What about "৯" (BENGALI DIGIT NINE) or "𝟙" (MATHEMATICAL DOUBLE-STRUCK DIGIT ONE)? — Martin R, Dec 16 '16 at 16:09

dfrib · Accepted Answer · 2016-12-17T01:49:28.427

Pattern matching a `String`'s unicode scalars against Western Arabic Numerals

You could pattern match the unicodeScalars view of a String to a given UnicodeScalar pattern (covering e.g. Western Arabic numerals).

extension String {
    var westernArabicNumeralsOnly: String {
        let pattern = UnicodeScalar("0")..."9"
        return String(unicodeScalars
            .flatMap { pattern ~= $0 ? Character($0) : nil })
    }
}

Example usage:

let str1 = "string_1"
let str2 = "string_20_certified"
let str3 = "a_1_b_2_3_c34"

let newStr1 = str1.westernArabicNumeralsOnly
let newStr2 = str2.westernArabicNumeralsOnly
let newStr3 = str3.westernArabicNumeralsOnly

print(newStr1) // 1
print(newStr2) // 20
print(newStr3) // 12334
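
For readers on newer toolchains, here is a minimal sketch of the same idea, assuming Swift 4.1 or later, where `compactMap` replaces the optional-returning `flatMap` overload. The property name `asciiDigitScalarsOnly` is made up for this illustration and is not part of the answer above.

```swift
extension String {
    /// Hypothetical name: keeps only the scalars U+0030 through U+0039.
    var asciiDigitScalarsOnly: String {
        let digits: ClosedRange<Unicode.Scalar> = "0"..."9"
        // Map each matching scalar to a Character and drop everything else.
        let kept = unicodeScalars.compactMap { scalar -> Character? in
            digits.contains(scalar) ? Character(scalar) : nil
        }
        return String(kept)
    }
}

print("string_20_certified".asciiDigitScalarsOnly) // 20
```

Because the range only covers "0" through "9", digits from other scripts (such as the Bengali digit mentioned in the comment on the question) are dropped by this sketch.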

Extending to matching any of several given patterns

The Unicode scalar pattern matching approach above is particularly useful when extended to match any of several given patterns, e.g. patterns describing different variations of Eastern Arabic numerals:

extension String {   
    var easternArabicNumeralsOnly: String {
        let patterns = [UnicodeScalar("\u{0660}")..."\u{0669}", // Eastern Arabic
                                       "\u{06F0}"..."\u{06F9}"] // Perso-Arabic variant 
        return String(unicodeScalars
            .flatMap { uc in patterns.contains{ $0 ~= uc } ? Character(uc) : nil })
    }
}

This could be used in practice e.g. if writing an Emoji filter, as ranges of unicode scalars that cover emojis can readily be added to the patterns array in the Eastern Arabic example above.
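
As a rough sketch of how the patterns array generalizes, the ranges can be passed in as a parameter, assuming Swift 4.1 or later. The method name `keepingScalars(in:)` and the sample string are hypothetical and only serve the illustration.

```swift
extension String {
    /// Hypothetical helper: keeps the scalars that fall in any of the given ranges.
    func keepingScalars(in ranges: [ClosedRange<Unicode.Scalar>]) -> String {
        let kept = unicodeScalars.compactMap { scalar -> Character? in
            ranges.contains(where: { $0.contains(scalar) }) ? Character(scalar) : nil
        }
        return String(kept)
    }
}

let easternArabic: [ClosedRange<Unicode.Scalar>] = [
    "\u{0660}"..."\u{0669}", // Eastern Arabic
    "\u{06F0}"..."\u{06F9}"  // Perso-Arabic variant
]

// One digit from each range, mixed with Latin letters and a Western digit.
let mixed = "a\u{0663}b\u{06F4}c1"
let filtered = mixed.keepingScalars(in: easternArabic)
print(filtered == "\u{0663}\u{06F4}") // true: both digits kept, in their original order
```

Since the string is traversed once and each scalar is tested against every range, the relative order of scalars that match different ranges is preserved.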

Why use the `UnicodeScalar` patterns approach over `Character` ones?

A Character in Swift represents an extended grapheme cluster, which is made up of one or more Unicode scalar values. This means that Character instances do not have a fixed size in memory, so random access to a character within a collection of sequentially (contiguously) stored characters is not available in O(1), but rather in O(n).

A Unicode scalar value, on the other hand, has a fixed size (it fits in a single UTF-32 code unit), which should allow O(1) random access in a contiguous collection of scalars. Now, I'm not entirely sure whether this is a fact, or whether it is the reason for what follows, but when benchmarking the methods above against equivalent methods using the CharacterView (the .characters property) for some test String instances, it's very apparent that the UnicodeScalar approach is faster than the Character approach: naive testing showed a factor of 10-25 difference in execution times, steadily growing with String size.
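
To make the Character versus Unicode scalar distinction concrete, here is a small sketch, assuming Swift 5 (where `String` is itself a collection of `Character` values). The variable names are chosen for this illustration only.

```swift
// "é" is written here as "e" followed by U+0301 (COMBINING ACUTE ACCENT).
let cafe = "Cafe\u{301}"

let characterCount = cafe.count                          // grapheme clusters
let scalarCount = cafe.unicodeScalars.count              // Unicode scalar values
let lastCharacterScalars = cafe.last!.unicodeScalars.count

print(characterCount)       // 4
print(scalarCount)          // 5
print(lastCharacterScalars) // 2
```

The last Character spans two scalars, which is why a Character cannot be assumed to occupy a fixed amount of storage.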

Knowing the limitations of working with Unicode scalars vs Characters in Swift

Now, there are drawbacks to the UnicodeScalar approach, however; namely when working with characters that cannot be represented by a single Unicode scalar, but where one of their Unicode scalars is contained in the pattern we want to match against.

E.g., consider a string holding the four characters "Café". The last character, "é", can be represented by two Unicode scalars, "e" and "\u{301}". If we were to implement pattern matching against, say, UnicodeScalar("a")..."e", the filtering method as applied above would allow one of the two Unicode scalars to pass.

extension String {
    var onlyLowercaseLettersAthroughE: String {
        let patterns = [UnicodeScalar("a")..."e"]
        return String(unicodeScalars
            .flatMap { uc in patterns.contains{ $0 ~= uc } ? Character(uc) : nil })
    }
}

let str = "Cafe\u{301}"
print(str)                               // Café
print(str.onlyLowercaseLettersAthroughE) // ae
                                         /* possibly we'd want "a" or "aé"
                                            as result here                 */

In the particular use case queried by the OP in this Q&A, the above is not an issue, but depending on the use case, it will sometimes be more appropriate to work with Character pattern matching over UnicodeScalar.
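
For comparison, a Character-level filter might look like the following sketch, assuming Swift 4 or later (where `filter` on a `String` returns a `String`). The property name and the explicit set of allowed characters are assumptions made for this illustration; an explicit `Set<Character>` is used instead of a range so that the result does not depend on how accented characters are ordered.

```swift
extension String {
    /// Hypothetical name: keeps whole Characters that are exactly one of "a" through "e".
    var lowercaseAThroughECharactersOnly: String {
        let allowed: Set<Character> = ["a", "b", "c", "d", "e"]
        // "e" + U+0301 forms a single Character that is not equal to "e",
        // so it is dropped as a whole rather than split apart.
        return filter { allowed.contains($0) }
    }
}

let accented = "Cafe\u{301}"
print(accented.lowercaseAThroughECharactersOnly) // a
```

Here "é" is treated as one unit, so no stray "e" leaks through the filter, at the cost of working on grapheme clusters rather than scalars.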

@LeoDabus Thanks for the feedback! I'll need to take this in a moment :) The main reason I chose `UnicodeScalar` over `Character` in this use case is the better performance of the former, as Unicode scalars have a fixed size in memory (as compared to the extended grapheme cluster of a character, which does not). In light of this, I'm uncertain if using `reduce` here might possibly affect performance negatively. Also, when using several patterns (`easternArabic...` example), I believe the `reduce` solution above will not preserve the relative ordering between letters from two different patterns. — dfrib, Dec 17 '16 at 00:20
So basically, if I have to work on a string where each character is a single Unicode scalar (like 1 2 3...9), it'll be faster to use unicodeScalars? And if I have to work on a string where characters can have more than one Unicode scalar (like è é ...), Character is still better? — Makaille, Dec 17 '16 at 22:08
@Mayke since you only want to filter out digits (`1` .... `9`), you won't run into an issue even if the string you are filtering contains characters made up of more than one Unicode scalar (è é ...), since these will still be excluded by the filtering (they won't match your digits pattern). The possible issue arises when your pattern to match against contains Unicode scalars which may be part of characters that are made up of more than one Unicode scalar. E.g. if you'd like to pattern match against pattern `"a"..."m"`, then the Unicode scalar approach might be fragile as a _character_ `é` ... — dfrib, Dec 17 '16 at 23:57
... will partly pass the filter pattern matching, letting the `e` Unicode scalar part of `é` pass through the filter. Again, for your case (just digits), it should be safe to use the faster Unicode scalar approach, but if you want to pattern match against patterns that contain Unicode scalars which are a part of special two-scalar characters, then using the slower pattern matching against `Character` patterns might be a more sensible approach (refer to the Café example above). — dfrib, Dec 17 '16 at 23:59

vacawama · Answer 2 · 2019-08-13T09:33:20.077

16

Edit: Updated for Swift 4 & 5

Here's a straightforward method that doesn't require Foundation:

let newstring = string.filter { "0"..."9" ~= $0 }

or borrowing from @dfri's idea to make it a String extension:

extension String {
    var numbers: String {
        return filter { "0"..."9" ~= $0 }
    }
}

print("3 little pigs".numbers) // "3"
print("1, 2, and 3".numbers)   // "123"

edited Aug 13 '19 at 09:33

answered Dec 16 '16 at 16:13

vacawama


Note that you may omit `self` in the extension, making it even slightly more neat :) – dfrib Dec 16 '16 at 16:38
why not also `return String(characters.filter { "0"..."9" ~= $0 })` – Leo Dabus Dec 16 '16 at 20:30
**Swift 5** extension String { var numbers: String { return String(self.filter { "0"..."9" ~= $0 }) } } – uplearned.com Aug 13 '19 at 05:29
Thanks for the heads up @uplearnedu.com. In fact, it's even simpler now because `String()` isn't needed and I left off `self.` as I had done before. See update. – vacawama Aug 13 '19 at 09:34
For Swift 5 - let newstring = string.filter { $0.isNumber } – Nilanshu Jaiswal Nov 16 '19 at 18:01
@NilanshuJaiswal, that could be effective as well. Note that it will match other Unicode characters that are numbers but not 0 to 9 (for example `"\u{215A}"`). – vacawama Nov 17 '19 at 14:50
Oh, I missed that. Yeah, isNumber is not strictly confined to 0 to 9. Thanks :) – Nilanshu Jaiswal Nov 17 '19 at 15:18
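
The difference discussed in these comments can be seen in a short sketch, assuming Swift 5 (where `Character.isNumber` and `Character.isASCII` are available). The sample string and variable names are made up for this illustration.

```swift
// U+215A is VULGAR FRACTION FIVE SIXTHS, U+09EF is BENGALI DIGIT NINE.
let sample = "3 little pigs \u{215A} \u{09EF}"

// isNumber accepts any character that Unicode classifies as numeric.
let anyNumber = sample.filter { $0.isNumber }

// Requiring isASCII as well narrows the match to "0" through "9".
let asciiOnly = sample.filter { $0.isASCII && $0.isNumber }

print(anyNumber == "3\u{215A}\u{09EF}") // true
print(asciiOnly)                        // 3
```

Which of the two is appropriate depends on whether digits from other scripts should count as numbers for the task at hand.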

Emil Laine · Answer 3 · 2016-12-16T16:23:05.580

4

import Foundation

let string = "a_1_b_2_3_c34"    
let result = string.components(separatedBy: CharacterSet.decimalDigits.inverted).joined(separator: "")
print(result)

Output:

12334

edited Dec 16 '16 at 16:23

answered Dec 16 '16 at 16:15

Emil Laine


You can use simply `.joined()` instead of `.joined(separator: "")` – Leo Dabus Dec 16 '16 at 20:00

score 1 · Answer 4 · edited Dec 16 '16 at 16:58

Here is a Swift 2 example:

let str = "Hello 1, World 62"
let intString = str.componentsSeparatedByCharactersInSet(
    NSCharacterSet
        .decimalDigitCharacterSet()
        .invertedSet)
    .joinWithSeparator("") // Return a string with all the numbers

edited Dec 16 '16 at 16:58

rmaddy


answered Dec 16 '16 at 16:04

Jimmy James


Good, but the question is tagged "swift3", you may want to add a Swift 3 version. :) – Eric Aya Dec 16 '16 at 16:08
This one isn't working ? I can't test this right now ... – Jimmy James Dec 16 '16 at 16:12
Yes it works, but it's in Swift 2. OP needs a Swift 3 version, the syntax is different. – Eric Aya Dec 16 '16 at 16:13

vadian · Answer 5 · 2016-12-16T16:10:43.253

For example, with a regular expression:

let text = "string_20_certified"

let pattern = "\\d+"
let regex = try! NSRegularExpression(pattern: pattern, options: [])

if let match = regex.firstMatch(in: text, options: [], range: NSRange(location: 0, length: text.characters.count)) {
    let newString = (text as NSString).substring(with: match.range)
    print(newString)
}

If there are multiple occurrences of the pattern, use matches(in:options:range:):

let matches = regex.matches(in: text, options: [], range: NSRange(location: 0, length: text.characters.count))
for match in matches {
    let newString = (text as NSString).substring(with: match.range)
    print(newString)
}

I think you should get the length for your NSRange casting the string to NSString `(text as NSString).length` or getting the utf16 count `text.utf16.count` — Leo Dabus, Jan 29 '17 at 04:24
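
A sketch of the point raised in this comment, assuming Swift 4 or later with Foundation: `NSRange` counts UTF-16 code units, so building it from the string's indices (or from `text.utf16.count`) keeps the range aligned even when the text contains characters outside the Basic Multilingual Plane. The function name `digitRuns(in:)` and the sample input are hypothetical.

```swift
import Foundation

/// Hypothetical helper: returns every run of digits matched by "\\d+".
func digitRuns(in text: String) -> [String] {
    // try! is tolerable here only because the pattern is a fixed, valid literal.
    let regex = try! NSRegularExpression(pattern: "\\d+", options: [])
    // NSRange(_:in:) measures the string in UTF-16 code units.
    let fullRange = NSRange(text.startIndex..., in: text)
    return regex.matches(in: text, options: [], range: fullRange).compactMap { match in
        // Convert the NSRange back into a Swift range before slicing.
        Range(match.range, in: text).map { String(text[$0]) }
    }
}

// The emoji is one Character but two UTF-16 code units, so a range built
// from the Character count would stop short of the trailing "7".
let runs = digitRuns(in: "😀string_20_certified_7")
print(runs) // ["20", "7"]
```

Joining the returned runs with `joined()` gives a single string of digits if that is what is needed.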

score 0 · Answer 6 · answered Dec 16 '16 at 16:04

This method iterates through the string's characters and appends the digits to a new string:

class func getNumberFrom(string: String) -> String {
    var number: String = ""
    for c: Character in string.characters {
        // Int(_:) only succeeds here when the character is a digit
        if let n: Int = Int(String(c)) {
            // use <= so that the digit 9 is kept as well
            if n >= 0 && n <= 9 {
                number.append(c)
            }
        }
    }
    return number
}
