
I am pulling a JSON file from a site and one of the strings received is:

The Weeknd &#8216;King Of The Fall&#8217; [Video Premiere] | @TheWeeknd | #SoPhi

How can I convert things like &#8216; into the correct characters?

I've made an Xcode Playground to demonstrate it:

import Foundation

let blogUrl = URL(string: "http://sophisticatedignorance.net/api/get_recent_summary/")!
let jsonData = try Data(contentsOf: blogUrl)

let dataDictionary = try JSONSerialization.jsonObject(with: jsonData) as? [String: Any] ?? [:]

let a = dataDictionary["posts"] as? [[String: Any]] ?? []

print(a[0]["title"] ?? "")
Peter Mortensen
code_cookies

23 Answers


This answer was last revised for Swift 5.2 and iOS 13.4 SDK.


There's no straightforward way to do that, but you can use NSAttributedString's HTML importer to make this process as painless as possible (be warned that this method will strip all HTML tags as well).

Remember to initialize NSAttributedString from the main thread only. It uses WebKit to parse HTML under the hood, hence the requirement.

// This is a[0]["title"] in your case
let htmlEncodedString = "The Weeknd <em>&#8216;King Of The Fall&#8217;</em>"

guard let data = htmlEncodedString.data(using: .utf8) else {
    return
}

let options: [NSAttributedString.DocumentReadingOptionKey: Any] = [
    .documentType: NSAttributedString.DocumentType.html,
    .characterEncoding: String.Encoding.utf8.rawValue
]

guard let attributedString = try? NSAttributedString(data: data, options: options, documentAttributes: nil) else {
    return
}

// The Weeknd ‘King Of The Fall’
let decodedString = attributedString.string

Wrapped in a failable String initializer, the same code looks like this:

extension String {

    init?(htmlEncodedString: String) {

        guard let data = htmlEncodedString.data(using: .utf8) else {
            return nil
        }

        let options: [NSAttributedString.DocumentReadingOptionKey: Any] = [
            .documentType: NSAttributedString.DocumentType.html,
            .characterEncoding: String.Encoding.utf8.rawValue
        ]

        guard let attributedString = try? NSAttributedString(data: data, options: options, documentAttributes: nil) else {
            return nil
        }

        self.init(attributedString.string)

    }

}

let encodedString = "The Weeknd <em>&#8216;King Of The Fall&#8217;</em>"
let decodedString = String(htmlEncodedString: encodedString)
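Because the HTML importer has to run on the main thread, one option when the string arrives on a background queue (for example, in a URLSession completion handler) is to hop to the main thread before decoding. A minimal sketch; onMainThread is a hypothetical helper, not an Apple API, and it assumes the main thread is free to service the main queue (if the main thread is itself blocked waiting on the caller, DispatchQueue.main.sync deadlocks):

```swift
import Foundation
import Dispatch

/// Hypothetical helper: runs `work` on the main thread and returns its result.
func onMainThread<T>(_ work: () throws -> T) rethrows -> T {
    // Calling DispatchQueue.main.sync from the main thread would deadlock,
    // so run the closure directly when we are already there.
    if Thread.isMainThread {
        return try work()
    }
    // Otherwise block the current (background) thread until the main
    // queue has executed the closure, then hand back its result.
    return try DispatchQueue.main.sync(execute: work)
}
```

Usage might look like let decoded = onMainThread { String(htmlEncodedString: encodedString) }. This only moves the work; the importer is still expensive, so decoding many strings this way (e.g. for table view cells) still costs main-thread time.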
akashivskyy
  • +1 for the answer, -1 for preferring an extension over a method. It will not be clear to the next developer that `stringByConvertingFromHTML` is an extension; clarity is the single most important attribute a program can have. – zaph Sep 01 '14 at 14:30
  • You're right, `stringByConvertingFromHTML` sounds a lot like a `class func`. I altered the example to use a custom init method instead. – akashivskyy Sep 01 '14 at 15:16
  • That misses the point, why add a method to an Apple API when it is more clear just to use a class function. Sure this is just one--until lots of developers start adding extensions, then the confusion really kicks in. – zaph Sep 01 '14 at 17:18
  • What? Extensions are *meant* to extend existing types to provide new functionality. – akashivskyy Sep 01 '14 at 21:24
  • I understand what you're trying to say, but negating extensions isn't the way to go. – akashivskyy Sep 01 '14 at 21:29
  • I have to use UTF16 to get the same string, but it works the same as gtm_stringByUnescapingFromHTML in Objective-C. The only problem is that it takes so much longer to compute the change that I can't use it in my project. Any idea why it takes so long? – Dam Dec 10 '14 at 15:07
  • This method is extremely heavy and is not recommended in table views or grid views – Guido Lodetti Sep 02 '15 at 09:58
  • I was happy with this for most of this week until I ran the iOS 8 sim and saw that it was unsuitably slow – ekill Oct 16 '15 at 21:31
  • But it loads too slowly if I use this in a cell. What should I do? – Muruganandham K Nov 02 '15 at 10:16
  • **Swift 2** version of the extension: http://stackoverflow.com/a/34245313/3411787 – Mohammad Zaid Pathan Dec 12 '15 at 21:42
  • One way to prevent confusion when extending APIs is to take advantage of Xcode's syntax coloring and have 'Project' items a different color from the 'Other' items. – Andrew Johnson Mar 25 '16 at 19:41
  • Please, can you add a Swift 3 version? – Kirill Oct 18 '16 at 16:15
  • This is great! Although it blocks the main thread, is there any way to run it on a background thread? – MMV Mar 17 '18 at 19:30
  • It would be better to propagate the error making the initializer throw the NSAttributedString error and also allow the user to pass the data directly to the initialiser. Then you can also add a html string initializer that calls the data initializer. – Leo Dabus Feb 15 '19 at 16:57
  • This works great when running on the main thread. If this is called on a background thread, it may crash because NSAttributedString with HTML-encoded content will try to fire up WebKit to do the parsing, which must be executed on the main thread. This is why some people are reporting it is blocking the main thread / extremely heavy. – Casey Nov 17 '20 at 20:59
  • I realize this method will remove the line breaks from the string. Any way to fix this? – Tony TRAN Jan 10 '22 at 04:18
  • @TonyTRAN In HTML, line breaks are treated as spaces, so this is actually expected behavior. You could pre-process your HTML-encoded string by replacing `\n` with `<br>` and then transforming it into `NSAttributedString`. Running `"hello<br>world"` through the above code produces `"hello\nworld"`. – akashivskyy Jan 10 '22 at 09:50
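akashivskyy's reply suggests turning newlines into HTML line breaks before decoding, since the HTML parser treats a bare newline as whitespace. A minimal sketch of that pre-processing step; preservingLineBreaks is a hypothetical name, and it assumes the input uses \n or \r\n line endings:

```swift
import Foundation

/// Hypothetical pre-processing step: converts plain-text newlines into <br>
/// tags so that an HTML parser (which treats "\n" as whitespace) keeps them.
func preservingLineBreaks(_ text: String) -> String {
    // Normalize Windows-style "\r\n" first so each one becomes a single <br>
    return text
        .replacingOccurrences(of: "\r\n", with: "\n")
        .replacingOccurrences(of: "\n", with: "<br>")
}
```

The result would then be passed to String(htmlEncodedString:) as usual.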

@akashivskyy's answer is great and demonstrates how to utilize NSAttributedString to decode HTML entities. One possible disadvantage (as he stated) is that all HTML markup is removed as well, so

<strong> 4 &lt; 5 &amp; 3 &gt; 2</strong>

becomes

4 < 5 & 3 > 2

On OS X there is CFXMLCreateStringByUnescapingEntities() which does the job:

import Foundation

let encoded = "<strong> 4 &lt; 5 &amp; 3 &gt; 2 .</strong> Price: 12 &#x20ac;.  &#64; "
let decoded = CFXMLCreateStringByUnescapingEntities(nil, encoded as CFString, nil) as String
print(decoded)
// <strong> 4 < 5 & 3 > 2 .</strong> Price: 12 €.  @ 

but this is not available on iOS.

Here is a pure Swift implementation. It decodes character entity references like &lt; using a dictionary, and all numeric character entities like &#64; or &#x20ac;. (Note that I did not list all 252 HTML entities explicitly.)
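The numeric branch is the part that handles the question's &#8216; and &#8217;. A standalone sketch of just that step, for illustration only (decodeNumericEntity is a hypothetical name, not part of the code below): strip the &# prefix and the ; suffix, use base 16 after an x/X and base 10 otherwise, then let the failable UInt32(_:radix:) and Unicode.Scalar initializers reject malformed numbers, surrogates, and values above U+10FFFF.

```swift
/// Hypothetical standalone helper: decodes one numeric character reference
/// such as "&#64;" or "&#x20ac;". Returns nil for anything else.
func decodeNumericEntity(_ entity: String) -> Character? {
    guard entity.hasPrefix("&#"), entity.hasSuffix(";") else { return nil }
    var digits = entity.dropFirst(2).dropLast()   // strip "&#" and ";"
    var radix = 10
    if let first = digits.first, first == "x" || first == "X" {
        digits = digits.dropFirst()               // hexadecimal form
        radix = 16
    }
    // Both initializers are failable: bad digits, surrogates (D800-DFFF)
    // and values beyond U+10FFFF all end up as nil.
    guard let code = UInt32(digits, radix: radix),
          let scalar = Unicode.Scalar(code) else { return nil }
    return Character(scalar)
}
```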

Swift 4:

// Mapping from XML/HTML character entity reference to character
// From http://en.wikipedia.org/wiki/List_of_XML_and_HTML_character_entity_references
private let characterEntities : [ Substring : Character ] = [
    // XML predefined entities:
    "&quot;"    : "\"",
    "&amp;"     : "&",
    "&apos;"    : "'",
    "&lt;"      : "<",
    "&gt;"      : ">",

    // HTML character entity references:
    "&nbsp;"    : "\u{00a0}",
    // ...
    "&diams;"   : "♦",
]

extension String {

    /// Returns a new string made by replacing in the `String`
    /// all HTML character entity references with the corresponding
    /// character.
    var stringByDecodingHTMLEntities : String {

        // ===== Utility functions =====

        // Convert the number in the string to the corresponding
        // Unicode character, e.g.
        //    decodeNumeric("64", 10)   --> "@"
        //    decodeNumeric("20ac", 16) --> "€"
        func decodeNumeric(_ string : Substring, base : Int) -> Character? {
            guard let code = UInt32(string, radix: base),
                let uniScalar = UnicodeScalar(code) else { return nil }
            return Character(uniScalar)
        }

        // Decode the HTML character entity to the corresponding
        // Unicode character, return `nil` for invalid input.
        //     decode("&#64;")    --> "@"
        //     decode("&#x20ac;") --> "€"
        //     decode("&lt;")     --> "<"
        //     decode("&foo;")    --> nil
        func decode(_ entity : Substring) -> Character? {

            if entity.hasPrefix("&#x") || entity.hasPrefix("&#X") {
                return decodeNumeric(entity.dropFirst(3).dropLast(), base: 16)
            } else if entity.hasPrefix("&#") {
                return decodeNumeric(entity.dropFirst(2).dropLast(), base: 10)
            } else {
                return characterEntities[entity]
            }
        }

        // ===== Method starts here =====

        var result = ""
        var position = startIndex

        // Find the next '&' and copy the characters preceding it to `result`:
        while let ampRange = self[position...].range(of: "&") {
            result.append(contentsOf: self[position ..< ampRange.lowerBound])
            position = ampRange.lowerBound

            // Find the next ';' and copy everything from '&' to ';' into `entity`
            guard let semiRange = self[position...].range(of: ";") else {
                // No matching ';'.
                break
            }
            let entity = self[position ..< semiRange.upperBound]
            position = semiRange.upperBound

            if let decoded = decode(entity) {
                // Replace by decoded character:
                result.append(decoded)
            } else {
                // Invalid entity, copy verbatim:
                result.append(contentsOf: entity)
            }
        }
        // Copy remaining characters to `result`:
        result.append(contentsOf: self[position...])
        return result
    }
}

Example:

let encoded = "<strong> 4 &lt; 5 &amp; 3 &gt; 2 .</strong> Price: 12 &#x20ac;.  &#64; "
let decoded = encoded.stringByDecodingHTMLEntities
print(decoded)
// <strong> 4 < 5 & 3 > 2 .</strong> Price: 12 €.  @

Swift 3:

// Mapping from XML/HTML character entity reference to character
// From http://en.wikipedia.org/wiki/List_of_XML_and_HTML_character_entity_references
private let characterEntities : [ String : Character ] = [
    // XML predefined entities:
    "&quot;"    : "\"",
    "&amp;"     : "&",
    "&apos;"    : "'",
    "&lt;"      : "<",
    "&gt;"      : ">",

    // HTML character entity references:
    "&nbsp;"    : "\u{00a0}",
    // ...
    "&diams;"   : "♦",
]

extension String {

    /// Returns a new string made by replacing in the `String`
    /// all HTML character entity references with the corresponding
    /// character.
    var stringByDecodingHTMLEntities : String {

        // ===== Utility functions =====

        // Convert the number in the string to the corresponding
        // Unicode character, e.g.
        //    decodeNumeric("64", 10)   --> "@"
        //    decodeNumeric("20ac", 16) --> "€"
        func decodeNumeric(_ string : String, base : Int) -> Character? {
            guard let code = UInt32(string, radix: base),
                let uniScalar = UnicodeScalar(code) else { return nil }
            return Character(uniScalar)
        }

        // Decode the HTML character entity to the corresponding
        // Unicode character, return `nil` for invalid input.
        //     decode("&#64;")    --> "@"
        //     decode("&#x20ac;") --> "€"
        //     decode("&lt;")     --> "<"
        //     decode("&foo;")    --> nil
        func decode(_ entity : String) -> Character? {

            if entity.hasPrefix("&#x") || entity.hasPrefix("&#X"){
                return decodeNumeric(entity.substring(with: entity.index(entity.startIndex, offsetBy: 3) ..< entity.index(entity.endIndex, offsetBy: -1)), base: 16)
            } else if entity.hasPrefix("&#") {
                return decodeNumeric(entity.substring(with: entity.index(entity.startIndex, offsetBy: 2) ..< entity.index(entity.endIndex, offsetBy: -1)), base: 10)
            } else {
                return characterEntities[entity]
            }
        }

        // ===== Method starts here =====

        var result = ""
        var position = startIndex

        // Find the next '&' and copy the characters preceding it to `result`:
        while let ampRange = self.range(of: "&", range: position ..< endIndex) {
            result.append(self[position ..< ampRange.lowerBound])
            position = ampRange.lowerBound

            // Find the next ';' and copy everything from '&' to ';' into `entity`
            if let semiRange = self.range(of: ";", range: position ..< endIndex) {
                let entity = self[position ..< semiRange.upperBound]
                position = semiRange.upperBound

                if let decoded = decode(entity) {
                    // Replace by decoded character:
                    result.append(decoded)
                } else {
                    // Invalid entity, copy verbatim:
                    result.append(entity)
                }
            } else {
                // No matching ';'.
                break
            }
        }
        // Copy remaining characters to `result`:
        result.append(self[position ..< endIndex])
        return result
    }
}

Swift 2:

// Mapping from XML/HTML character entity reference to character
// From http://en.wikipedia.org/wiki/List_of_XML_and_HTML_character_entity_references
private let characterEntities : [ String : Character ] = [
    // XML predefined entities:
    "&quot;"    : "\"",
    "&amp;"     : "&",
    "&apos;"    : "'",
    "&lt;"      : "<",
    "&gt;"      : ">",

    // HTML character entity references:
    "&nbsp;"    : "\u{00a0}",
    // ...
    "&diams;"   : "♦",
]

extension String {

    /// Returns a new string made by replacing in the `String`
    /// all HTML character entity references with the corresponding
    /// character.
    var stringByDecodingHTMLEntities : String {

        // ===== Utility functions =====

        // Convert the number in the string to the corresponding
        // Unicode character, e.g.
        //    decodeNumeric("64", 10)   --> "@"
        //    decodeNumeric("20ac", 16) --> "€"
        func decodeNumeric(string : String, base : Int32) -> Character? {
            let code = UInt32(strtoul(string, nil, base))
            return Character(UnicodeScalar(code))
        }

        // Decode the HTML character entity to the corresponding
        // Unicode character, return `nil` for invalid input.
        //     decode("&#64;")    --> "@"
        //     decode("&#x20ac;") --> "€"
        //     decode("&lt;")     --> "<"
        //     decode("&foo;")    --> nil
        func decode(entity : String) -> Character? {

            if entity.hasPrefix("&#x") || entity.hasPrefix("&#X"){
                return decodeNumeric(entity.substringFromIndex(entity.startIndex.advancedBy(3)), base: 16)
            } else if entity.hasPrefix("&#") {
                return decodeNumeric(entity.substringFromIndex(entity.startIndex.advancedBy(2)), base: 10)
            } else {
                return characterEntities[entity]
            }
        }

        // ===== Method starts here =====

        var result = ""
        var position = startIndex

        // Find the next '&' and copy the characters preceding it to `result`:
        while let ampRange = self.rangeOfString("&", range: position ..< endIndex) {
            result.appendContentsOf(self[position ..< ampRange.startIndex])
            position = ampRange.startIndex

            // Find the next ';' and copy everything from '&' to ';' into `entity`
            if let semiRange = self.rangeOfString(";", range: position ..< endIndex) {
                let entity = self[position ..< semiRange.endIndex]
                position = semiRange.endIndex

                if let decoded = decode(entity) {
                    // Replace by decoded character:
                    result.append(decoded)
                } else {
                    // Invalid entity, copy verbatim:
                    result.appendContentsOf(entity)
                }
            } else {
                // No matching ';'.
                break
            }
        }
        // Copy remaining characters to `result`:
        result.appendContentsOf(self[position ..< endIndex])
        return result
    }
}
Martin R
  • This is brilliant, thanks Martin! Here's the extension with the full list of HTML entities: https://gist.github.com/mwaterfall/25b4a6a06dc3309d9555 I've also slightly adapted it to provide the distance offsets made by the replacements. This allows the correct adjustment of any string attributes or entities that might be affected by these replacements (Twitter entity indices for example). – Michael Waterfall Aug 27 '15 at 17:11
  • @MichaelWaterfall and Martin, this is magnificent! Works like a charm! I updated the extension for Swift 2 http://pastebin.com/juHRJ6au Thanks! – Santiago Sep 17 '15 at 22:29
  • This answer should be preferred and accepted over the accepted one. The accepted answer is impractical for longer texts. – Matti Sep 21 '15 at 14:12
  • I converted this answer to be compatible with Swift 2 and dumped it in a CocoaPod called [StringExtensionHTML](https://cocoapods.org/pods/StringExtensionHTML) for ease of use. Note that Santiago's Swift 2 version fixes the compile-time errors, but taking out the `strtoul(string, nil, base)` entirely will cause the code not to work with numeric character entities and crash when it comes to an entity it doesn't recognize (instead of failing gracefully). – Adela Chang Apr 15 '16 at 16:33
  • @AdelaChang: Actually I had converted my answer to Swift 2 already in September 2015. It still compiles without warnings with Swift 2.2/Xcode 7.3. Or are you referring to Michael's version? – Martin R Apr 15 '16 at 18:02
  • @MartinR I was actually referring to Santiago's version up above in pastebin. The first time I saw this answer was long ago, so I must have missed the fact that you updated it, but the errors I was referring to were in the pastebin version and not yours. :) – Adela Chang Apr 28 '16 at 20:33
  • @yishus: Thanks for fixing the error in the Swift 3 code! (Previously, I had used `strtoul()` which silently ignores trailing non-digits.) – Martin R Sep 06 '16 at 08:37
  • Thank you for the OSX version. So much easier. – yesthisisjoe Sep 18 '16 at 00:45
  • This is a great answer. I did get some errors compiling it with Swift 4.1 in Xcode 9.2. They were easily fixed by the compiler's suggestions, but it might be worth updating one more time. – user1118321 Feb 18 '18 at 00:19
  • @user1118321: Code updated, thanks for letting me know. – Martin R Feb 18 '18 at 18:09
  • Thanks, with this answer I solved my issues: I had serious performance problems using NSAttributedString. – mugx May 14 '18 at 03:17
  • https://gist.github.com/x0rb0t/a6c190dbefdfedad71143ff7f8153588 Complete list from https://dev.w3.org/html5/html-author/charref – Andrei Z. Nov 03 '20 at 20:13
  • After months of being stuck on an issue, this finally helped, thanks so much – Sanjeevcn Jun 02 '21 at 13:35
  • `&#8216;` and `&#8217;`, from the original poster's example, aren't in this dictionary... – Erika Electra Aug 03 '21 at 10:19
  • @Erika: Numerical character entities like `&#8216;` are all decoded correctly. The dictionary is only needed for named character entities like `&quot;`. – Martin R Aug 03 '21 at 10:24

Swift 4


  • String extension computed variable
  • Without extra guard, do, catch, etc...
  • Returns the original string if decoding fails

extension String {
    var htmlDecoded: String {
        let decoded = try? NSAttributedString(data: Data(utf8), options: [
            .documentType: NSAttributedString.DocumentType.html,
            .characterEncoding: String.Encoding.utf8.rawValue
        ], documentAttributes: nil).string

        return decoded ?? self
    }
}
Peter Mortensen
AamirR
  • Wow! Works right out of the box for Swift 4! Usage: `let encoded = "The Weeknd &#8216;King Of The Fall&#8217;"; let finalString = encoded.htmlDecoded` – Naishta Aug 11 '18 at 15:03
  • I love the simplicity of this answer. However, it will cause crashes when run in the background because it tries to run on the main thread. – Jeremy Hicks Jan 30 '19 at 04:14
  • @JeremyHicks do you know how to fix this crash? It works perfectly in Playground, but it crashes when I'm adding htmlDecoded to a decoded JSON field. I added DispatchQueue.main.async but it's not working – Abrcd18 Apr 07 '22 at 07:10

Swift 3 version of @akashivskyy's extension:

extension String {
    init(htmlEncodedString: String) {
        self.init()
        guard let encodedData = htmlEncodedString.data(using: .utf8) else {
            self = htmlEncodedString
            return
        }

        let attributedOptions: [String : Any] = [
            NSDocumentTypeDocumentAttribute: NSHTMLTextDocumentType,
            NSCharacterEncodingDocumentAttribute: String.Encoding.utf8.rawValue
        ]

        do {
            let attributedString = try NSAttributedString(data: encodedData, options: attributedOptions, documentAttributes: nil)
            self = attributedString.string
        } catch {
            print("Error: \(error)")
            self = htmlEncodedString
        }
    }
}
Community
yishus

Swift 2 version of @akashivskyy's extension:

extension String {
    init(htmlEncodedString: String) {
        guard let encodedData = htmlEncodedString.dataUsingEncoding(NSUTF8StringEncoding) else {
            self.init(htmlEncodedString)     // Return the original string if it can't be encoded
            return
        }

        let attributedOptions: [String: AnyObject] = [
            NSDocumentTypeDocumentAttribute: NSHTMLTextDocumentType,
            NSCharacterEncodingDocumentAttribute: NSUTF8StringEncoding
        ]

        do {
            // The initializer throws on failure; it does not return an optional
            let attributedString = try NSAttributedString(data: encodedData, options: attributedOptions, documentAttributes: nil)
            self.init(attributedString.string)
        } catch {
            print("error: \(error)")
            self.init(htmlEncodedString)     // Return the original string if there is an error
        }
    }
}
Community
Mohammad Zaid Pathan
  • This code is incomplete and should be avoided by all means. The error is not being handled properly. When there is in fact an error, the code would crash. You should update your code to at least return nil when there is an error. Or you could just init with the original string. In the end you should handle the error, which is not the case. Wow! – oyalhi Apr 25 '16 at 10:08

I was looking for a pure Swift 3.0 utility to escape to/unescape from HTML character references (i.e. for server-side Swift apps on both macOS and Linux) but didn't find any comprehensive solutions, so I wrote my own implementation: https://github.com/IBM-Swift/swift-html-entities

The package, HTMLEntities, works with HTML4 named character references as well as hex/dec numeric character references, and it will recognize special numeric character references per the W3C HTML5 spec (e.g., &#x80; should be unescaped as the euro sign (Unicode U+20AC) and NOT as the Unicode character U+0080, and certain ranges of numeric character references should be replaced with the replacement character U+FFFD when unescaping).
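To illustrate the HTML5 rule described above, here is a partial sketch. It is not the HTMLEntities implementation: the override table lists only a handful of the spec's entries for the 0x80 to 0x9F range, and html5Scalar(forCodePoint:) is a hypothetical name. It maps a parsed code point to the character an HTML5 parser would produce.

```swift
/// Partial, illustrative subset of the HTML5 override table for numeric
/// references in the 0x80-0x9F range (Windows-1252 interpretations).
let windows1252Overrides: [UInt32: UInt32] = [
    0x80: 0x20AC, // EURO SIGN
    0x91: 0x2018, // LEFT SINGLE QUOTATION MARK
    0x92: 0x2019, // RIGHT SINGLE QUOTATION MARK
    0x93: 0x201C, // LEFT DOUBLE QUOTATION MARK
    0x94: 0x201D, // RIGHT DOUBLE QUOTATION MARK
]

/// Hypothetical helper: maps a parsed numeric reference to a scalar.
func html5Scalar(forCodePoint code: UInt32) -> Unicode.Scalar {
    let replacement: Unicode.Scalar = "\u{FFFD}"
    // NUL, surrogates and values beyond U+10FFFF become U+FFFD
    if code == 0 || (0xD800...0xDFFF).contains(code) || code > 0x10FFFF {
        return replacement
    }
    // Selected C1 control code points are reinterpreted as Windows-1252
    if let mapped = windows1252Overrides[code], let scalar = Unicode.Scalar(mapped) {
        return scalar
    }
    return Unicode.Scalar(code) ?? replacement
}
```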

Usage example:

import HTMLEntities

// encode example
let html = "<script>alert(\"abc\")</script>"

print(html.htmlEscape())
// Prints "&lt;script&gt;alert(&quot;abc&quot;)&lt;/script&gt;"

// decode example
let htmlencoded = "&lt;script&gt;alert(&quot;abc&quot;)&lt;/script&gt;"

print(htmlencoded.htmlUnescape())
// Prints "<script>alert("abc")</script>"

And for OP's example:

print("The Weeknd &#8216;King Of The Fall&#8217; [Video Premiere] | @TheWeeknd | #SoPhi ".htmlUnescape())
// prints "The Weeknd ‘King Of The Fall’ [Video Premiere] | @TheWeeknd | #SoPhi "

Edit: HTMLEntities now supports HTML5 named character references as of version 2.0.0. Spec-compliant parsing is also implemented.

Youming Lin
  • This is the most generic answer that works all the time and does not require being run on the main thread. This will work even with the most complex HTML-escaped Unicode strings (such as `( ͡° ͜ʖ ͡° )`), whereas none of the other answers manage that. – Stéphane Copin Nov 06 '17 at 14:13
  • Yeah, this should be way more up! :) – smat88dd Nov 13 '20 at 07:04
  • The fact that the original answer is not thread-safe is a very big issue for something as intrinsically low-level as string manipulation – James Robinson Dec 16 '20 at 15:36

Swift 4 Version

extension String {

    init(htmlEncodedString: String) {
        self.init()
        guard let encodedData = htmlEncodedString.data(using: .utf8) else {
            self = htmlEncodedString
            return
        }

        let attributedOptions: [NSAttributedString.DocumentReadingOptionKey : Any] = [
            .documentType: NSAttributedString.DocumentType.html,
            .characterEncoding: String.Encoding.utf8.rawValue
        ]

        do {
            let attributedString = try NSAttributedString(data: encodedData, options: attributedOptions, documentAttributes: nil)
            self = attributedString.string
        } catch {
            print("Error: \(error)")
            self = htmlEncodedString
        }
    }
}
Peter Mortensen
pipizanzibar
  • I get "Error Domain=NSCocoaErrorDomain Code=259 "The file couldn’t be opened because it isn’t in the correct format."" when I try to use this. This goes away if I run the full do catch on the main thread. I found this from checking the NSAttributedString documentation: "The HTML importer should not be called from a background thread (that is, the options dictionary includes documentType with a value of html). It will try to synchronize with the main thread, fail, and time out." – MickeDG Oct 17 '17 at 16:47
  • Please, the `rawValue` syntax `NSAttributedString.DocumentReadingOptionKey(rawValue: NSAttributedString.DocumentAttributeKey.documentType.rawValue)` and `NSAttributedString.DocumentReadingOptionKey(rawValue: NSAttributedString.DocumentAttributeKey.characterEncoding.rawValue)` is horrible. Replace it with `.documentType` and `.characterEncoding` – vadian Dec 04 '17 at 12:09
  • @MickeDG - Can you please explain what exactly you did to resolve this error? I am getting it sporadically. – Ross Barbish Mar 06 '20 at 18:07
  • @RossBarbish - Sorry Ross, this was too long ago, can't remember the details. Have you tried what I suggest in the comment above, i.e. to run the full do catch on the main thread? – MickeDG Mar 10 '20 at 20:14
import UIKit

extension String {
    func decodeEnt() -> String {
        guard let encodedData = self.data(using: .utf8) else { return self }
        let attributedOptions: [NSAttributedString.DocumentReadingOptionKey: Any] = [
            .documentType: NSAttributedString.DocumentType.html,
            .characterEncoding: String.Encoding.utf8.rawValue
        ]
        // Fall back to the original string instead of force-unwrapping
        guard let attributedString = try? NSAttributedString(data: encodedData, options: attributedOptions, documentAttributes: nil) else {
            return self
        }

        return attributedString.string
    }
}

let encodedString = "The Weeknd &#8216;King Of The Fall&#8217;"

let foo = encodedString.decodeEnt() /* The Weeknd ‘King Of The Fall’ */
wLc

Swift 4:

The complete solution that finally worked for me with HTML code, newline characters, and single quotes:

extension String {
    var htmlDecoded: String {
        let decoded = try? NSAttributedString(data: Data(utf8), options: [
            .documentType: NSAttributedString.DocumentType.html,
            .characterEncoding: String.Encoding.utf8.rawValue
            ], documentAttributes: nil).string

        return decoded ?? self
    }
}

Usage:

let yourStringEncoded = yourStringWithHtmlcode.htmlDecoded

I then had to apply some more filters to get rid of single quotes (for example, in don't, hasn't, it's, etc.) and newline characters like \n:

var yourNewString = String(yourStringEncoded.filter { !"\n\t\r".contains($0) })
yourNewString = yourNewString.replacingOccurrences(of: "\'", with: "", options: NSString.CompareOptions.literal, range: nil)
Peter Mortensen
Naishta
  • This is essentially a copy of [this other answer](https://stackoverflow.com/a/47480859/1226963). All you did is add some usage which is obvious enough. – rmaddy Nov 01 '18 at 22:31
  • Someone has upvoted this answer and found it really useful; what does that tell you? – Naishta Nov 02 '18 at 11:19
  • @Naishta It tells you that everyone has different opinions and that's OK – Joshua Wolff Feb 16 '20 at 19:23

This would be my approach. You could add the entities dictionary from Michael Waterfall's gist (https://gist.github.com/mwaterfall/25b4a6a06dc3309d9555).

import Foundation

extension String {
    func htmlDecoded() -> String {

        guard !self.isEmpty else { return self }

        var newStr = self

        // An array (not a dictionary) keeps the order fixed; "&amp;" must come
        // last, otherwise "&amp;lt;" would be decoded twice, to "<"
        let entities = [
            ("&quot;", "\""),
            ("&apos;", "'"),
            ("&lt;", "<"),
            ("&gt;", ">"),
            ("&amp;", "&"),
        ]

        for (name, value) in entities {
            newStr = newStr.replacingOccurrences(of: name, with: value)
        }
        return newStr
    }
}

Example usage:

let encoded = "this is so &quot;good&quot;"
let decoded = encoded.htmlDecoded() // "this is so "good""

OR

let encoded = "this is so &quot;good&quot;".htmlDecoded() // "this is so "good""
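When entities are replaced one after another like this, the order matters: replacing &amp; before the other entities double-decodes text such as &amp;lt; (the escaped form of the literal text &lt;). A small self-contained demonstration; sequentialDecode is a hypothetical helper that mirrors the replacement loop:

```swift
import Foundation

/// Hypothetical helper: applies (entity, replacement) pairs one after another.
func sequentialDecode(_ input: String, _ pairs: [(String, String)]) -> String {
    var result = input
    for (entity, replacement) in pairs {
        result = result.replacingOccurrences(of: entity, with: replacement)
    }
    return result
}

let ampFirst = [("&amp;", "&"), ("&lt;", "<")]
let ampLast  = [("&lt;", "<"), ("&amp;", "&")]

// "&amp;lt;" is the escaped form of the literal text "&lt;"
print(sequentialDecode("&amp;lt;", ampFirst)) // <    (decoded twice: wrong)
print(sequentialDecode("&amp;lt;", ampLast))  // &lt; (decoded once: correct)
```

A single left-to-right scan, as in the pure Swift answer above, avoids the ordering question entirely.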
Bseaborn
  • I don't quite like this, but I did not find anything better yet, so this is an updated version of Michael Waterfall's solution for Swift 2.0 https://gist.github.com/jrmgx/3f9f1d330b295cf6b1c6 – jrmgx Nov 02 '15 at 13:27

Elegant Swift 4 Solution

If you want a string,

myString = String(htmlString: encodedString)

add this extension to your project:

extension String {

    init(htmlString: String) {
        self.init()
        guard let encodedData = htmlString.data(using: .utf8) else {
            self = htmlString
            return
        }

        let attributedOptions: [NSAttributedString.DocumentReadingOptionKey : Any] = [
            .documentType: NSAttributedString.DocumentType.html,
            .characterEncoding: String.Encoding.utf8.rawValue
        ]

        do {
            let attributedString = try NSAttributedString(data: encodedData,
                                                          options: attributedOptions,
                                                          documentAttributes: nil)
            self = attributedString.string
        } catch {
            print("Error: \(error.localizedDescription)")
            self = htmlString
        }
    }
}

If you want an NSAttributedString with bold, italic, links, etc.,

textField.attributedText = try? NSAttributedString(htmlString: encodedString)

add this extension to your project:

extension NSAttributedString {

    convenience init(htmlString html: String) throws {
        try self.init(data: Data(html.utf8), options: [
            .documentType: NSAttributedString.DocumentType.html,
            .characterEncoding: String.Encoding.utf8.rawValue
            ], documentAttributes: nil)
    }

}
Peter Mortensen
Sébastien REMY

Swift 5.1 Version

import UIKit

extension String {

    init(htmlEncodedString: String) {
        self.init()
        guard let encodedData = htmlEncodedString.data(using: .utf8) else {
            self = htmlEncodedString
            return
        }

        let attributedOptions: [NSAttributedString.DocumentReadingOptionKey : Any] = [
            .documentType: NSAttributedString.DocumentType.html,
            .characterEncoding: String.Encoding.utf8.rawValue
        ]

        do {
            let attributedString = try NSAttributedString(data: encodedData, options: attributedOptions, documentAttributes: nil)
            self = attributedString.string
        } catch {
            print("Error: \(error)")
            self = htmlEncodedString
        }
    }
}

Also, if you want to extract date, images, metadata, title and description, you can use my pod named Readability kit.

Peter Mortensen
Jawad Ali
2

Computed var version of @yishus' answer

public extension String {
    /// Decodes string with HTML encoding.
    var htmlDecoded: String {
        guard let encodedData = self.data(using: .utf8) else { return self }

        let attributedOptions: [NSAttributedString.DocumentReadingOptionKey: Any] = [
            .documentType: NSAttributedString.DocumentType.html,
            .characterEncoding: String.Encoding.utf8.rawValue
        ]

        do {
            let attributedString = try NSAttributedString(data: encodedData,
                                                          options: attributedOptions,
                                                          documentAttributes: nil)
            return attributedString.string
        } catch {
            print("Error: \(error)")
            return self
        }
    }
}
Peter Mortensen
Geva
2

Have a look at HTMLString, a library written in Swift that allows your program to add and remove HTML entities in strings.

For completeness, I copied the main features from the site:

  • Adds entities for ASCII and UTF-8/UTF-16 encodings
  • Removes more than 2100 named entities (like &amp;)
  • Supports removing decimal and hexadecimal entities
  • Designed to support Swift Extended Grapheme Clusters (→ 100% emoji-proof)
  • Fully unit tested
  • Fast
  • Documented
  • Compatible with Objective-C
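The first feature in that list is the encoding direction. A minimal sketch of what it involves, assuming all you need is ASCII-safe output: `addingASCIIEntities(_:)` is a hypothetical name, not HTMLString's API. It escapes the five HTML-special characters and writes every other non-ASCII Unicode scalar as a decimal character reference.

```swift
/// Hypothetical helper (not HTMLString's API): returns pure-ASCII text in which
/// HTML-special characters are escaped and every non-ASCII scalar becomes &#N;.
func addingASCIIEntities(_ text: String) -> String {
    var result = ""
    for scalar in text.unicodeScalars {
        switch scalar {
        case "&": result += "&amp;"
        case "<": result += "&lt;"
        case ">": result += "&gt;"
        case "\"": result += "&quot;"
        case "'": result += "&#39;"
        default:
            if scalar.isASCII {
                // Plain ASCII passes through unchanged.
                result.unicodeScalars.append(scalar)
            } else {
                // U+2018 becomes &#8216;, U+1F600 becomes &#128512;, and so on.
                result += "&#\(scalar.value);"
            }
        }
    }
    return result
}

print(addingASCIIEntities("\u{2018}King\u{2019} & <b>"))
// prints: &#8216;King&#8217; &amp; &lt;b&gt;
```

Because the sketch works per scalar, an emoji made of several scalars (a flag, for instance) turns into several references; decoding them and concatenating the results gives back the same single grapheme cluster.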
Peter Mortensen
Despotovic
2

Swift 4

I really like the solution using documentAttributes. However, it may be too slow for parsing files and/or for use in table view cells. I can't believe that Apple does not provide a decent solution for this.

As a workaround, I found this String extension on GitHub, which works perfectly and is fast at decoding.

So for situations in which the given answer is too slow, see the solution suggested in this link: https://gist.github.com/mwaterfall/25b4a6a06dc3309d9555

Note: it does not parse HTML tags.
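To give an idea of what such a WebKit-free decoder looks like, here is a rough sketch. It is not the code from the linked gist: `decodingHTMLEntities(_:)` and `decodeEntityBody(_:named:)` are hypothetical names, and the named-entity table holds only a handful of entries (the full HTML5 list has more than 2,000). It decodes decimal and hexadecimal references, leaves tags alone, and uses only the standard library, so it does not need the main thread.

```swift
/// Hypothetical lightweight decoder: handles &#8216; (decimal), &#x2019; (hex)
/// and a few named entities. Unknown or unterminated references are kept as-is.
func decodingHTMLEntities(_ text: String) -> String {
    // Deliberately tiny table; a real decoder needs the full HTML5 list.
    let named: [Substring: Character] = [
        "amp": "&", "lt": "<", "gt": ">", "quot": "\"", "apos": "'", "nbsp": "\u{00A0}"
    ]
    var result = ""
    var index = text.startIndex
    while index < text.endIndex {
        let char = text[index]
        // Only an "&" with a later ";" can start a reference.
        guard char == "&", let semicolon = text[index...].firstIndex(of: ";") else {
            result.append(char)
            index = text.index(after: index)
            continue
        }
        let body = text[text.index(after: index)..<semicolon]
        if let decoded = decodeEntityBody(body, named: named) {
            result.append(decoded)
            index = text.index(after: semicolon)
        } else {
            // Not a recognised reference: keep the "&" literally and move on.
            result.append(char)
            index = text.index(after: index)
        }
    }
    return result
}

/// Turns "#8216", "#x2019" or "amp" into a Character, or nil if unknown.
func decodeEntityBody(_ body: Substring, named: [Substring: Character]) -> Character? {
    if body.hasPrefix("#") {
        let digits = body.dropFirst()
        let value: UInt32?
        if digits.hasPrefix("x") || digits.hasPrefix("X") {
            value = UInt32(digits.dropFirst(), radix: 16)
        } else {
            value = UInt32(digits, radix: 10)
        }
        guard let code = value, let scalar = Unicode.Scalar(code) else { return nil }
        return Character(scalar)
    }
    return named[body]
}

print(decodingHTMLEntities("The Weeknd &#8216;King Of The Fall&#8217;"))
// prints: The Weeknd ‘King Of The Fall’
```

One consequence of this design: a reference missing its trailing semicolon (such as `&#8217` at the very end of a string) is left untouched, whereas the WebKit-based answers are more forgiving.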

Community
Vincent
1

Updated answer working on Swift 3

extension String {
    init?(htmlEncodedString: String) {
        guard let encodedData = htmlEncodedString.data(using: String.Encoding.utf8) else {
            return nil
        }
        // Declaring UTF-8 explicitly keeps non-ASCII characters intact.
        let attributedOptions: [String: Any] = [
            NSDocumentTypeDocumentAttribute: NSHTMLTextDocumentType,
            NSCharacterEncodingDocumentAttribute: String.Encoding.utf8.rawValue
        ]

        guard let attributedString = try? NSAttributedString(data: encodedData, options: attributedOptions, documentAttributes: nil) else {
            return nil
        }
        self.init(attributedString.string)
    }
}
Peter Mortensen
1

Swift 4

extension String {
    var replacingHTMLEntities: String? {
        do {
            return try NSAttributedString(data: Data(utf8), options: [
                .documentType: NSAttributedString.DocumentType.html,
                .characterEncoding: String.Encoding.utf8.rawValue
            ], documentAttributes: nil).string
        } catch {
            return nil
        }
    }
}

Simple Usage

let clean = "Weeknd &#8216;King Of The Fall&#8217;".replacingHTMLEntities ?? "default value"
Nischal Hada
quemeful
  • I can already hear people complaining about my force unwrapped optional. If you are researching HTML string encoding and you do not know how to deal with Swift optionals, you're too far ahead of yourself. – quemeful Nov 04 '17 at 16:03
  • yup, there it was ([edited Nov 1 at 22:37](https://stackoverflow.com/posts/47112624/revisions) and made the "Simple Usage" much harder to comprehend) – quemeful Nov 05 '18 at 12:31
1

Swift 4

func decodeHTML(string: String) -> String? {

    var decodedString: String?

    if let encodedData = string.data(using: .utf8) {
        let attributedOptions: [NSAttributedString.DocumentReadingOptionKey : Any] = [
            .documentType: NSAttributedString.DocumentType.html,
            .characterEncoding: String.Encoding.utf8.rawValue
        ]

        do {
            decodedString = try NSAttributedString(data: encodedData, options: attributedOptions, documentAttributes: nil).string
        } catch {
            print("\(error.localizedDescription)")
        }
    }

    return decodedString
}
Haroldo Gondim
1

Swift 4.1 +

extension String {
    var htmlDecoded: String {
        let attributedOptions: [NSAttributedString.DocumentReadingOptionKey: Any] = [
            .documentType: NSAttributedString.DocumentType.html,
            .characterEncoding: String.Encoding.utf8.rawValue
        ]

        let decoded = try? NSAttributedString(data: Data(utf8),
                                              options: attributedOptions,
                                              documentAttributes: nil).string

        // Fall back to the original string if HTML parsing fails.
        return decoded ?? self
    }
}
Deepak Singh
  • An explanation would be in order. For example, how is it different from previous answers? What Swift 4.1 features are used? Does it only work in Swift 4.1 and not in previous versions? Or would it work prior to Swift 4.1, say in Swift 4.0? – Peter Mortensen Jan 25 '20 at 21:43
0

Swift 3.0 version with actual font size conversion

Normally, if you directly convert HTML content to an attributed string, the font size increases. You can try converting an HTML string to an attributed string and back again to see the difference.

Instead, here is a conversion that keeps the font size unchanged by applying a 0.75 ratio to all fonts:

import UIKit

extension String {
    func htmlAttributedString() -> NSAttributedString? {
        guard let data = self.data(using: String.Encoding.utf16, allowLossyConversion: false) else { return nil }
        guard let attriStr = try? NSMutableAttributedString(
            data: data,
            options: [NSDocumentTypeDocumentAttribute: NSHTMLTextDocumentType],
            documentAttributes: nil) else { return nil }
        attriStr.beginEditing()
        attriStr.enumerateAttribute(NSFontAttributeName, in: NSMakeRange(0, attriStr.length), options: .init(rawValue: 0)) {
            (value, range, stop) in
            if let font = value as? UIFont {
                let resizedFont = font.withSize(font.pointSize * 0.75)
                attriStr.addAttribute(NSFontAttributeName,
                                         value: resizedFont,
                                         range: range)
            }
        }
        attriStr.endEditing()
        return attriStr
    }
}
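The answer does not say where 0.75 comes from. One plausible reading (an assumption, not confirmed by the answer) is the CSS ratio between pixels and points, since CSS defines an inch as both 96 px and 72 pt:

```latex
1\,\text{in} = 96\,\text{px} = 72\,\text{pt}
\quad\Longrightarrow\quad
1\,\text{px} = \tfrac{72}{96}\,\text{pt} = 0.75\,\text{pt},
\qquad
16\,\text{px} \times 0.75 = 12\,\text{pt}
```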
Community
Fangming
0

Swift 4

extension String {

    mutating func toHtmlEncodedString() {
        guard let encodedData = self.data(using: .utf8) else {
            return
        }

        let attributedOptions: [NSAttributedString.DocumentReadingOptionKey : Any] = [
            NSAttributedString.DocumentReadingOptionKey(rawValue: NSAttributedString.DocumentAttributeKey.documentType.rawValue): NSAttributedString.DocumentType.html,
            NSAttributedString.DocumentReadingOptionKey(rawValue: NSAttributedString.DocumentAttributeKey.characterEncoding.rawValue): String.Encoding.utf8.rawValue
        ]

        do {
            let attributedString = try NSAttributedString(data: encodedData, options: attributedOptions, documentAttributes: nil)
            self = attributedString.string
        }
        catch {
            print("Error: \(error)")
        }
    }
}
Peter Mortensen
Omar Freewan
  • Please, the `rawValue` syntax `NSAttributedString.DocumentReadingOptionKey(rawValue: NSAttributedString.DocumentAttributeKey.documentType.rawValue)` and `NSAttributedString.DocumentReadingOptionKey(rawValue: NSAttributedString.DocumentAttributeKey.characterEncoding.rawValue)` is horrible. Replace it with `.documentType` and `.characterEncoding` – vadian Dec 04 '17 at 12:11
  • Performance of this solution is horrible. It may be okay for separate cases; parsing files is not advised. – Vincent Feb 05 '18 at 10:19
0

Objective-C

+ (NSString *)decodeHTMLEncodedString:(NSString *)htmlEncodedString {
    if (!htmlEncodedString) {
        return nil;
    }

    NSData *data = [htmlEncodedString dataUsingEncoding:NSUTF8StringEncoding];
    NSDictionary *attributes = @{NSDocumentTypeDocumentAttribute: NSHTMLTextDocumentType,
                                 NSCharacterEncodingDocumentAttribute: @(NSUTF8StringEncoding)};
    NSAttributedString *attributedString = [[NSAttributedString alloc] initWithData:data options:attributes documentAttributes:nil error:nil];
    return [attributedString string];
}
Oded Regev
-1

Use:

// dataRes is the NSData value you received

var resString = NSString(data: dataRes, encoding: NSUTF8StringEncoding)
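Keep in mind that this only converts bytes to text. Character references such as `&#8216;` are plain ASCII at the byte level, so they survive the conversion unchanged. A minimal check in current Swift, using `String(data:encoding:)` in place of `NSString(data:encoding:)`:

```swift
import Foundation

// The same title the question receives, as raw UTF-8 bytes.
let payload = Data("The Weeknd &#8216;King Of The Fall&#8217;".utf8)

// Byte-level decoding succeeds, but the entity text is still there.
let text = String(data: payload, encoding: .utf8)
print(text ?? "<invalid UTF-8>")
// prints: The Weeknd &#8216;King Of The Fall&#8217;
```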
Peter Mortensen
Yogesh shelke
  • An explanation would be in order (by [editing your answer](https://stackoverflow.com/posts/35764914/edit), not here in comments). – Peter Mortensen Jan 25 '20 at 21:16