I am making an app for checking grades and assignments for my school. On the school's website, an assignment looks like this: Image from school web page

But the server actually returns a string in which the regular characters are plain text while the Chinese characters remain in UTF-8 encoded form: Raw string shown in the app

How would I parse the raw String in Swift and decode any UTF-8 encoded characters? I am having a hard time finding, or even figuring out, a solution for this online. Just an FYI: I cannot change anything on the backend.

Stefan DeClerck
  • Show the raw data you receive from the server (or, well, since raw data would just be a byte sequence, you probably do want to interpret it via the encoding, probably UTF-8). Is it HTML? XML? JSON? Show it as text, not a screenshot. Also, what you're seeing for the Chinese characters is not UTF-8 encoding. It looks like HTML/XML numeric character references. So, the most likely solution is to interpret the raw data not just as a string, but as HTML or XML, which will automatically handle those. – Ken Thomases Oct 01 '16 at 01:58
  • How does one interpret the string as HTML? @KenThomases – Stefan DeClerck Oct 01 '16 at 02:01
  • Well, on macOS, `NSAttributedString` can do it. That functionality is not available on iOS so you probably want to use a third-party library. Search for "ios parse html". – Ken Thomases Oct 01 '16 at 02:07
  • Compare [How do I decode HTML entities in swift?](http://stackoverflow.com/questions/25607247/how-do-i-decode-html-entities-in-swift). – Martin R Oct 01 '16 at 04:56

3 Answers

You can use NSAttributedString to convert these HTML entities into a plain string.

let htmlString = "test&#21271;&#20140;&#30340;test"
if let htmldata = htmlString.dataUsingEncoding(NSUTF8StringEncoding), let attributedString = try? NSAttributedString(data: htmldata, options: [NSDocumentTypeDocumentAttribute: NSHTMLTextDocumentType], documentAttributes: nil) {
    let finalString = attributedString.string
    print(finalString)
    //output: test北京的test
}
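
The snippet above uses Swift 2-era API names. A minimal sketch of the same idea in more recent Swift might look like the following. The function name `decodeHTML` is hypothetical, and the code assumes an Apple platform where UIKit or AppKit is available; the HTML importer is also generally expected to run on the main thread.

```swift
import Foundation
#if canImport(UIKit)
import UIKit
#elseif canImport(AppKit)
import AppKit
#endif

// Hypothetical helper: interprets the input as HTML and returns its plain text.
// Returns nil if the HTML importer fails.
func decodeHTML(_ html: String) -> String? {
    let options: [NSAttributedString.DocumentReadingOptionKey: Any] = [
        .documentType: NSAttributedString.DocumentType.html,
        // Tell the importer that the data is UTF-8 encoded.
        .characterEncoding: String.Encoding.utf8.rawValue
    ]
    guard let attributed = try? NSAttributedString(
        data: Data(html.utf8),
        options: options,
        documentAttributes: nil
    ) else {
        return nil
    }
    return attributed.string
}
```

Because this goes through a full HTML importer, it is relatively heavy; for short strings that only contain numeric entities, a lighter approach may be preferable.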
Swifty

If you only need to convert numeric entities, you can use CFStringTransform(_:_:_:_:).

Declaration

func CFStringTransform(_ string: CFMutableString!, 
                     _ range: UnsafeMutablePointer<CFRange>!, 
                     _ transform: CFString!, 
                     _ reverse: Bool) -> Bool

...

transform

A CFString object that identifies the transformation to apply. For a list of valid values, see Transform Identifiers for CFStringTransform. In macOS 10.4 and later, you can also use any valid ICU transform ID defined in the ICU User Guide for Transforms.

(Code tested in Swift 3/Xcode 8, iOS 8.4 simulator.)

func decodeNumericEntities(_ input: String) -> String {
    let nsMutableString = NSMutableString(string: input)
    CFStringTransform(nsMutableString, nil, "Any-Hex/XML10" as CFString, true)
    return nsMutableString as String
}

print(decodeNumericEntities("from &#21271;&#20140;")) //->from 北京

Or, if you prefer a computed property in an extension:

extension String {
    var decodingNumericEntities: String {
        let nsMutableString = NSMutableString(string: self)
        CFStringTransform(nsMutableString, nil, "Any-Hex/XML10" as CFString, true)
        return nsMutableString as String
    }
}

print("from &#21271;&#20140;".decodingNumericEntities) //->from 北京

Note that the code above does not handle named character entities, such as &gt; or &amp;.
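
If named entities also matter, one option is a small hand-written decoder. The following is only a sketch under stated assumptions: the names `decodeBasicEntities` and `character(fromCodePoint:radix:)` are hypothetical, and it covers only decimal and hexadecimal numeric references plus the five predefined XML entities (&amp;amp;, &amp;lt;, &amp;gt;, &amp;quot;, &amp;apos;), not the full HTML entity table.

```swift
import Foundation

// Hypothetical helper: turns the digits of a numeric reference into a Character.
func character(fromCodePoint text: Substring, radix: Int) -> Character? {
    guard let value = UInt32(text, radix: radix),
          let scalar = Unicode.Scalar(value) else { return nil }
    return Character(scalar)
}

// Hypothetical helper: decodes numeric references and the five XML named entities.
// Anything it does not recognise is left in the output unchanged.
func decodeBasicEntities(_ input: String) -> String {
    let named: [String: Character] = [
        "amp": "&", "lt": "<", "gt": ">", "quot": "\"", "apos": "'"
    ]
    var result = ""
    var rest = Substring(input)
    while let amp = rest.firstIndex(of: "&") {
        // Copy everything before the "&" verbatim.
        result.append(contentsOf: rest[..<amp])
        let afterAmp = rest.index(after: amp)
        // An entity ends at the next ";". Without one, keep the tail as-is.
        guard let semi = rest[afterAmp...].firstIndex(of: ";") else {
            result.append(contentsOf: rest[amp...])
            return result
        }
        let name = rest[afterAmp..<semi]
        let decoded: Character?
        if name.hasPrefix("#x") || name.hasPrefix("#X") {
            decoded = character(fromCodePoint: name.dropFirst(2), radix: 16)
        } else if name.hasPrefix("#") {
            decoded = character(fromCodePoint: name.dropFirst(), radix: 10)
        } else {
            decoded = named[String(name)]
        }
        if let ch = decoded {
            result.append(ch)
            rest = rest[rest.index(after: semi)...]
        } else {
            // Not a recognised entity: emit the "&" and continue right after it.
            result.append("&")
            rest = rest[afterAmp...]
        }
    }
    result.append(contentsOf: rest)
    return result
}

print(decodeBasicEntities("from &#21271;&#20140; &amp; &lt;b&gt;")) //->from 北京 & <b>
```

Leaving unrecognised sequences untouched is a deliberate choice here, so that a stray "&" in ordinary text is not lost.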

(From this thread on スタック・オーバーフロー, the Japanese Stack Overflow.)

OOPer

You have a handful of HTML/XML entities. You can convert them into "normal text" like this:

// Class declaration in ViewController.h
@interface ViewController : UIViewController <NSXMLParserDelegate>
// Implementation of methods in ViewController.m
- (void)viewDidLoad {
    [super viewDidLoad];

    NSString *xml = @"<root>&#21271;</root>";
    // Encode as UTF-8; -length counts UTF-16 units, not bytes, so it is unsafe here.
    NSData *data = [xml dataUsingEncoding:NSUTF8StringEncoding];
    NSXMLParser *parser = [[NSXMLParser alloc] initWithData:data];
    parser.delegate = self;

    [parser parse];
}

- (void)parser:(NSXMLParser *)parser foundCharacters:(NSString *)string {
    NSLog(@"string: %@", string);
}

The log output is:

string: 北
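
Since the question asks about Swift, here is a rough Swift sketch of the same XML parser approach. The names `TextCollector` and `decodeViaXMLParser` are hypothetical, and the input is assumed to be a text fragment that becomes well-formed XML once wrapped in a root element (a raw "<" or a bare "&" in the input would make parsing fail).

```swift
import Foundation
#if canImport(FoundationXML)
import FoundationXML  // XMLParser lives in this module on Linux
#endif

// Hypothetical delegate: accumulates all character data reported by the parser.
final class TextCollector: NSObject, XMLParserDelegate {
    private(set) var text = ""

    func parser(_ parser: XMLParser, foundCharacters string: String) {
        // The parser may deliver the text in several chunks, so append each one.
        text += string
    }
}

// Hypothetical helper: wraps the fragment in a root element, parses it,
// and returns the decoded text, or nil if the XML is not well-formed.
func decodeViaXMLParser(_ fragment: String) -> String? {
    let xml = "<root>\(fragment)</root>"
    let parser = XMLParser(data: Data(xml.utf8))
    let collector = TextCollector()
    parser.delegate = collector
    return parser.parse() ? collector.text : nil
}
```

For example, `decodeViaXMLParser("&#21271;&#20140;")` should yield the two decoded characters, while malformed input yields nil.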
nandsito