
I want to get some numeric data from a Chinese shopping website, but I can't find a good way to decode the data I get. If I use NSUTF8StringEncoding, it prints nil. If I use NSASCIIStringEncoding, the Chinese characters on the website are not displayed correctly. Is there any way to solve this, or should I use a third-party networking library like AFNetworking?

Here is my code:

import UIKit

class ViewController: UIViewController {
    override func viewDidLoad() {
        super.viewDidLoad()
        // Do any additional setup after loading the view, typically from a nib.

        let url = NSURL(string: "https://item.taobao.com/item.htm?id=45457007854")!
        let request = NSURLRequest(URL: url)

        let task = NSURLSession.sharedSession().dataTaskWithRequest(request, completionHandler: { (data, response, error) -> Void in
            if let urlContent = data {
                print(urlContent)

                let webContent = NSString(data: urlContent, encoding: NSASCIIStringEncoding)
                print(webContent)
            }
        })
        task.resume()
    }

    override func didReceiveMemoryWarning() {
        super.didReceiveMemoryWarning()
        // Dispose of any resources that can be recreated.
    }
}

Part of the output looks like this:

print(urlContent)

3a226b67 2f63756e 74616f2d 6379636c 652d6465 7461696c 2f302e30 2e322f22 2c227072 65636f6e 64697469 6f6e223a 22675f63 6f6e6669 672e6375 6e74616f 4379636c 65497465 6d222c22 696e6974 223a226e 65772043 6f6d706f 6e656e74 287b2474 61726765 743a2723 4a5f6375 6e74616f 4379636c 65277d29 222c226c 6f616422 3a22222c 22747269 67676572 223a2222 2c227265 74727922 3a317d29 3b0a7d29 28293b0a 0a3c2f73 63726970 743e3c73 63726970 74207372 633d222f 2f672e61 6c696364 6e2e636f 6d2f3f3f 6b697373 792f6b2f 312e342e 31342f73 6565642d 6d696e2e 6a732c74 622f676c 6f62616c 2f332e35 2e33352f 676c6f62 616c2d6d 696e2e6a 732c7462 2f697465 6d2d6465 7461696c 2f372e31 332e332f 706c6174 666f726d 2d6d696e 2e6a7322 20636861 72736574 3d227574 662d3822 3e3c2f73 63726970 743e0a20 2020200a 0a0a2020 20203c2f 626f6479 3e0a3c2f 68746d6c 3e0a>

print(webContent)

The content is almost the same as what you get from View Page Source.

Optional(
<!doctype html>
<html><!-- cph -->
<head>

<meta http-equiv="X-UA-Compatible" content="IE=edge"/>
<meta charset="gbk"/>
<meta name="format-detection" content="telephone=no, address=no">
<link rel="dns-prefetch" href="//g.alicdn.com">
<link rel="dns-prefetch" href="//gtms01.alicdn.com">
<link rel="dns-prefetch" href="//gtms02.alicdn.com">
<link rel="dns-prefetch" href="//gtms03.alicdn.com">
<link rel="dns-prefetch" href="//gtms04.alicdn.com">
<link rel="dns-prefetch" href="//gd1.alicdn.com">
<link rel="dns-prefetch" href="//gd2.alicdn.com">
<link rel="dns-prefetch" href="//gd3.alicdn.com">
<link rel="dns-prefetch" href="//gd4.alicdn.com">
<link href="//item.taobao.com/item.html?id=45457007854" rel="canonical">
<link rel="alternate" hreflang="zh-Hant" href="http://tw.taobao.com/item/45457007854.htm" />
<meta name="renderer" content="webkit"/>
<meta name="referrer" content="always">
<meta name="description" content="»¶Ó­Ç°À´ÌÔ±¦ÍøÊµÁ¦ÍúÆÌ£¬Ñ¡¹ºÊÖ¶¯³È×ÓÕ¥Ö­Æ÷Ò˼ÒÄûÃÊѹ֭Æ÷¼·Ë®¹û¼ÐÔ­Ö­»ú ³ø·¿ÓÃÆ·´´ÒâÉñÆ÷,ÏëÁ˽â¸ü¶àÊÖ¶¯³È×ÓÕ¥Ö­Æ÷Ò˼ÒÄûÃÊѹ֭Æ÷¼·Ë®¹û¼ÐÔ­Ö­»ú ³ø·¿ÓÃÆ·´´ÒâÉñÆ÷£¬Çë½øÈë±Ë°¶Ê³Éеı˰¶Ê³ÉÐʵÁ¦ÍúÆÌ£¬¸ü¶àÉÌÆ·ÈÎÄãÑ¡¹º"/>
<meta name="keywords" content="ÌÔ±¦,Ìͱ¦,ÍøÉϹºÎï,µêÆÌ, ÊÖ¶¯³È×ÓÕ¥Ö­Æ÷Ò˼ÒÄûÃÊѹ֭Æ÷¼·Ë®¹û¼ÐÔ­Ö­»ú ³ø·¿ÓÃÆ·´´ÒâÉñÆ÷."/>
<meta name="data-spm" content="2013"/>
<meta name="microscope-data"                     
  • Either you *know* which encoding the server uses, or you can *determine* it from the http response, compare http://stackoverflow.com/a/32051684/1187415. – Martin R Jun 10 '16 at 14:18
  • When I used `NSUTF8StringEncoding` with that URL, it worked fine (though the HTML I received is a little different than the one you've shared here ... is your response data from a different URL than you shared with us?). – Rob Jun 10 '16 at 14:47
  • Two final observations: If you don't know an encoding, save the raw `NSData` to a local file and then you can use `NSString(contentsOfURL:, usedEncoding:)` to try to determine what encoding was used. BTW, while it's rare, it's technically possible for the file to not be encoded properly, and you can use `iconv` to identify the offending character. See second bullet at http://stackoverflow.com/a/18169887/1271826. – Rob Jun 10 '16 at 15:55
  • @Rob I think it's because our Internet providers are different. The page at this URL changes its content slightly depending on the country of your IP address. When I used a VPN to change my IP to CA, United States, it didn't print anything. That's why I posted the output of my console. What kind of result did you get? Would you mind sending it to my email: 0efficient1@gmail.com ? – littlelydia Jun 10 '16 at 16:03
  • @Rob I got your email. The price 19.8 is still not in the file you sent me, and that's what I really want. I tried your way of writing the data to a file in Swift, but haven't succeeded yet. Will keep trying! – littlelydia Jun 11 '16 at 15:38
  • @MartinR I tried your method and got the content of the page for the URL. But the data I actually want is the price, 19.8, which isn't in the page source and only shows up when I inspect the page's elements with the developer tools. I don't know what to do. **PS: the URL in the code was invalid and has been changed to https://item.taobao.com/item.htm?id=45457007854** – littlelydia Jun 11 '16 at 15:53
  • @littlelydia - Correct, 19.80 is not in the response. This is not a static html page. It would appear to be the result of AJAX updates. To follow the 100+ queries going on for this page and find the one with the price (and then figure out how to follow that chain yourself programmatically) is non-trivial. It will also be incredibly fragile (any change to their web site will break your scraper). Plus, I'd be surprised if their ToS permits this sort of scraping. I'd reach out to them and see if they have a public API. – Rob Jun 11 '16 at 18:20
  • @Rob Yeah, that's not a static HTML page. They do have public APIs. Maybe this one will help: [link](http://open.taobao.com/doc2/apiDetail.htm?spm=a219a.7386797.0.0.Yo5Oas&apiId=10927) For a beginner like me, it may take some time to use the API properly in the app. Thank you! – littlelydia Jun 12 '16 at 14:03
  • I hear you, but the public APIs are definitely a much, much better way to go. It will be much easier and less fragile. – Rob Jun 12 '16 at 14:11
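
For anyone landing here with the same garbled output: the page shown in the question declares `<meta charset="gbk"/>`, so neither UTF-8 nor ASCII is the right choice for those bytes. Below is a minimal sketch of the "determine it from the HTTP response" idea from the comments. It is written in current Swift rather than the Swift 2 syntax used in the question, and it assumes an Apple platform where Foundation exposes the Core Foundation encoding functions. The names `stringEncoding(forIANAName:)` and `decodeBody(_:declaredCharset:)` are hypothetical helpers made up for this illustration, and the fallback order (declared charset, then UTF-8, then GB 18030) is an assumption, not something the thread prescribes.

```swift
import Foundation

/// Maps an IANA charset name such as "gbk" or "utf-8" to a String.Encoding.
/// Returns nil when Core Foundation does not recognise the name.
func stringEncoding(forIANAName name: String) -> String.Encoding? {
    let cfEncoding = CFStringConvertIANACharSetNameToEncoding(name as CFString)
    guard cfEncoding != kCFStringEncodingInvalidId else { return nil }
    return String.Encoding(rawValue: CFStringConvertEncodingToNSStringEncoding(cfEncoding))
}

/// Hypothetical helper: decodes a response body into text.
/// `declaredCharset` is meant to be the charset the server reported,
/// for example the value of `URLResponse.textEncodingName`.
func decodeBody(_ data: Data, declaredCharset: String?) -> String? {
    // 1. Prefer the charset the server declared, if it is known and decodes cleanly.
    if let name = declaredCharset,
       let declared = stringEncoding(forIANAName: name),
       let text = String(data: data, encoding: declared) {
        return text
    }
    // 2. Otherwise try strict UTF-8; this returns nil on invalid byte sequences.
    if let text = String(data: data, encoding: .utf8) {
        return text
    }
    // 3. Assumed last resort for Chinese pages: GB 18030, a superset of GBK.
    let gb18030 = String.Encoding(rawValue: CFStringConvertEncodingToNSStringEncoding(
        CFStringEncoding(CFStringEncodings.GB_18030_2000.rawValue)))
    return String(data: data, encoding: gb18030)
}
```

Inside a data task's completion handler this would be called as `decodeBody(data, declaredCharset: response?.textEncodingName)`. If the server sends no charset in its `Content-Type` header, the helper can only guess, so reading the `<meta charset>` tag or hard-coding the known encoding may be more reliable. Note that this only fixes the garbled characters; as pointed out in the comments, the price is not part of this response at all.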
