
The following code fails:

let url = URL(string: "https://www.cardboardconnection.com/1987-topps-baseball-cards")!

var request = URLRequest(url: url)
request.setValue("text/html; charset=utf-8", forHTTPHeaderField: "Content-Type")
request.setValue("text/html; charset=utf-8", forHTTPHeaderField: "Accept")

let task = URLSession.shared.dataTask(with: request) {(data, response, error) in
    guard let data = data else { return }
    print(String(data: data, encoding: .utf8)!)
}

task.resume()

I can't figure out what about this particular website causes the UTF-8 conversion to fail. How do I figure this out, and what is the proper conversion? I'm just looking to get the raw HTML from the page.

Ethan Allen
  • try `windowsCP1254` encoding – Leo Dabus Feb 25 '21 at 01:33
  • Why that specific encoding? – Ethan Allen Feb 25 '21 at 01:34
  • Do you want to know how to detect the string encoding based on the data returned? – Leo Dabus Feb 25 '21 at 01:38
  • yes please I would like to know – Ethan Allen Feb 25 '21 at 01:41
  • add the extension from my [post](https://stackoverflow.com/a/59843310/2303865) and then `URLSession.shared.dataTask(with: cardboardconnectionURL) {(data, response, error) in guard let data = data, let encoding = data.stringEncoding else { return } print("encoding:", encoding.rawValue) if let string = String(data: data, encoding: encoding) { print(string) } }.resume()` – Leo Dabus Feb 25 '21 at 01:45

1 Answer

Using the trick from "How to detect invalid utf8 unicode/binary in a text file":

curl -s https://www.cardboardconnection.com/1987-topps-baseball-cards | grep -axv '.*'

This will show two lines that contain invalid UTF-8. The trick here is that `.` only matches validly decoded characters, so `-x` with `-v` prints exactly the lines that could not be matched in full.
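The same check can be sketched in Swift, in case you would rather find the offending lines from inside the app. This is only an illustration: `invalidUTF8LineNumbers` is a hypothetical helper name, and the sample bytes are made up rather than taken from the page. It relies on `String(bytes:encoding:)` returning `nil` for input that is not valid UTF-8.

```swift
import Foundation

/// Returns the 1-based numbers of the lines in `data` that are not valid UTF-8.
/// Hypothetical helper for illustration; not part of Foundation.
func invalidUTF8LineNumbers(in data: Data) -> [Int] {
    // Split on LF (0x0A). That byte never occurs inside a multi-byte UTF-8
    // sequence, so splitting cannot break a valid character apart.
    let lines = data.split(separator: 0x0A, omittingEmptySubsequences: false)
    return lines.enumerated().compactMap { index, line in
        // The failable initializer returns nil when the bytes are not valid UTF-8.
        String(bytes: line, encoding: .utf8) == nil ? index + 1 : nil
    }
}

// Made-up sample: "ok", then "caf" followed by a lone 0xE9 byte, then "end".
let sample = Data([0x6F, 0x6B, 0x0A, 0x63, 0x61, 0x66, 0xE9, 0x0A, 0x65, 0x6E, 0x64])
print(invalidUTF8LineNumbers(in: sample))  // [2]
```

In the question's completion handler you would pass the downloaded `data` instead of `sample`.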

The following works, but it feels like I'm missing the simpler way to do this.

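var codeUnits: [UTF32.CodeUnit] = []
let sink = { codeUnits.append($0) }
// transcode returns true when it detected invalid input, so the result is
// ignored rather than used as a condition. With stoppingOnError: false,
// each invalid sequence is replaced with U+FFFD.
_ = transcode(data.makeIterator(), from: UTF8.self, to: UTF32.self,
              stoppingOnError: false, into: sink)
let string = String(codeUnits.compactMap { UnicodeScalar($0) }.map(String.init).joined())
print(string)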

See also https://stackoverflow.com/a/44611946/97337, where Martin R solves this in a better way (though it's still not straightforward).
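One candidate for the simpler way is the standard library's `String(decoding:as:)`, which never fails and substitutes U+FFFD for each ill-formed sequence. The sketch below uses made-up bytes rather than the actual page, and compares it with the failable `String(data:encoding:)` from the question. The `windowsCP1254` fallback is the encoding suggested in the comments; whether it is the page's real encoding is an assumption.

```swift
import Foundation

// Made-up input: "caf" followed by 0xE9, which is not valid UTF-8 on its own.
let bytes: [UInt8] = [0x63, 0x61, 0x66, 0xE9]
let data = Data(bytes)

// Strict decoding fails on the invalid byte, as in the question.
let strict = String(data: data, encoding: .utf8)

// Lossy decoding always succeeds; the invalid byte becomes U+FFFD.
let lossy = String(decoding: data, as: UTF8.self)

// Alternative: fall back to a single-byte encoding when strict UTF-8 fails.
// Assumes the encoding from the comments applies to the data.
let fallback = strict ?? String(data: data, encoding: .windowsCP1254)

print(strict == nil)              // true
print(lossy == "caf\u{FFFD}")     // true
print(fallback ?? "<undecodable>")
```

Lossy decoding is enough if you only need to scrape the HTML and can tolerate a few replacement characters; the fallback keeps the original characters only when the guessed encoding is the right one.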

Rob Napier