
I am trying to make a web request to a URL that needs to keep accented characters instead of percent encoding them. E.g. é must NOT change to e%CC%81. I cannot change this.

These are the allowed characters that shouldn't be percent encoded: AaÁáBbCcDdEeÉéFfGgHhIiÍíJjKkLlMmNnOoÓóÖöŐőPpQqRrSsTtUuÚúÜüŰűVvWwXxYyZz0123456789-

Here is an example of a URL I need:

https://helyesiras.mta.hu/helyesiras/default/suggest?q=hány%20éves

You can try this URL in your web browser to confirm it's working. (The site is in Hungarian.) If you try the proper percent-encoded version of this URL (https://helyesiras.mta.hu/helyesiras/default/suggest?q=ha%CC%81ny%20e%CC%81ves), the website gives an error. (Also in Hungarian.)

I have my own custom encoder that produces this URL string. However, to make a web request I need to convert the String to a URL.

I tried 2 ways:

  1. URL(string:)

let urlStr = "https://helyesiras.mta.hu/helyesiras/default/suggest?q=hány%20éves"
var url = URL(string: urlStr)

// ERROR: Returns nil


  2. URLComponents with percentEncodedQueryItems

var urlComponents = URLComponents()
urlComponents.scheme = "https"
urlComponents.host = "helyesiras.mta.hu"
urlComponents.path = "/helyesiras/default/suggest"
urlComponents.percentEncodedQueryItems = [           // ERROR: invalid characters in percent encoded query items
    URLQueryItem(name: "q", value: "hány%20éves")
]
let url = urlComponents.url

Is it possible to create URLs without the Foundation APIs checking their validity? Or can I create my own validation rules?

Leo Dabus
Kocsis Kristof
  • You are using the wrong URLComponents property. It should be `queryItems`, not `percentEncodedQueryItems`: `urlComponents.queryItems = [URLQueryItem(name: "q", value: "hány éves")]`. The resulting query part should be `"h%C3%A1ny%20%C3%A9ves"`. – Leo Dabus Aug 08 '20 at 15:35
  • @LeoDabus I know, and that is the problem: `urlComponents` forces percent encoding when I don't want it. I need the á to stay á and é to stay é. – Kocsis Kristof Aug 08 '20 at 16:47
  • You need to fix the server end. The percent encoding is correct. – Leo Dabus Aug 08 '20 at 16:49
  • Whether or not it works in Safari without percent encoding doesn't mean anything. – Leo Dabus Aug 08 '20 at 16:52

1 Answer


Safari is percent-encoding the URL. You're just percent-encoding it differently (and in a way your server is rejecting). What Safari sends to the server is:

GET /helyesiras/default/suggest?q=h%C3%A1ny%20%C3%A9ves HTTP/1.1

You can check that using Charles. Your website is behaving correctly and does not appear to require unencoded URLs.

It is not valid to send unencoded URLs, and Safari doesn't. There's no way to do it with URLSession either. You'd have to connect to the socket directly and build your own HTTP stack, which is quite possible, but I don't think you want to do that.

As Leo notes, the correct way to do this is using:

URLQueryItem(name: "q", value: "hány éves")

Note that the %20 is replaced with an unencoded space (" ") so that the percent sign is not double-encoded.
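As a minimal sketch of the approach above, the full URL can be assembled with `queryItems`. The scheme, host, and path are taken from the question; the variable names `rawQuery` and `encodedQuery` are just illustrative, and the value is written with Unicode escapes only to make the precomposed code points explicit.

```swift
import Foundation

// Build the URL from its parts; host and path come from the question.
var components = URLComponents()
components.scheme = "https"
components.host = "helyesiras.mta.hu"
components.path = "/helyesiras/default/suggest"

// Pass the raw, unencoded value ("hány éves" with precomposed á and é).
// URLComponents adds the percent encoding itself.
let rawQuery = "h\u{E1}ny \u{E9}ves"
components.queryItems = [URLQueryItem(name: "q", value: rawQuery)]

let encodedQuery = components.percentEncodedQuery ?? ""
print(encodedQuery)
// q=h%C3%A1ny%20%C3%A9ves

// The resulting URL can be handed to URLSession as usual.
if let url = components.url {
    print(url.absoluteString)
}
```

The printed query matches what Safari sends in the request line shown earlier.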

If you encode the string by hand, you'll find the same encoding:

print("hány éves".addingPercentEncoding(withAllowedCharacters: .urlQueryAllowed))
// Optional("h%C3%A1ny%20%C3%A9ves")

(But you should use URLComponents. addingPercentEncoding is extremely error-prone.)
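One illustration of why hand encoding is error-prone, as a sketch with a hypothetical value that is not from the question: `.urlQueryAllowed` treats `&` as an allowed character, so a value containing `&` is left untouched and ends up splitting the query, whereas `URLQueryItem` escapes it.

```swift
import Foundation

// Hypothetical query value containing a query delimiter.
let value = "Tom & Jerry"

// By hand: the spaces are encoded, but "&" is allowed in a query and stays.
let byHand = value.addingPercentEncoding(withAllowedCharacters: .urlQueryAllowed) ?? ""
print("q=" + byHand)
// q=Tom%20&%20Jerry   (a server would read this as two parameters)

// Via URLComponents: the "&" inside the value is percent-encoded as well.
var components = URLComponents()
components.queryItems = [URLQueryItem(name: "q", value: value)]
let viaComponents = components.percentEncodedQuery ?? ""
print(viaComponents)
```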

The preferred encoding of á is the single Unicode code point LATIN SMALL LETTER A WITH ACUTE (U+00E1, UTF-8 bytes C3 A1). What you're encoding is LATIN SMALL LETTER A (61) followed by COMBINING ACUTE ACCENT (U+0301, UTF-8 bytes CC 81). I suspect your server is not applying Unicode normalization rules. While that's unfortunate, the fix is simple: use URLComponents with the precomposed string, and you'll get the same correct behavior as Safari.
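The difference between the two forms can be reproduced directly. This is a sketch under the assumption that the custom encoder receives (or produces) decomposed text; `encode` is a hypothetical helper, and `precomposedStringWithCanonicalMapping` is Foundation's NFC normalization.

```swift
import Foundation

// "hány éves" in decomposed form: base letter + U+0301 COMBINING ACUTE ACCENT.
let decomposed = "ha\u{301}ny e\u{301}ves"

// NFC normalization merges each pair into one code point (U+00E1, U+00E9).
let precomposed = decomposed.precomposedStringWithCanonicalMapping

// Swift's == compares by canonical equivalence, so the difference is easy to miss.
print(decomposed == precomposed)
// true

// Hypothetical helper: percent-encode a query value by hand, for comparison only.
func encode(_ s: String) -> String {
    return s.addingPercentEncoding(withAllowedCharacters: .urlQueryAllowed) ?? ""
}

print(encode(decomposed))
// ha%CC%81ny%20e%CC%81ves   (the form the server rejects)
print(encode(precomposed))
// h%C3%A1ny%20%C3%A9ves     (the form Safari sends)
```

If the input may arrive decomposed (for example from a file name or pasted text), normalizing it this way before passing it to `URLQueryItem` should yield the precomposed encoding.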

Rob Napier