
I am creating an NSURL from a string that contains characters which need escaping (Japanese).

NSString* currentlocationbarString = @"mbos.help.jp/search?q=専門&pg=1";
NSString *escapedString = [currentlocationbarString stringByAddingPercentEncodingWithAllowedCharacters:[NSCharacterSet URLHostAllowedCharacterSet]];
NSURL* url = [NSURL URLWithString:escapedString];

//url is mbos.help.jp%2Fsearch%3Fq=%E5%B0%82%E9%96%80&pg=1

When I create NSURLComponents and try to get query items it gives me nil.

NSURLComponents *urlComponents = [NSURLComponents componentsWithURL:url
                                                resolvingAgainstBaseURL:YES];
NSArray *queryItems = urlComponents.queryItems;

// queryItems is nil here

If anybody has a solution for getting the query items, please help. Thanks in advance.

Muhammad Shauket

2 Answers


The issue is not with the Unicode characters. Whenever you add percent encoding, use the character set that matches the part of the URL you are encoding. In my case I was using URLHostAllowedCharacterSet, which only allows the characters that are valid in a host, so "/" and "?" were escaped to %2F and %3F and NSURLComponents could no longer find a query. To get the correct queryItems, use URLQueryAllowedCharacterSet, like this:

NSString* currentlocationbarString = @"mbos.help.jp/search?q=専門&pg=1";
NSString *escapedString = [currentlocationbarString stringByAddingPercentEncodingWithAllowedCharacters:[NSCharacterSet URLQueryAllowedCharacterSet]];
NSURL* url = [NSURL URLWithString:escapedString];

So now you can get queryItems.

NSURLComponents *urlComponents = [NSURLComponents componentsWithURL:url
                                            resolvingAgainstBaseURL:YES];
NSArray *queryItems = urlComponents.queryItems;
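
As an alternative to escaping the whole string by hand, the URL can be assembled from its parts so that NSURLComponents applies the encoding rules of each component itself. The following is only a minimal sketch: the helper name MakeSearchComponents and the "https" scheme are assumptions for illustration and do not appear in the original code.

```objectivec
#import <Foundation/Foundation.h>

// Hypothetical helper (not from the original post): builds the search URL
// from its parts instead of percent-encoding one flat string.
static NSURLComponents *MakeSearchComponents(NSString *searchTerm, NSString *page) {
    NSURLComponents *components = [[NSURLComponents alloc] init];
    components.scheme = @"https";        // assumption: the original string had no scheme
    components.host = @"mbos.help.jp";
    components.path = @"/search";
    // NSURLComponents percent-encodes names and values when the URL is produced.
    components.queryItems = @[
        [NSURLQueryItem queryItemWithName:@"q" value:searchTerm],
        [NSURLQueryItem queryItemWithName:@"pg" value:page]
    ];
    return components;
}

// Usage:
//   NSURLComponents *components = MakeSearchComponents(@"専門", @"1");
//   NSURL *url = components.URL;
//   NSArray<NSURLQueryItem *> *items = components.queryItems;
```

With this approach the Japanese search term is passed in unescaped, and reading queryItems back returns the unescaped values, so there is no character set to choose manually.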
Muhammad Shauket

At least one of the characters 専門 that you use in your search string is invalid Unicode in the form of unpaired UTF-16 surrogate chars, and thus cannot be encoded by stringByAddingPercentEncodingWithAllowedCharacters:, which therefore returns nil.
You can find an example in this post.
Apparently, you have to check whether encoding is possible for Japanese characters.
I must say, I did not expect that either!

Reinhard Männer
  • The characters are valid; maybe you can try with other characters. – Muhammad Shauket Jan 15 '20 at 03:41
  • I did not say that a character is invalid Unicode, I said that it does not have a paired UTF-16 character. In this case the encoding is not possible. Please check [this docu](https://unicodebook.readthedocs.io/unicode_encodings.html). It says _U+10FFFF is the highest code point encodable to UTF-16 and the highest code point of the Unicode Character Set 6.0. The {U+DBFF, U+DFFF} surrogate pair is the last available pair._ – Reinhard Männer Jan 15 '20 at 07:50
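
The point discussed in these comments can be checked directly. The sketch below scans a string for UTF-16 surrogate code units (0xD800 to 0xDFFF); the helper name ContainsSurrogate is hypothetical and not part of Foundation. Both 専 (U+5C02) and 門 (U+9580) lie in the Basic Multilingual Plane, so under that check they contain no surrogates and percent-encoding them should not return nil.

```objectivec
#import <Foundation/Foundation.h>

// Hypothetical helper: returns YES if any UTF-16 code unit of the string
// falls in the surrogate range 0xD800...0xDFFF (paired or unpaired).
static BOOL ContainsSurrogate(NSString *string) {
    for (NSUInteger i = 0; i < string.length; i++) {
        unichar unit = [string characterAtIndex:i];
        if (unit >= 0xD800 && unit <= 0xDFFF) {
            return YES;
        }
    }
    return NO;
}

// Hypothetical helper: percent-encodes a query string; the result is nil
// only if Foundation cannot perform the encoding.
static NSString *EncodeForQuery(NSString *string) {
    return [string stringByAddingPercentEncodingWithAllowedCharacters:
            [NSCharacterSet URLQueryAllowedCharacterSet]];
}

// Usage:
//   BOOL hasSurrogates = ContainsSurrogate(@"専門");
//   NSString *encoded = EncodeForQuery(@"専門");
```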