
I use a URL entered by the user as text to initialize a QUrl object. Later I want to convert the QUrl back into a string, both to display it and to check it against a regular expression. This works fine as long as the user does not enter any percent-encoded URLs.

Why doesn't the following example code work?

qDebug() << QUrl("http://test.com/query?q=%2B%2Be%3Axyz%2Fen").toDisplayString(QUrl::FullyDecoded); 

It simply doesn't decode any of the percent-encoded characters. It should print "http://test.com/query?q=++e:xyz/en" but it actually prints "http://test.com/query?q=%2B%2Be%3Axyz%2Fen".

I also tried a lot of other methods, like fromUserInput(), but I could not make the code work correctly in Qt 5.3.

Can someone explain to me how to do this, and why the above code doesn't work (i.e. doesn't show the decoded URL) even when using QUrl::FullyDecoded?

UPDATE

After getting the fromPercentEncoding() hint, I tried the following code:

QUrl UrlFromUserInput(const QString& input)
{
   QByteArray latin = input.toLatin1();
   QByteArray utf8 = input.toUtf8();
   if (latin != utf8)
   {
      // URL string containing unicode characters (no percent encoding expected)
      return QUrl::fromUserInput(input);
   }
   else
   {
      // URL string containing ASCII characters only (assume possible %-encoding)
      return QUrl::fromUserInput(QUrl::fromPercentEncoding(input.toLatin1()));
   }
}

This allows the user to enter both unicode URLs and percent-encoded URLs, and both kinds can be decoded for displaying/matching. However, the percent-encoded URLs did not work in QWebView: the web server responded differently (it returned a different page). So QUrl::fromPercentEncoding() is obviously not a clean solution, since it effectively changes the URL. I could create two QUrl objects in the above function, one constructed directly and one constructed using fromPercentEncoding(), using the first for QWebView and the second for displaying/matching only, but this seems absurd.

Silicomancer
  • What do you mean, "why doesn't it work"? What are you expecting it to print? – peppe Jun 24 '14 at 07:40
  • Similar question - http://stackoverflow.com/questions/4815418/qt-how-to-decode-string-in-qt-4-7-old-method-are-gone/4815567 – sashoalm Jun 24 '14 at 07:46
  • If you can't find a solution here, just post an email on the interest @ qt-project.org mailing list. QUrl maintainers are extremely active there. – peppe Jun 25 '14 at 07:31

3 Answers


#Conclusion

I've done some research, and the conclusion so far is: absurd.

QUrl::fromPercentEncoding() is the way to go, and what the OP has done in the UPDATE section should have been the accepted answer to the question in the title.

I think Qt's documentation of QUrl::toDisplayString is a little misleading:

"Returns a human-displayable string representation of the URL. The output can be customized by passing flags with options. The option RemovePassword is always enabled, since passwords should never be shown back to users."

Actually, it doesn't claim any decoding ability; the documentation here is unclear about its behavior. But at least the password part is true. I've found some clues on Gitorious:

"Add QUrl::toDisplayString(), which is toString() without password. And fix documentation of toString() which said this was the method to use for displaying to humans, while this has never been true."


#Test Code

The following code compares the decoding ability of the different functions. It has been tested on Qt 5.2.1 (not tested on Qt 5.3 yet!)

QString target(/*path*/);

QUrl url_path(target);
qDebug() << "[Original String]:" << target;
qDebug() << "--------------------------------------------------------------------";
qDebug() << "(QUrl::toEncoded)          :" << url_path.toEncoded(QUrl::FullyEncoded);
qDebug() << "(QUrl::url)                :" << url_path.url();
qDebug() << "(QUrl::toString)           :" << url_path.toString(); 
qDebug() << "(QUrl::toDisplayString)    :" << url_path.toDisplayString(QUrl::FullyDecoded);
qDebug() << "(QUrl::fromPercentEncoding):" << QUrl::fromPercentEncoding(target.toUtf8()); // static function

P.S. QUrl::url is just a synonym for QUrl::toString.


#Output

[Case 1]: When target path = "%_%" (test the functionality of encoding):

[Original String]: "%_%" 
-------------------------------------------------------------------- 
(QUrl::toEncoded)          : "%25_%25" 
(QUrl::url)                : "%25_%25" 
(QUrl::toString)           : "%25_%25" 
(QUrl::toDisplayString)    : "%25_%25" 
(QUrl::fromPercentEncoding): "%_%" 

[Case 2]: When target path = "Meow !" (test the functionality of encoding):

[Original String]: "Meow !" 
-------------------------------------------------------------------- 
(QUrl::toEncoded)          : "Meow%20!" 
(QUrl::url)                : "Meow !" 
(QUrl::toString)           : "Meow !" 
(QUrl::toDisplayString)    : "Meow%20!" // "Meow !" when using QUrl::PrettyDecoded mode
(QUrl::fromPercentEncoding): "Meow !" 

[Case 3]: When target path = "Meow|!" (test the functionality of encoding):

[Original String]: "Meow|!" 
-------------------------------------------------------------------- 
(QUrl::toEncoded)          : "Meow%7C!" 
(QUrl::url)                : "Meow%7C!" 
(QUrl::toString)           : "Meow%7C!" 
(QUrl::toDisplayString)    : "Meow|!" // "Meow%7C!" when using QUrl::PrettyDecoded mode
(QUrl::fromPercentEncoding): "Meow|!" 

[Case 4]: When target path = "http://test.com/query?q=++e:xyz/en" (not %-encoded):

[Original String]: "http://test.com/query?q=++e:xyz/en" 
-------------------------------------------------------------------- 
(QUrl::toEncoded)          : "http://test.com/query?q=++e:xyz/en" 
(QUrl::url)                : "http://test.com/query?q=++e:xyz/en" 
(QUrl::toString)           : "http://test.com/query?q=++e:xyz/en" 
(QUrl::toDisplayString)    : "http://test.com/query?q=++e:xyz/en" 
(QUrl::fromPercentEncoding): "http://test.com/query?q=++e:xyz/en" 

[Case 5]: When target path = "http://test.com/query?q=%2B%2Be%3Axyz%2Fen" (% encoded):

[Original String]: "http://test.com/query?q=%2B%2Be%3Axyz%2Fen" 
-------------------------------------------------------------------- 
(QUrl::toEncoded)          : "http://test.com/query?q=%2B%2Be%3Axyz%2Fen" 
(QUrl::url)                : "http://test.com/query?q=%2B%2Be%3Axyz%2Fen" 
(QUrl::toString)           : "http://test.com/query?q=%2B%2Be%3Axyz%2Fen" 
(QUrl::toDisplayString)    : "http://test.com/query?q=%2B%2Be%3Axyz%2Fen" 
(QUrl::fromPercentEncoding): "http://test.com/query?q=++e:xyz/en" 

P.S. I also encountered the bug that Ilya mentioned in the comments: Percent Encoding doesn't seem to be working for '+' in QUrl
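
The encoded forms in Cases 1 to 3 are simply the hexadecimal byte values of the affected characters ('%' is 0x25, space is 0x20, '|' is 0x7C). Here is a minimal sketch of that mapping in plain C++, without Qt. The helpers percentEncodeByte and percentEncode are hypothetical names, and the `keep` parameter is an assumption of this sketch: it stands in for whatever set of characters a real implementation leaves untouched, and it is not QUrl's actual rule set.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper (not a Qt API): percent-encode one byte as "%XX".
std::string percentEncodeByte(unsigned char byte) {
    static const char digits[] = "0123456789ABCDEF";
    std::string out = "%";
    out.push_back(digits[byte >> 4]);    // high nibble
    out.push_back(digits[byte & 0x0F]);  // low nibble
    return out;
}

// True for ASCII letters and digits only.
bool isAsciiAlnum(unsigned char c) {
    return (c >= '0' && c <= '9') || (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z');
}

// Hypothetical helper: keep ASCII letters, digits, and any byte listed in
// `keep`; percent-encode everything else. The `keep` set is an assumption
// of this sketch, chosen per call to mirror the cases shown above.
std::string percentEncode(const std::string& in, const std::string& keep) {
    std::string out;
    for (unsigned char c : in) {
        if (isAsciiAlnum(c) || keep.find(static_cast<char>(c)) != std::string::npos)
            out.push_back(static_cast<char>(c));
        else
            out += percentEncodeByte(c);
    }
    return out;
}

// Usage: reproduces the encoded strings from Cases 1 to 3.
void checkEncodeCases() {
    assert(percentEncode("%_%", "_") == "%25_%25");
    assert(percentEncode("Meow !", "!") == "Meow%20!");
    assert(percentEncode("Meow|!", "!") == "Meow%7C!");
}
```

Which characters end up encoded depends entirely on the `keep` set, which is one way to read the differences between the modes in the output above.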


#Summary

The result of QUrl::toDisplayString is ambiguous. As the documentation says, the QUrl::FullyDecoded mode must be used with care. No matter what type of URL you have, encode it with QUrl::toEncoded and display it with QUrl::fromPercentEncoding when necessary.

As for the malfunction of percent-encoded URLs in QWebView mentioned by the OP, more details are needed to debug it. Using a different function or a different decoding mode could be the reason.
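
One plausible contributor, independent of Qt, is that decoding a whole URL before it is parsed can change its structure: an encoded delimiter such as %26 or %2F becomes a real '&' or '/', and a decoded '+' is commonly read as a space in form-encoded queries. The following plain C++ sketch illustrates this under stated assumptions. percentDecode and countQueryParams are hypothetical helpers, and splitting on '&' is only a rough stand-in for what a server-side script might do.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helper (not a Qt API): value of one hex digit, or -1.
int hexValue(char c) {
    if (c >= '0' && c <= '9') return c - '0';
    if (c >= 'a' && c <= 'f') return c - 'a' + 10;
    if (c >= 'A' && c <= 'F') return c - 'A' + 10;
    return -1;
}

// Hypothetical helper: decode every valid %XX triplet and leave
// malformed or truncated ones untouched.
std::string percentDecode(const std::string& in) {
    std::string out;
    out.reserve(in.size());
    for (std::size_t i = 0; i < in.size(); ++i) {
        if (in[i] == '%' && i + 2 < in.size()) {
            const int hi = hexValue(in[i + 1]);
            const int lo = hexValue(in[i + 2]);
            if (hi >= 0 && lo >= 0) {
                out.push_back(static_cast<char>(hi * 16 + lo));
                i += 2;  // skip the two hex digits just consumed
                continue;
            }
        }
        out.push_back(in[i]);
    }
    return out;
}

// Count query parameters by splitting on '&' after the first '?'.
// This is an assumed, simplified model of server-side parsing.
std::size_t countQueryParams(const std::string& url) {
    const std::size_t q = url.find('?');
    if (q == std::string::npos || q + 1 == url.size()) return 0;
    std::size_t count = 1;
    for (std::size_t i = q + 1; i < url.size(); ++i)
        if (url[i] == '&') ++count;
    return count;
}

// Usage: the URL from the question, plus a case where decoding
// changes how many parameters the query appears to have.
void checkDecodeExamples() {
    assert(percentDecode("http://test.com/query?q=%2B%2Be%3Axyz%2Fen")
           == "http://test.com/query?q=++e:xyz/en");
    const std::string encoded = "http://test.com/query?q=a%26b";
    assert(countQueryParams(encoded) == 1);
    assert(countQueryParams(percentDecode(encoded)) == 2);
}
```

This is why keeping the encoded form for the request and using the decoded form only for display or matching is the safer split.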


#Helpful Resources

  1. RFC 3986 (to which QUrl conforms)
  2. Encode table
  3. Source of qurl.cpp on Gitorious
Tay2510
  • Thanks for your elaborate work. I agree that applying fromPercentEncoding() *after* constructing the QUrl from the original string is the right idea. Regarding the open question about how web servers handle encoded URLs: I could not find much on the web. With your way of handling it, this should be no problem. But it is obvious that percent encoding is not totally transparent, since the server I used for tests definitely returns different pages. I suppose this depends entirely on how the server-side scripts are programmed. – Silicomancer Jun 25 '14 at 07:19
  • I wonder if there is any sensible report/suggestion we could create for the Qt issue tracker. – Silicomancer Jun 25 '14 at 07:23
  • Very good issue actually, though I think it's understandable that `QUrl` is not as omnipotent as a modern browser's decoder. In your case, can't you just use the `QString` (from the user) for displaying/matching and load it into a `QUrl` only when `QWebView` needs it? – Tay2510 Jun 25 '14 at 08:14

You can use QUrlQuery::toString(QUrl::FullyEncoded) or QUrl::fromPercentEncoding() for this conversion.

Ilya
  • QUrlQuery::toString() does not help since I operate on a complete URL, not the query only. QUrl::fromPercentEncoding() actually seems to work. But I need to convert the user input with toLatin1 first (to get a QByteArray), which unfortunately kills any unicode user input. Maybe a good trade-off... but still no clean solution, is it? – Silicomancer Jun 21 '14 at 16:45
  • Yes, I see. Which version of Qt do you use? It looks like there was a decoding problem in 5.0.2: [Percent Encoding doesn't seem to be working for '+' in QUrl](https://bugreports.qt-project.org/browse/QTBUG-31660) – Ilya Jun 21 '14 at 16:50
  • Ok, now I see that my answer is useless. And I don't understand why you need to process '%' in a unicode string. I was sure that we shouldn't mix them: in 1-byte strings we need '%' chars, in 2-byte strings we don't. – Ilya Jun 21 '14 at 17:27
  • You are right. But please understand that I do not know what the user enters in the QLineEdit. The input could be a %-encoded URL but could also be a unicode URL. Both arrive in a QString. I think that's a common problem. So I expect Qt to provide a function that accepts both kinds of URLs and decodes them if necessary. – Silicomancer Jun 21 '14 at 18:38
  • I think it is a good idea to analyze and copy the logic of Google Chrome's address bar. Since its source code is available, you can see how it works inside (it will not be a Qt implementation with QUrl, but you'll see the proper conversion logic). – Ilya Jun 22 '14 at 03:10

I am not sure why toDisplayString(QUrl::FullyDecoded) does not work.

After trying several variants, I found that url.query(QUrl::FullyDecoded) does decode the query part. Based on an example in the documentation, the following code does return the decoded URL:

QUrl url("http://test.com/query?q=%2B%2Be%3Axyz%2Fen");
url.setQuery(url.query(QUrl::FullyDecoded), QUrl::DecodedMode);
qDebug() << url.toString();

Solving the problem this way is not optimal because the query part is copied unnecessarily.

waynix