I have a QList of strings that contain portions of URLs that look like this:

/search/?q=god%20of%20war%3A%20collection%20-%20playstation%203&suggestionV=2.

I'm removing the /search/?q=, &suggestionV=2 and %20 portions with the .replace() function, like so: currentString.replace("/search/?q=", "").replace("&suggestionV=2", "").replace("%20", " "); which results in: "god of war%3A collection - playstation 3". How do I decode the percent-encoded character codes listed here: https://www.obkb.com/dcljr/charstxt.html in Qt? I need to convert %3A and the other codes to plain text, so that %3A becomes :.

ryan714

1 Answer

HTTP query strings have two primary encoding sequences:

  1. A space is often encoded as a + character

  2. Any character that carries a special meaning in an HTTP query string is encoded as a % followed by two hexadecimal digits

Different browsers may treat different sets of characters as "special" and subject to %-prefixed hexadecimal encoding. Using simple search/replace to turn %3A into a :, and either %20 or + into a space, and so on, is unreliable. Consider the following HTTP query string:

%2520

You will discover that %25 is the hexadecimal encoding of the % character, so after decoding it the string becomes %20. Are you then going to replace this with a space, leaving a single space character as the final decoded string? That would be wrong: the encoded string %2520 stands for the literal three-character text %20, and nothing in it should be decoded a second time.
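To make the failure concrete, here is a minimal sketch in standard C++ (not Qt) that decodes with one search/replace pass per code. The helpers replaceAll and naiveDecode are hypothetical names invented for this illustration, and the order of the two passes is chosen to reproduce the case described above.

```cpp
#include <string>

// Hypothetical helper: replaces every occurrence of `from` in `s` with `to`.
std::string replaceAll(std::string s, const std::string& from, const std::string& to) {
    std::string::size_type pos = 0;
    while ((pos = s.find(from, pos)) != std::string::npos) {
        s.replace(pos, from.size(), to);
        pos += to.size();  // continue searching after the inserted text
    }
    return s;
}

// Naive decoding: one independent search/replace pass per escape code.
std::string naiveDecode(std::string s) {
    s = replaceAll(s, "%25", "%");  // "%2520" becomes "%20"
    s = replaceAll(s, "%20", " ");  // the freshly produced "%20" is decoded again
    return s;
}
```

Calling naiveDecode("%2520") returns a single space, although the encoded text stands for the literal string %20.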

The only reliable way to decode HTTP query strings is algorithmically: scan the HTTP query string one character at a time, and upon encountering a + or a %xx replace it with the decoded character and then continue with the rest of the string.
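The single-pass scan described above might look roughly like the following sketch. It uses only the C++ standard library so that it stands alone; hexValue and decodeQueryComponent are hypothetical names, and passing a malformed % sequence through unchanged is an assumption of this sketch rather than a rule. In Qt code, QUrl::fromPercentEncoding should cover the %xx part, though as far as I know it leaves + untouched.

```cpp
#include <string>

// Hypothetical helper: numeric value of one hex digit, or -1 if c is not a hex digit.
static int hexValue(char c) {
    if (c >= '0' && c <= '9') return c - '0';
    if (c >= 'a' && c <= 'f') return c - 'a' + 10;
    if (c >= 'A' && c <= 'F') return c - 'A' + 10;
    return -1;
}

// Decodes a query-string value in one left-to-right pass.
// Decoded output is never rescanned, so "%2520" yields "%20", not a space.
std::string decodeQueryComponent(const std::string& in) {
    std::string out;
    out.reserve(in.size());
    for (std::string::size_type i = 0; i < in.size(); ++i) {
        const char c = in[i];
        if (c == '+') {
            out.push_back(' ');  // '+' stands for a space in query strings
        } else if (c == '%' && i + 2 < in.size()
                   && hexValue(in[i + 1]) >= 0 && hexValue(in[i + 2]) >= 0) {
            // "%xx": combine the two hex digits into one byte
            out.push_back(static_cast<char>(hexValue(in[i + 1]) * 16 + hexValue(in[i + 2])));
            i += 2;  // skip the two digits just consumed
        } else {
            out.push_back(c);  // ordinary character, or malformed '%' kept as-is (assumption)
        }
    }
    return out;
}
```

Applied to the value from the question, decodeQueryComponent("god%20of%20war%3A%20collection%20-%20playstation%203") gives "god of war: collection - playstation 3". Splitting the URL into its path, parameter names and values is a separate step that should happen before the values are decoded.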

Sam Varshavchik