1

Using Python, how would I encode all the characters of a string to a URL-encoded string?

As of now, just about every answer eventually references the same methods, such as urllib.parse.quote() or urllib.parse.urlencode(). While these answers are technically valid (they follow the required specifications for encoding special characters in URLs), I have not managed to find a single answer that describes how to encode other/non-special characters as well (such as lowercase or uppercase letters).

How do I take a string and encode every character into a URL-encoded string?

Xiddoc
  • 3,369
  • 3
  • 11
  • 37
  • 2
    Could I ask what you need it for? I'm sure there's a good reason, but I want to make sure this isn't an [XY problem](https://meta.stackexchange.com/q/66377/343832). – wjandrea May 20 '21 at 23:53
  • 1
    While pentesting some websites, I have found that double-encoding all the characters in a query can bypass Web Application Firewalls. I've been using a website to do the encoding work for me, but I finally sought a programmatic way to encode the data. – Xiddoc May 21 '21 at 00:03

1 Answers1

4

This gist reveals a very nice answer to this problem. The final function code is as follows:

def encode_all(string):
    return "".join("%{0:0>2x}".format(ord(char)) for char in string)

Let's break this down.

The first thing to notice is that the return value is a generator expression (... for char in string) wrapped in a str.join call ("".join(...)). This means we will be performing an operation for each character in the string, then finally joining each outputted string together (with the empty string, "").

The operations performed per character can be broken down into the following:

  • ord(char): Convert each character to the corresponding number.
  • "%{0:0>2x}".format(...): Convert the number to a hexadecimal value. Then, format the hexadecimal value into a string (with a prefixed "%").

When you look at the whole function from an overview, it is converting each character to a number, converting that number to hexadecimal, then jamming all the hexadecimal values into a string (which is then returned).

Xiddoc
  • 3,369
  • 3
  • 11
  • 37
  • 4
    You don't need to format twice. `"%{0:0>2x}".format(ord(char))` – wjandrea May 25 '21 at 02:02
  • Is this designed for non-ASCII characters? For example, zero-padding to two spaces doesn't really make sense for codepoints greater than U+FF. I think you'd be better off encoding first, then URL-quoting. – wjandrea May 25 '21 at 02:05
  • 2
    @wjandrea it is not; but [neither are the applicable standards](https://en.wikipedia.org/wiki/Percent-encoding#Character_data), which basically just expect you to muddle through somehow. – Karl Knechtel May 25 '21 at 02:09
  • 1
    If it helps, here's a decoder: `''.join(chr(int(c, 16)) for c in re.findall(r'%([0-9A-Fa-f]{2,6})', string))` – wjandrea May 25 '21 at 02:16