Is there an algorithm in C# to encode a URL using only symbols that display correctly in a web browser?

Something like Base64.

  • I'm looking for other algorithms; I already use Base64 :) – user2554476 Jul 05 '13 at 17:00
  • I typed "C# url encode" into a search engine and the first result answers your question. Please show a little more effort in your questions. – Eric Lippert Jul 05 '13 at 17:01
  • Welcome! There are several methods, each with subtly different behaviour. They've been discussed at length in other questions though: http://stackoverflow.com/a/11236038/1344760 – RichardTowers Jul 05 '13 at 17:03

1 Answer

The Standard (RFC 3986 aka STD 66) lays it out for you. In particular, §2 and §2.1:

2. Characters

The URI syntax provides a method of encoding data, presumably for the sake of identifying a resource, as a sequence of characters. The URI characters are, in turn, frequently encoded as octets for transport or presentation. This specification does not mandate any particular character encoding for mapping between URI characters and the octets used to store or transmit those characters. When a URI appears in a protocol element, the character encoding is defined by that protocol; without such a definition, a URI is assumed to be in the same character encoding as the surrounding text.

The ABNF notation defines its terminal values to be non-negative integers (codepoints) based on the US-ASCII coded character set [ASCII]. Because a URI is a sequence of characters, we must invert that relation in order to understand the URI syntax. Therefore, the integer values used by the ABNF must be mapped back to their corresponding characters via US-ASCII in order to complete the syntax rules.

A URI is composed from a limited set of characters consisting of digits, letters, and a few graphic symbols. A reserved subset of those characters may be used to delimit syntax components within a URI while the remaining characters, including both the unreserved set and those reserved characters not acting as delimiters, define each component's identifying data.

2.1. Percent-Encoding

A percent-encoding mechanism is used to represent a data octet in a component when that octet's corresponding character is outside the allowed set or is being used as a delimiter of, or within, the component. A percent-encoded octet is encoded as a character triplet, consisting of the percent character "%" followed by the two hexadecimal digits representing that octet's numeric value. For example, "%20" is the percent-encoding for the binary octet "00100000" (ABNF: %x20), which in US-ASCII corresponds to the space character (SP). Section 2.4 describes when percent-encoding and decoding is applied.

pct-encoded = "%" HEXDIG HEXDIG

The uppercase hexadecimal digits 'A' through 'F' are equivalent to the lowercase digits 'a' through 'f', respectively. If two URIs differ only in the case of hexadecimal digits used in percent-encoded octets, they are equivalent. For consistency, URI producers and normalizers should use uppercase hexadecimal digits for all percent-encodings.
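To illustrate the case rule just quoted, here is a minimal C# sketch that rewrites every percent-encoded triplet with uppercase hex digits, so that two URIs differing only in hex case compare equal afterwards. The class and method names (`PercentCase`, `NormalizeHex`) are made up for this illustration and are not part of any framework API.

```csharp
using System.Text.RegularExpressions;

static class PercentCase
{
    // Hypothetical helper: uppercases the two hex digits of each
    // pct-encoded triplet ("%" HEXDIG HEXDIG) and leaves the rest alone.
    public static string NormalizeHex(string uri)
    {
        return Regex.Replace(
            uri,
            "%[0-9a-fA-F]{2}",
            match => match.Value.ToUpperInvariant());
    }
}
```

Under this sketch, `NormalizeHex("a%2fb%3A")` gives `a%2Fb%3A`; text outside the triplets is not touched.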

In general, the only characters that may freely be represented in a URL without being percent-encoded are

  • The unreserved characters. These are the US-ASCII (7-bit) characters
    • A-Z
    • a-z
    • 0-9
    • -._~
  • The reserved characters ... when used within their role in the grammar of a URL and its scheme. These reserved characters are:
    • :/?#[]@!$&'()*+,;=

Any other character must, per the standard, be properly percent-encoded.
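The rule above can be sketched in C# roughly as follows. This is an illustration for encoding a single data value (such as a query parameter value), not a full URL, so it percent-encodes reserved characters as well. It assumes the text is first converted to UTF-8 octets, which RFC 3986 does not mandate; the names `PercentEncoder` and `Encode` are hypothetical.

```csharp
using System.Text;

static class PercentEncoder
{
    // RFC 3986 unreserved set: ALPHA / DIGIT / "-" / "." / "_" / "~"
    private const string Unreserved =
        "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789-._~";

    // Hypothetical helper: percent-encodes every octet of the UTF-8 form
    // of `value` that is not an unreserved character.
    public static string Encode(string value)
    {
        var result = new StringBuilder();
        foreach (byte octet in Encoding.UTF8.GetBytes(value))
        {
            char c = (char)octet;
            if (octet < 0x80 && Unreserved.IndexOf(c) >= 0)
            {
                result.Append(c); // unreserved: emit as-is
            }
            else
            {
                // "%" followed by two uppercase hex digits
                result.Append('%').Append(octet.ToString("X2"));
            }
        }
        return result.ToString();
    }
}
```

With this sketch, `Encode("a b&c")` yields `a%20b%26c`. In real code the framework's `System.Uri.EscapeDataString` is probably the better choice; the hand-written version only makes the rule visible.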

Further note that a URL may only contain characters drawn from the US-ASCII character set (0x00-0x7F). If your URL contains characters outside that range of codepoints, those characters will need to be suitably encoded for representation in US-ASCII (e.g., via HTML/XML entity references or, more commonly, by percent-encoding their UTF-8 octets, as RFC 3986 §2.5 recommends for new schemes). Further, your application is responsible for interpreting such encodings.

Nicholas Carey