For example, I quite often see this URL come up:

https://ghbtns.com/github-btn.html?user=example&repo=card&type=watch&count=true

Is the & meant to be &amp;, or should/can it be left as &?

unor
Alexis Tyler

2 Answers

&amp; is for encoding the ampersand in HTML.

For example, in a hyperlink:

<a href="/github-btn.html?user=example&amp;repo=card&amp;type=watch&amp;count=true">…</a>

(Note that this only changes the link, not the URL. The URL is still /github-btn.html?user=example&repo=card&type=watch&count=true.)
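
As a rough illustration of the link/URL distinction, here is a minimal Python sketch. It assumes the standard-library `html` module as a stand-in for what an HTML author and an HTML parser do; the variable names are made up for this example.

```python
import html

# The actual URL (path and query), as it would be requested.
url = "/github-btn.html?user=example&repo=card&type=watch&count=true"

# Form written inside an HTML attribute: each & becomes &amp;.
href_in_html = html.escape(url, quote=True)

# What an HTML parser recovers from that attribute value.
url_after_parsing = html.unescape(href_in_html)

print(href_in_html)
print(url_after_parsing == url)
```

The escaped form only exists in the HTML source; after parsing, the URL is the same one as before.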

While you may encode every & (that is part of the content) with &amp; in HTML, you are only required to encode ambiguous ampersands.

unor

From rfc3986:

Reserved Characters
URIs include components and subcomponents that are delimited by characters in the "reserved" set. These characters are called "reserved" because they may (or may not) be defined as delimiters by the generic syntax, by each scheme-specific syntax, or by the implementation-specific syntax of a URI's dereferencing algorithm.
...

  reserved    = gen-delims / sub-delims

  gen-delims  = ":" / "/" / "?" / "#" / "[" / "]" / "@"

  sub-delims  = "!" / "$" / "&" / "'" / "(" / ")"
              / "*" / "+" / "," / ";" / "="

The purpose of reserved characters is to provide a set of delimiting characters that are distinguishable from other data within a URI. URIs that differ in the replacement of a reserved character with its corresponding percent-encoded octet are not equivalent. Percent-encoding a reserved character, or decoding a percent-encoded octet that corresponds to a reserved character, will change how the URI is interpreted by most applications.
...
URI producing applications should percent-encode data octets that correspond to characters in the reserved set unless these characters are specifically allowed by the URI scheme to represent data in that component. If a reserved character is found in a URI component and no delimiting role is known for that character, then it must be interpreted as representing the data octet corresponding to that character's encoding in US-ASCII.

So an & within a URL should be percent-encoded (as %26) if it is part of a value and has no delimiting role.
Here's a simple PHP code fragment using the urlencode() function:

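<?php
    // urlencode() percent-encodes each value, so a & inside $foo or $bar becomes %26.
    $query_string = 'foo=' . urlencode($foo) . '&bar=' . urlencode($bar);
    // htmlentities() escapes the delimiting & as &amp; for use in the HTML attribute.
    echo '<a href="mycgi?' . htmlentities($query_string) . '">';
?>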
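
For comparison, the same two-step encoding can be sketched in Python. This is only an illustration: `urlencode()` and `html.escape()` are used here as rough counterparts of PHP's `urlencode()` and `htmlentities()`, and the parameter values are hypothetical.

```python
import html
from urllib.parse import urlencode

# Hypothetical parameter values; the second one contains a literal &.
params = {"foo": "a b", "bar": "123&456"}

# Step 1: percent-encode the values and join the pairs with a delimiting &.
query_string = urlencode(params)

# Step 2: escape the delimiting & for use inside an HTML attribute.
link = '<a href="mycgi?' + html.escape(query_string, quote=True) + '">'

print(query_string)  # foo=a+b&bar=123%26456
print(link)          # <a href="mycgi?foo=a+b&amp;bar=123%26456">
```

The & inside the value ends up as %26, while the & between the two pairs is only HTML-escaped.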
RomanPerekhrest
    Your answer does not cover the rfc1866 aspects. The meaning of a "delimiting" character is not visible in your code example. The quote you show basically tells when to escape reserved characters. Example: in `/?user=tester&password=123&456` the first `&` is a delimiting character, the second `&` is not. Correct would be: `/?user=tester&password=123%26456`. `htmlentities()` is something from rfc1866 "3.2.1. Data Characters": `&` is a data character (it introduces an entity). Correct in HTML would be: `/?user=tester&amp;password=123%26456`. HTML entities must be encoded in ALL HTML attribute values. – Daniel W. Dec 28 '18 at 15:49
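
The difference between the two query strings in this comment can be checked with a short Python sketch. It assumes `urllib.parse.parse_qs` with its default settings as the query-string parser; other parsers may treat the stray `456` differently.

```python
from urllib.parse import parse_qs

# Unencoded: the second & is treated as a delimiter, so the value is cut short.
ambiguous = parse_qs("user=tester&password=123&456")

# Percent-encoded: %26 is decoded to & only after the pairs have been split.
encoded = parse_qs("user=tester&password=123%26456")

print(ambiguous)  # {'user': ['tester'], 'password': ['123']}
print(encoded)    # {'user': ['tester'], 'password': ['123&456']}
```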