What no-one seems to realize is that none of the System.Uri
constructors correctly handles certain paths with percent signs in them.
new Uri(@"C:\%51.txt").AbsoluteUri;
This gives you "file:///C:/Q.txt" instead of "file:///C:/%2551.txt".
Neither value of the deprecated dontEscape argument makes any difference, and specifying the UriKind gives the same result too. Trying with UriBuilder doesn't help either:
new UriBuilder() { Scheme = Uri.UriSchemeFile, Host = "", Path = @"C:\%51.txt" }.Uri.AbsoluteUri
This returns "file:///C:/Q.txt" as well.
As far as I can tell the framework is actually lacking any way of doing this correctly.
We can try to do it by replacing the backslashes with forward slashes and feeding the path to Uri.EscapeUriString, i.e.
new Uri(Uri.EscapeUriString(filePath.Replace(Path.DirectorySeparatorChar, '/'))).AbsoluteUri
This seems to work at first, but if you give it the path C:\a b.txt then you end up with file:///C:/a%2520b.txt instead of file:///C:/a%20b.txt. Somehow it decides that some sequences should be decoded but not others. Now we could just prefix the escaped path with "file:///" ourselves, but that fails to take UNC paths like \\remote\share\foo.txt into account. What seems to be generally accepted on Windows is to turn them into pseudo-URLs of the form file://remote/share/foo.txt, so we should handle that as well.
EscapeUriString also has the problem that it does not escape the '#' character. It would seem at this point that we have no other choice but to write our own method from scratch. So this is what I suggest:
// Requires: using System; using System.IO; using System.Text;
public static string FilePathToFileUrl(string filePath)
{
    StringBuilder uri = new StringBuilder();
    foreach (char v in filePath)
    {
        // Characters that are copied through unchanged (including everything above Latin-1).
        if ((v >= 'a' && v <= 'z') || (v >= 'A' && v <= 'Z') || (v >= '0' && v <= '9') ||
            v == '+' || v == '/' || v == ':' || v == '.' || v == '-' || v == '_' || v == '~' ||
            v > '\xFF')
        {
            uri.Append(v);
        }
        // Directory separators become forward slashes.
        else if (v == Path.DirectorySeparatorChar || v == Path.AltDirectorySeparatorChar)
        {
            uri.Append('/');
        }
        // Everything else is percent-encoded as two hex digits.
        else
        {
            uri.Append(String.Format("%{0:X2}", (int)v));
        }
    }
    if (uri.Length >= 2 && uri[0] == '/' && uri[1] == '/') // UNC path
        uri.Insert(0, "file:");
    else
        uri.Insert(0, "file:///");
    return uri.ToString();
}
This intentionally leaves + and : unencoded, as that seems to be how it's usually done on Windows. It also only encodes Latin-1 characters (up to U+00FF) and leaves everything above that untouched, as Internet Explorer can't understand Unicode characters in file URLs if they are encoded.
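As a quick sanity check, here is a minimal sketch that runs the problem paths from this answer through the method. The class name FileUrlSketch is hypothetical, and the method body is a condensed copy of the one above with one assumption changed: the backslash is hard-coded as a separator rather than taken from Path.DirectorySeparatorChar, so the results should be the same on non-Windows runtimes.

```csharp
using System;
using System.Text;

public static class FileUrlSketch
{
    // Condensed copy of FilePathToFileUrl from above.
    // Assumption: '\\' is hard-coded as the separator instead of using
    // Path.DirectorySeparatorChar, so behavior does not depend on the OS.
    public static string FilePathToFileUrl(string filePath)
    {
        var uri = new StringBuilder();
        foreach (char v in filePath)
        {
            bool safe = (v >= 'a' && v <= 'z') || (v >= 'A' && v <= 'Z') ||
                        (v >= '0' && v <= '9') ||
                        "+/:.-_~".IndexOf(v) >= 0 || v > '\xFF';
            if (safe)
                uri.Append(v);                          // copied through unchanged
            else if (v == '\\')
                uri.Append('/');                        // separator becomes a forward slash
            else
                uri.AppendFormat("%{0:X2}", (int)v);    // percent-encode the rest
        }

        // A leading "//" means the input was a UNC path such as \\remote\share.
        bool isUnc = uri.Length >= 2 && uri[0] == '/' && uri[1] == '/';
        uri.Insert(0, isUnc ? "file:" : "file:///");
        return uri.ToString();
    }

    public static void Main()
    {
        Console.WriteLine(FilePathToFileUrl(@"C:\%51.txt"));             // file:///C:/%2551.txt
        Console.WriteLine(FilePathToFileUrl(@"C:\a b.txt"));             // file:///C:/a%20b.txt
        Console.WriteLine(FilePathToFileUrl(@"C:\a#b.txt"));             // file:///C:/a%23b.txt
        Console.WriteLine(FilePathToFileUrl(@"\\remote\share\foo.txt")); // file://remote/share/foo.txt
    }
}
```

These four inputs cover the cases that tripped up the framework methods: a literal percent sign, a space, a '#', and a UNC path.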