
I am generating a webpage. The generator takes a page title given as a string and converts it to a .html filename. For example, if the title is first, the filename is first.html.

I understand that a URL path segment can only contain certain characters and that all others need to be percent encoded. A title of second post should therefore be given filename second%20post.html.

The trouble is that pages whose filenames are percent encoded do not load; second%20post.html returns a 404. Non-percent-encoded filenames work fine; first.html, second+post.html, and second post.html (with a space) all load.

When second post.html is loaded, the address bar shows:

http://localhost:8000/second%20post.html

When second%20post.html is loaded, the address bar also shows:

http://localhost:8000/second%20post.html

yet gives a 404.

Why does second%20post.html not load whereas second post.html does? Could it be related to how the files are stored on disk?

index.html

<!DOCTYPE html5>
<html lang="en">
    <head>
        <link href="static/style.css" rel="stylesheet" type="text/css" />
        <link rel='shortcut icon' type="image/png" href="static/favicon.png" />
        <title>Index</title>
    </head>
    <body>
        <div id="content">
            <ul>
                <li><p class="post-title"><a href="./first.html">first</a></p></li>
                <li><p class="post-title"><a href="./second post.html">second post</a></p></li>
                <li><p class="post-title"><a href="./second%post.html">second post</a></p></li>
            </ul>
        </div>
    </body>
</html>

first.html

<!DOCTYPE html5>
<html lang="en">
    <head>
        <link href="static/style.css" rel="stylesheet" type="text/css" />
        <link rel='shortcut icon' type="image/png" href="static/favicon.png" />
        <title>First post</title>
    </head>
    <body>
        <div id="content">
            <p>First post</p>
        </div>
    </body>
</html>

'second post.html'

<!DOCTYPE html5>
<html lang="en">
    <head>
        <link href="static/style.css" rel="stylesheet" type="text/css" />
        <link rel='shortcut icon' type="image/png" href="static/favicon.png" />
        <title>Second post</title>
    </head>
    <body>
        <div id="content">
            <p>Second post, with a space</p>
        </div>
    </body>
</html>

second%20post.html

<!DOCTYPE html5>
<html lang="en">
    <head>
        <link href="static/style.css" rel="stylesheet" type="text/css" />
        <link rel='shortcut icon' type="image/png" href="static/favicon.png" />
        <title>Second post</title>
    </head>
    <body>
        <div id="content">
            <p>Second post, percent encoded</p>
        </div>
    </body>
</html>
Lorem Ipsum
  • Spaces in the URL are replaced by `%20`; that's normal behavior. – Always Helping Jul 25 '20 at 14:10
  • You have a typo in the third link: `second%post` instead of `second%20post`, which is why you get a 404. Also, the doctype should be just `<!DOCTYPE html>`. – artanik Jul 25 '20 at 14:16
  • @artanik, yes that was it. *sigh* Thank you. Should I delete this question or would it be appropriate for you to answer and me accept? – Lorem Ipsum Jul 25 '20 at 15:34
  • @LoremIpsum, I've decided to answer because it might be helpful for others – artanik Jul 25 '20 at 15:53
  • fwiw, I'm still having issues with the actual generator. I guess I just need to strip it down completely so that it produces precisely what's asked here. – Lorem Ipsum Jul 25 '20 at 16:05

1 Answer


You have a typo in the third link: second%post instead of second%20post. That's why you get a 404.

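Anyway, if you want to open the file that is literally named second%20post.html, the link should be second%2520post.html. The server decodes the request path once before looking up the file, so the % in the filename must itself be percent encoded as %25.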
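To see the two cases side by side, here is a minimal sketch using Python's standard `urllib.parse` (the variable names are only illustrative, and Python is my choice, not something taken from your generator). It encodes each on-disk filename once to get the href, then decodes once the way a server would:

```python
from urllib.parse import quote, unquote

# Two different files on disk (names taken from the question).
on_disk_space = "second post.html"       # contains a literal space
on_disk_percent = "second%20post.html"   # contains a literal "%20"

# Percent-encode each filename exactly once to get the link target.
href_space = quote(on_disk_space)
href_percent = quote(on_disk_percent)

print(href_space)    # second%20post.html
print(href_percent)  # second%2520post.html

# Decoding once (as the server does) gets back the name on disk.
assert unquote(href_space) == on_disk_space
assert unquote(href_percent) == on_disk_percent
```

So a link to second%20post.html always resolves to the file with the space in its name, never to the file with %20 in its name.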

artanik
  • Thank you, again. The issue in my real project was double encoding. The fix seems to be simply removing the encoding step and letting browsers deal with it, or forcing a limited character set. Double encoding is described in more detail here: https://stackoverflow.com/a/16085190/5065796 – Lorem Ipsum Jul 26 '20 at 02:39
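
For anyone hitting the same double-encoding problem, the two fixes mentioned in that last comment could look roughly like the sketch below. The function names are hypothetical and this is not the asker's actual generator; it only assumes the title-to-filename step described in the question.

```python
import re
from urllib.parse import quote


def title_to_filename(title: str) -> str:
    """Name written to disk: the raw title plus .html, with no percent encoding."""
    return f"{title}.html"


def filename_to_href(filename: str) -> str:
    """Value for the href attribute: percent-encode the filename exactly once."""
    return quote(filename)


def title_to_slug(title: str) -> str:
    """Alternative: force a limited character set so no encoding is needed."""
    slug = re.sub(r"[^a-z0-9]+", "-", title.lower()).strip("-")
    return f"{slug}.html"


filename = title_to_filename("second post")
print(filename)                      # second post.html
print(filename_to_href(filename))    # second%20post.html
print(title_to_slug("second post"))  # second-post.html
```

The point of keeping the first two functions separate is that encoding belongs to the link, not to the name on disk; encoding the filename and then the link is what produces %2520.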