I am scraping a site whose HTML looks similar to this:
<a href="/pages/1"></a>
I also have the window.location object, where
origin:"http://www.example.org"
so I can build the absolute URL as origin + href = http://www.example.org/pages/1
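To make the current approach concrete, here is a minimal sketch of the concatenation. The helper name toAbsolute and the liveLocation object are made up for illustration; liveLocation only stands in for window.location on the live site.

```javascript
// Hypothetical helper: joins the page origin with a root-relative href.
// It only gives a valid URL when href starts with "/".
function toAbsolute(origin, href) {
  return origin + href;
}

// Stand-in for window.location on the live site (assumed value from above).
const liveLocation = { origin: "http://www.example.org" };

console.log(toAbsolute(liveLocation.origin, "/pages/1"));
```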
I made a mockup of the page on my file system for testing.
-www.example.org
  |-2017
    |-pages
      |-1.html
      |-2.html
  |-2016
    |-pages
      |-1.html
      |-2.html
In those HTML files the links look something like this:
<!-- www.example.org/2016/pages/1.html -->
<a href="../../2017/pages/1.html">2017</a>
In the test the same code doesn't work, because the window.location object's origin is file://:
hash:""
host:""
hostname:""
href:"file:///home/me/projects/fp/src/test/fixtures/www.example.org/2016/pages/1.html"
origin:"file://"
pathname:"/home/me/projects/fp/src/test/fixtures/www.example.org/2016/pages/1.html"
port:""
protocol:"file:"
which produces origin + href = file://../../2017/pages/1.html, which is not a usable URL. With some string manipulation I could build file:///home/me/projects/fp/src/test/fixtures/www.example.org/2017/pages/1.html from location.pathname when the protocol is "file:". But is that the right way to handle this problem?
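One alternative I could try instead of manual string manipulation is the standard URL constructor, which resolves an href against a base URL. The sketch below assumes the base is the full page URL (location.href) rather than the origin; the helper name resolveHref is made up, and the page URLs are copied from the examples above.

```javascript
// Hypothetical helper: resolves an href (relative or root-relative)
// against the full URL of the page it was found on.
function resolveHref(href, pageUrl) {
  return new URL(href, pageUrl).href;
}

// Local fixture: the value of location.href from the dump above.
const fixturePage =
  "file:///home/me/projects/fp/src/test/fixtures/www.example.org/2016/pages/1.html";
console.log(resolveHref("../../2017/pages/1.html", fixturePage));

// Live site: a root-relative href against an assumed http page URL.
console.log(resolveHref("/pages/1", "http://www.example.org/2016/pages/1.html"));
```

With this, the same call covers both the http: and the file: case, so no branch on location.protocol is needed. Note that a root-relative href such as /pages/1 would resolve against the file system root under file:, so the fixture links have to stay relative, as they are in the mock-up.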