I'm using node-http-proxy to run a proxy website. I would like to proxy any target website that the user chooses, similar to what https://www.proxysite.com/, https://www.croxyproxy.com/ or https://hide.me/en/proxy do.
How would one achieve this with node-http-proxy?
Idea #1: use a ?target= query param.
My first naive idea was to add a query param to the proxy URL, so that the proxy can read it and forward the request to that target.
Code-wise, it would look more or less like this (assuming we deploy it to https://myproxy.com):
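// Assumed imports: the snippet appears to use Next.js API routes and next-http-proxy-middleware.
import type { NextApiRequest, NextApiResponse } from 'next';
import httpProxyMiddleware from 'next-http-proxy-middleware';

const BASE_URL = 'https://myproxy.com';

// handler is the single handler for all routes.
async function handler(
  req: NextApiRequest,
  res: NextApiResponse
): Promise<void> {
  try {
    // For example: `https://myproxy.com?target=https://google.com`
    const url = new URL(req.url ?? '/', BASE_URL);
    const targetURLStr = url.searchParams.get('target'); // Get `?target=` query param.
    if (!targetURLStr) {
      res.status(400).json({ error: 'Missing ?target= query param' });
      return;
    }
    // Await so that proxy errors are caught below.
    await httpProxyMiddleware(req, res, {
      changeOrigin: true,
      target: targetURLStr,
    });
  } catch (err) {
    res.status(500).json({ error: (err as Error).message });
  }
}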
Problem: If I deploy this code to myproxy.com and load https://myproxy.com?target=https://google.com, then google.com is loaded, but:
- if I click a link to Google Images, it loads https://myproxy.com/images instead of https://myproxy.com?target=https://google.com/images. See also the related question "URL as query param in proxy, how to navigate?"
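To make the failure concrete, here is a minimal sketch of how a browser resolves a root-relative link against the page URL. The helper `resolveLink` and the `/images` link are hypothetical, chosen only for illustration; the resolution itself is the standard `URL` behavior.

```typescript
// Hypothetical helper: resolves a link the way a browser would, against the
// URL of the page currently shown in the address bar.
function resolveLink(href: string, pageUrl: string): string {
  return new URL(href, pageUrl).toString();
}

const pageUrl = 'https://myproxy.com/?target=https://google.com';

// A root-relative link, as the target site's HTML might contain (assumed).
const clicked = resolveLink('/images', pageUrl);
console.log(clicked); // https://myproxy.com/images

// The request the proxy then receives no longer carries `?target=`.
const targetParam = new URL(clicked).searchParams.get('target');
console.log(targetParam); // null
```

The query string of the page URL is dropped during resolution, so the proxy has no way to know which site the follow-up request belongs to.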
Idea #2: use cookies
My second idea is to read the ?target= query param as above, store its hostname in a cookie, and proxy all subsequent requests to the cookie's hostname.
For example, say the user wants to access https://google.com/a/b?c=d via the proxy. The flow is:
- go to https://myproxy.com?target=${encodeURIComponent('https://google.com/a/b?c=d')}
- proxy reads the ?target= query param, sets the hostname (https://google.com) in a cookie
- proxy redirects to https://myproxy.com/a/b?c=d (307 redirect)
- proxy sees a new request and, since the cookie is set, passes this request to node-http-proxy using the cookie's target.
Code-wise, it would look like: https://gist.github.com/throwaway34241/de8a623c1925ce0acd9d75ff10746275
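For readers who don't want to open the gist, the flow can be sketched roughly as a pure routing decision. This is not the gist's code: the `Action` type, the `decide` function and the cookie name `proxy_target` are all hypothetical, and the actual proxying (handing the request to node-http-proxy) is left out.

```typescript
// Hypothetical result type: what the proxy should do with a request.
type Action =
  | { kind: 'redirect'; status: 307; location: string; setCookie: string }
  | { kind: 'proxy'; target: string }
  | { kind: 'error'; status: 400; message: string };

const BASE_URL = 'https://myproxy.com';
const COOKIE_NAME = 'proxy_target'; // assumed cookie name

function decide(reqUrl: string, cookies: Record<string, string>): Action {
  const url = new URL(reqUrl, BASE_URL);
  const targetParam = url.searchParams.get('target');

  if (targetParam !== null) {
    // Steps 1-3: remember the target's origin, then redirect to its path.
    let target: URL;
    try {
      target = new URL(targetParam);
    } catch {
      return { kind: 'error', status: 400, message: 'Invalid ?target= URL' };
    }
    return {
      kind: 'redirect',
      status: 307,
      location: target.pathname + target.search,
      setCookie: `${COOKIE_NAME}=${encodeURIComponent(target.origin)}; Path=/; HttpOnly`,
    };
  }

  // Step 4: no ?target=, so fall back to the origin stored in the cookie.
  const stored = cookies[COOKIE_NAME];
  if (stored === undefined) {
    return { kind: 'error', status: 400, message: 'No target selected' };
  }
  return { kind: 'proxy', target: decodeURIComponent(stored) };
}
```

With this sketch, `decide('/?target=' + encodeURIComponent('https://google.com/a/b?c=d'), {})` yields a 307 redirect to `/a/b?c=d`, and a follow-up `decide('/a/b?c=d', { proxy_target: encodeURIComponent('https://google.com') })` yields a proxy action targeting `https://google.com`.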
Problem: This works very well, but only for one target at a time. If I open one browser tab with https://myproxy.com?target=https://google.com and another tab with https://myproxy.com?target=https://facebook.com, then:
- first, it sets the cookie to https://google.com, and I can navigate in the 1st tab correctly
- then I go to the 2nd tab (without closing the 1st one); it sets the cookie to https://facebook.com, and I can navigate Facebook in the 2nd tab correctly
- but if I then go back to the 1st tab, it proxies Google resources through Facebook, because the cookie has been overwritten.
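The overwrite can be reproduced with a small simulation. This is only a sketch: `cookieJar` stands in for the browser's single cookie store for myproxy.com, `handle` is a hypothetical stand-in for the proxy that returns the upstream URL it would fetch, and the `/images` path is assumed.

```typescript
// One cookie jar per browser profile, shared by every tab on myproxy.com.
const cookieJar = new Map<string, string>();
const COOKIE = 'proxy_target'; // assumed cookie name

// Hypothetical stand-in for the proxy: returns the upstream URL it would fetch.
function handle(path: string): string {
  const url = new URL(path, 'https://myproxy.com');
  const target = url.searchParams.get('target');
  if (target !== null) {
    cookieJar.set(COOKIE, new URL(target).origin); // overwrites any earlier value
    return new URL(target).toString();
  }
  const storedOrigin = cookieJar.get(COOKIE);
  if (storedOrigin === undefined) throw new Error('No target selected');
  return new URL(url.pathname + url.search, storedOrigin).toString();
}

const tab1Open = handle('/?target=https://google.com');   // tab 1
const tab2Open = handle('/?target=https://facebook.com'); // tab 2
const tab1Click = handle('/images');                      // back in tab 1
console.log(tab1Click); // https://facebook.com/images
```

Because the request from the first tab carries nothing that identifies the tab, the proxy can only consult the shared cookie, which by then points at the second tab's target.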
I'm a bit out of ideas, and am wondering how those generic proxy websites do it. Ideally, I would not want to parse the HTML of the target website.