
I'm using node-http-proxy to run a proxy website. I would like to proxy any target website that the user chooses, similarly to what's done by https://www.proxysite.com/, https://www.croxyproxy.com/ or https://hide.me/en/proxy.

How would one achieve this with node-http-proxy?

Idea #1: use a ?target= query param.

My first naive idea was to add a query param to the proxy URL, so that the proxy can read it and forward the request to that target.

Code-wise, it would look more or less like this (assuming we deploy it to https://myproxy.com):

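import type { NextApiRequest, NextApiResponse } from 'next';
// Assumption: httpProxyMiddleware is the default export of the
// next-http-proxy-middleware package (a wrapper around node-http-proxy).
import httpProxyMiddleware from 'next-http-proxy-middleware';

const BASE_URL = 'https://myproxy.com';

// handler is the single handler for all routes.
async function handler(
    req: NextApiRequest,
    res: NextApiResponse
): Promise<void> {
    try {
        // For example: `https://myproxy.com?target=https://google.com`
        const url = new URL(req.url ?? '/', BASE_URL);
        const targetURLStr = url.searchParams.get('target'); // Get `?target=` query param.
        if (!targetURLStr) {
            res.status(400).json({ error: 'Missing ?target= query param' });
            return;
        }

        // Await the proxy call so that a rejection reaches the catch block.
        await httpProxyMiddleware(req, res, {
            changeOrigin: true,
            target: targetURLStr,
        });
    } catch (err) {
        res.status(500).json({ error: (err as Error).message });
    }
}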

Problem: If I deploy this code to myproxy.com, and load https://myproxy.com?target=https://google.com, then google.com is loaded, but:
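One likely symptom is that relative URLs inside the returned page are resolved by the browser against the proxy's own address, which drops the ?target= query param. This can be checked without running a proxy, using the standard URL class. The sketch below is only an illustration; the asset and link paths are made-up examples, not taken from google.com:

```typescript
// The page URL the browser sees after loading the proxied site.
const pageURL = 'https://myproxy.com?target=https://google.com';

// resolveLikeBrowser mimics how a browser resolves a link or asset
// reference found in the page. The references below are hypothetical.
function resolveLikeBrowser(ref: string): string {
    return new URL(ref, pageURL).href;
}

const assetURL = resolveLikeBrowser('/images/logo.png');
const linkURL = resolveLikeBrowser('search?q=test');

console.log(assetURL); // the ?target= param is gone
console.log(linkURL);  // same for a relative link
```

Both resolved URLs point at myproxy.com without any ?target=, so the handler has no way of knowing which site they belong to.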

Idea #2: use cookies

The second idea is to read the ?target= query param like above, store its origin in a cookie, and proxy all resources to the origin stored in the cookie.

For example, suppose the user wants to access https://google.com/a/b?c=d via the proxy. The flow is:

  • go to https://myproxy.com?target=${encodeURIComponent('https://google.com/a/b?c=d')}
  • proxy reads the ?target= query param and stores the target's origin (https://google.com) in a cookie
  • proxy redirects to https://myproxy.com/a/b?c=d (307 redirect)
  • proxy sees a new request and, since the cookie is set, passes this request to node-http-proxy using the cookie's value as the target.
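The decision logic of this flow can be sketched as a pure function, independent of the linked gist. This is only a sketch under assumptions: the cookie name `proxy_target`, the `Decision` type and the `decide` and `readCookie` helpers are all hypothetical names introduced for illustration, and the actual proxying is left to the caller.

```typescript
// Hypothetical cookie name; the original gist may use a different one.
const TARGET_COOKIE = 'proxy_target';
const BASE_URL = 'https://myproxy.com';

type Decision =
    | { kind: 'redirect'; status: 307; location: string; setCookie: string }
    | { kind: 'proxy'; target: string }
    | { kind: 'error'; status: 400; message: string };

// readCookie extracts a single cookie value from a raw Cookie header.
function readCookie(header: string | undefined, key: string): string | undefined {
    for (const part of (header ?? '').split(';')) {
        const [k, ...rest] = part.trim().split('=');
        if (k === key) return decodeURIComponent(rest.join('='));
    }
    return undefined;
}

// decide maps one incoming request to the action the proxy should take.
function decide(reqUrl: string, cookieHeader?: string): Decision {
    const url = new URL(reqUrl, BASE_URL);
    const targetParam = url.searchParams.get('target');
    if (targetParam !== null) {
        // Steps 1-3: remember the origin, then redirect to the same path.
        const target = new URL(targetParam); // throws on an invalid URL
        return {
            kind: 'redirect',
            status: 307,
            location: target.pathname + target.search,
            setCookie: `${TARGET_COOKIE}=${encodeURIComponent(target.origin)}; Path=/`,
        };
    }
    // Step 4: no ?target=, so fall back to the origin stored in the cookie.
    const origin = readCookie(cookieHeader, TARGET_COOKIE);
    if (origin === undefined) {
        return { kind: 'error', status: 400, message: 'No target selected' };
    }
    return { kind: 'proxy', target: origin };
}
```

A handler would then send the redirect with the `Set-Cookie` header for a `redirect` decision, and pass `target` to the proxy middleware for a `proxy` decision.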

Code-wise, it would look like: https://gist.github.com/throwaway34241/de8a623c1925ce0acd9d75ff10746275

Problem: This works very well, but only for one target at a time. If I open one browser tab with https://myproxy.com?target=https://google.com, and another tab with https://myproxy.com?target=https://facebook.com, then:

  • first it sets the cookie to https://google.com, and I can navigate in the 1st tab correctly
  • then I go to the 2nd tab (without closing the 1st one), it sets the cookie to https://facebook.com, and I can navigate Facebook in the 2nd tab correctly
  • but then if I go back to the 1st tab, it proxies Google resources through Facebook, because the cookie has been overwritten.
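The overwrite can be reproduced with a tiny simulation, since all tabs on myproxy.com share the same cookies. This is only a model of the behaviour described above; the cookie name and the `openTab` and `navigate` helpers are hypothetical.

```typescript
// All tabs opened on the same site share one set of cookies.
const sharedCookies = new Map<string, string>();

// openTab simulates loading https://myproxy.com?target=... in a tab:
// the proxy stores the target's origin in the shared cookie.
function openTab(target: string): void {
    sharedCookies.set('proxy_target', new URL(target).origin);
}

// navigate simulates a later request from any tab. The proxy only has the
// cookie to go on, so it cannot tell which tab the request came from.
function navigate(path: string): string {
    const origin = sharedCookies.get('proxy_target');
    if (origin === undefined) throw new Error('no target cookie set');
    return new URL(path, origin).href;
}

openTab('https://google.com');   // 1st tab
openTab('https://facebook.com'); // 2nd tab overwrites the cookie

// A request issued from the 1st tab is now sent to the wrong site.
const fromFirstTab = navigate('/search?q=test');
console.log(fromFirstTab);
```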

I'm a bit out of ideas, and am wondering how those generic proxy websites do it. Ideally, I would prefer not to parse the HTML of the target website.

jeanpaul62

1 Answer


The idea of a proxy is to intercept the client's requests (either by port or through a backend API), extract the URLs of the requested resources, modify them, make those requests itself from the server, then modify the responses and send them back to the client.

Your first approach does all of this except modifying the responses before sending them back.

One way to do this is to rewrite all links in the resources returned by the proxy so that they contain your proxy's address, and only then send them back to the client.
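As a rough illustration of such link rewriting, the sketch below rewrites double-quoted `href` and `src` attributes in an HTML string so that they point back at the proxy through a ?target= query param. It rests on several assumptions: the `toProxied` and `rewriteLinks` names are hypothetical, a regular expression is used only to keep the example short (a real proxy would more likely use an HTML parser), and `srcset`, CSS `url(...)` and URLs built in JavaScript are not handled.

```typescript
const PROXY_URL = 'https://myproxy.com';

// toProxied turns a reference found in a page into a proxy URL.
function toProxied(ref: string, pageURL: string): string {
    const absolute = new URL(ref, pageURL).href; // resolve relative references
    return `${PROXY_URL}/?target=${encodeURIComponent(absolute)}`;
}

// rewriteLinks rewrites double-quoted href and src attributes in `html`,
// where `pageURL` is the real URL of the page on the target site.
function rewriteLinks(html: string, pageURL: string): string {
    return html.replace(
        /(\s)(href|src)="([^"]*)"/gi,
        (match: string, space: string, attr: string, value: string) => {
            // Leave fragments and non-HTTP schemes untouched.
            if (/^(#|data:|javascript:|mailto:)/i.test(value)) return match;
            try {
                return `${space}${attr}="${toProxied(value, pageURL)}"`;
            } catch {
                return match; // not a resolvable URL, keep as-is
            }
        },
    );
}

const rewritten = rewriteLinks('<a href="/a/b?c=d">x</a>', 'https://google.com/');
console.log(rewritten);
```

With every link carrying its own target, the proxy no longer depends on state shared between tabs, at the cost of having to touch the response body.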

Another way is to wrap the target site in a frame, as most web proxy sites do, and have a script crawl the page and replace all links.

There is a small problem, though: JavaScript-based requests are mostly hardcoded in the scripts, and replacing them is not an easy job.

Your second approach sounds as if it would work better, though I can't say anything concrete about it. You could implement a tab-activity checker so that the cookie is switched to the target of the active tab. Please check the how-to-tell-if-browser-tab-is-active discussion about that.
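A minimal sketch of such a tab-activity checker, based on the browser's `visibilitychange` event, could look like the following. It assumes things the question does not state: that the proxy can inject this script into each proxied page, that the script knows its own tab's target, and that the cookie (hypothetically named `proxy_target`) is not HttpOnly so that scripts can write it. The `DocLike` interface and `keepCookieInSync` function are illustrative names.

```typescript
// Minimal slice of the DOM Document used here, so the sketch can also be
// exercised outside a browser.
interface DocLike {
    visibilityState: string;
    cookie: string;
    addEventListener(type: 'visibilitychange', listener: () => void): void;
}

// Hypothetical cookie name; it must match whatever the proxy reads.
const TARGET_COOKIE = 'proxy_target';

// keepCookieInSync rewrites the target cookie whenever this tab becomes
// visible, so the proxy follows the tab the user is looking at.
function keepCookieInSync(doc: DocLike, tabTarget: string): void {
    const write = (): void => {
        if (doc.visibilityState === 'visible') {
            doc.cookie = `${TARGET_COOKIE}=${encodeURIComponent(tabTarget)}; path=/`;
        }
    };
    doc.addEventListener('visibilitychange', write);
    write(); // also cover the case where the tab is already visible
}
```

In a page this would be called as `keepCookieInSync(document, 'https://google.com')`. Requests made by a background tab while another tab is active would still be routed to the wrong target, so this narrows the problem rather than removing it.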

Yılmaz Durmaz