I'm trying to implement a simple proxy within a .NET Core REST service, so that I can inject additional authentication headers into a request and then return the response to any client as if it were a normal website.

In a simplified form it looks like this:

[HttpGet]
public async Task<ContentResult> Get()
{
    // `client` is assumed to be a shared HttpClient instance (declaration omitted).
    HttpRequestMessage request = new HttpRequestMessage(HttpMethod.Get, "http://google.com");

    /* some extra headers injection happens here */

    var response = await client.SendAsync(request);
    response.EnsureSuccessStatusCode();

    // Only the HTML body is returned; any relative URLs inside it are left untouched.
    var result = await response.Content.ReadAsStringAsync();
    return Content(result, "text/html", Encoding.UTF8);
}

The problem is that while any browser correctly renders the response as the original HTML page, every script, stylesheet, or other resource that the page includes through a relative URL fails to load.

What is missing in the code above to make browsers resolve the inner relative URLs correctly?

If I run the example above, I get the google.com page displayed from my https://localhost:44307/api/test, except that images and other resources referenced by relative URLs are missing, because those URLs fail to resolve.

Out of confusion, I tried playing with properties such as Referer and Host on the request and the response, but didn't make any progress.


Where it is needed: we need to embed a third-party website in an IFRAME, and that website requires an Authorization header to be present. The proxy above is supposed to add that header and then return the website, so that the API link can be used directly, like this: <iframe src="https://localhost:44307/api/test">. This example should render the complete google.com website inside the iframe, but it renders the HTML only.

vitaly-t
  • There's no code there that would proxy the related resources too? The only resource this can proxy is google.com, as hard-coded, so any relative resources would just resolve to a 404 on your server. Further to that, changing the path of the proxied page adds complexity (this will append /api/test to the start). The simple solution would be to serve your proxy code from the root of your proxy app, and allow it to also proxy additional resource requests. You could try adjusting URIs in the HTML / CSS, but that's going to be a lot of work to get right ... – Sam Feb 14 '19 at 17:56
  • @Sam I've just added some clarification and example of where it is needed. Would it make any difference? – vitaly-t Feb 14 '19 at 17:59
  • That's a weird scenario. The third party doesn't support SSO via a proper mechanism? Writing a decent proxy is a lot of work. You can very easily write a simple one that has the appearance of working, but you'll be missing a lot of stuff ... Look at some existing C# HTTP proxy implementations and you'll get an idea of the scale, for example https://github.com/justcoding121/Titanium-Web-Proxy that I found via Google. If you must implement one, I'd suggest forking an existing one and adding your header code to that. – Sam Feb 14 '19 at 19:51
  • Have you tried YARP? https://microsoft.github.io/reverse-proxy/index.html – Kerem Demirer Apr 21 '23 at 00:05

1 Answer

A ton of websites out there use relative paths to grab their resources (scripts/links/images/etc.) because it is convenient and lets the same pages work in different environments. For example, a development server, a staging server, and a production server each need to load their own copy of the content. With that being said, there are a couple of options for you, but both require you to parse their content:

  1. You can replace all of their references to internal sources with links to your proxy so that your headers get added for each of the resources.

  2. You can replace all of their relative paths with absolute paths to the original domain so that all resource requests bypass your proxy. Those requests will not carry your injected headers, so this can fail depending on how the site secures its resources.
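
To make the difference between the two options concrete, here is a minimal sketch of how a relative reference resolves against a base URL, using only System.Uri. The proxy address is the one from the question; the origin https://www.example.com/, the sample paths, and the helper name Resolve are assumptions made up for illustration.

```csharp
using System;

public static class RelativeUrlDemo
{
    // Resolves a reference against the URL of the page that contains it,
    // which is what a browser does for src/href values.
    public static string Resolve(string pageUrl, string reference) =>
        new Uri(new Uri(pageUrl), reference).AbsoluteUri;

    public static void Main()
    {
        const string proxyPage = "https://localhost:44307/api/test";
        const string originPage = "https://www.example.com/"; // hypothetical origin

        // What happens today: references resolve against the proxy URL.
        Console.WriteLine(Resolve(proxyPage, "/images/logo.png"));
        // prints https://localhost:44307/images/logo.png
        Console.WriteLine(Resolve(proxyPage, "js/app.js"));
        // prints https://localhost:44307/api/js/app.js

        // Option 2: resolve against the origin before returning the HTML.
        Console.WriteLine(Resolve(originPage, "/images/logo.png"));
        // prints https://www.example.com/images/logo.png
    }
}
```

The first two results are the requests that currently end up at the proxy server and fail there. For option 1 you would rewrite the reference to a proxy endpoint that forwards the request instead.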

As some have mentioned, neither of these options makes it easy to build a robust solution, and both require parsing the CSS and JavaScript for relative paths as well. Not exactly an easy task, unfortunately, but probably far easier than trying to use some kind of virtualization.

To replace the content you can use something like Html Agility Pack. I've used it on a few projects; it works great and has a pretty good community.
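
As a rough sketch of option 2, assuming the HtmlAgilityPack NuGet package is referenced, something like the following could rewrite src and href attributes before the HTML is returned. The class and method names are hypothetical, and this only touches attributes in the markup: URLs inside CSS or generated by JavaScript are not handled.

```csharp
using System;
using HtmlAgilityPack; // NuGet package: HtmlAgilityPack (assumed dependency)

public static class HtmlUrlRewriter
{
    // Rewrites every src/href attribute to an absolute URL based on `origin`.
    public static string MakeUrlsAbsolute(string html, Uri origin)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        // SelectNodes returns null (not an empty list) when nothing matches.
        var nodes = doc.DocumentNode.SelectNodes("//*[@src or @href]");
        if (nodes == null) return html;

        foreach (var node in nodes)
        {
            foreach (var name in new[] { "src", "href" })
            {
                var value = node.GetAttributeValue(name, "");

                // Skip missing attributes and in-page anchors such as "#top".
                if (value.Length == 0 || value.StartsWith("#")) continue;

                // Already-absolute URLs come back unchanged from TryCreate.
                if (Uri.TryCreate(origin, value, out var absolute))
                    node.SetAttributeValue(name, absolute.AbsoluteUri);
            }
        }

        return doc.DocumentNode.OuterHtml;
    }
}
```

In the controller from the question, this would be called on the downloaded string before passing it to Content(...), for example MakeUrlsAbsolute(result, new Uri("http://google.com")).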

This gentleman has posted an example of how to do something very similar HERE.

Kyle Goode
  • Major issue with this: the third-party website that we are trying to proxy includes relative-URL scripts that in turn generate more relative URLs dynamically, and those cannot be replaced. Apparently, all links are somehow generated based on the URL of the website requesting them. I'm not even sure how, but the nested relative URLs get broken, unfortunately. I was hoping there would be a way to replace the requesting GET source with some Referer or similar, to make the website produce correct URLs. – vitaly-t Feb 14 '19 at 19:11
  • You'll also need to handle paths in CSS, and any dynamic things in JavaScript too. In reality, you'd never come up with a robust solution by replacing things... – Sam Feb 14 '19 at 19:31
  • @vitaly-t Unfortunately, there isn't a standard way of doing that. The techniques I've mentioned are effectively what most web proxies are forced to do. If you look through various threads you'll see lots of examples of people doing something similar. For example: https://stackoverflow.com/questions/26394924/make-a-web-proxy-step-by-step – Kyle Goode Feb 14 '19 at 20:17