Fetching a file on a server, resizing with PHP GD2, security considerations

Question

What are the security considerations when a server fetches a file from an untrusted domain?

What are the security considerations when resizing an image that you don't trust with PHP's GD2 library?

The file will be stored on the server machine, and will be offered for download. I know I can't trust the MIME-Type header. Is there anything else I should be aware of?

I have a webservice that looks like this:

input

An http-URL (or a String that is expected to be a URL)

output

A meta description of the file, or an error if there was one.

The meta description has one of two forms:

It's an image + a URL to the image on my domain + a thumbnail of the image (generated on and hosted by my server)
It's not an image + a URL to the file on my domain

update

Concerns that I can come up with:

The remote server is malicious and sends tiny bits of data, just enough to keep the socket open, without ever doing anything useful (like Slowloris). I don't know how real a threat this is. I suppose it could easily be avoided with a timeout plus a progress check.
The remote server serves something that looks like an image (headers, MIME type) but causes PHP to crash when I load it with GD2.
The server sends a useless or wrong MIME-type header, like text/plain for binary files.
The remote server serves an image with a virus in it. I assume that resizing the image will get rid of the virus, but I will serve the original image if there is no reason to scale.
The remote server serves a file with a virus in it. The file will not be treated as an image, so my server will do nothing with it. Nothing will happen until the user downloads and runs it.

Also, I assume I can trust the users of my service. This is a private application in a situation where users can be held accountable for bad behavior. I assume they won't intentionally try to break it.

Can you tell me more about what type of security issues you are referring to? As far as downloading the file and mirroring it goes, you can force a download for the files hosted on your server to prevent them from running at all, so there won't be any issue apart from the possibility of hosting a virus and getting blocked by Google. As far as images go, resizing them should be fine, and as far as I am concerned, if the file isn't a valid image it will not be resized, so you shouldn't have any issues. – Ahoura Ghotbi Jan 04 '12 at 12:36
I don't intend to _run_ the files on my server. This should not happen unless there is a bug in a PHP function (which is a valid concern). If the file contains a virus, there is little I can do to protect the user. I want a guarantee that my server is safe. – Halcyon Jan 04 '12 at 13:40

score 2 · Accepted Answer · edited May 23 '17 at 10:34

What are the security considerations when a server fetches a file from an untrusted domain?

Neither the domain (host) nor the file is to be trusted. This splits into two concerns:

Transport
Data

To transport the data safely, use a timeout and a size limit. Modern HTTP client libraries offer both. If the file cannot be fetched in time, drop the connection. If the file is too large, drop the data. Tell the user that there was a problem getting the file. Alternatively, let the user handle the transport from that server: use the user's browser and JavaScript to obtain the file, then POST it to you. Set the POST size limit in your script.
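A minimal sketch of the timeout-plus-size-limit idea, assuming PHP's cURL extension is available. The function names `makeCappedWriter` and `fetchUntrusted` and all of the concrete limits are illustrative assumptions, not something from the question; tune them to your own situation.

```php
<?php
/**
 * Returns a cURL write callback that copies chunks into the stream $out and
 * aborts the transfer once more than $maxBytes have been received.
 * (Hypothetical helper for this sketch.)
 */
function makeCappedWriter($out, int $maxBytes): callable
{
    $received = 0;
    return function ($ch, string $chunk) use ($out, $maxBytes, &$received): int {
        $received += strlen($chunk);
        if ($received > $maxBytes) {
            return -1; // a value different from the chunk length makes cURL abort
        }
        fwrite($out, $chunk);
        return strlen($chunk);
    };
}

/** Downloads $url to $destPath with time and size limits; returns false on any failure. */
function fetchUntrusted(string $url, string $destPath, int $maxBytes = 10485760): bool
{
    $out = fopen($destPath, 'wb');
    if ($out === false) {
        return false;
    }
    $ch = curl_init($url);
    curl_setopt_array($ch, [
        CURLOPT_PROTOCOLS       => CURLPROTO_HTTP | CURLPROTO_HTTPS, // no file://, ftp://, ...
        CURLOPT_FOLLOWLOCATION  => false,  // do not follow redirects blindly
        CURLOPT_CONNECTTIMEOUT  => 5,      // seconds allowed to establish the connection
        CURLOPT_TIMEOUT         => 30,     // hard cap for the whole transfer
        CURLOPT_LOW_SPEED_LIMIT => 1024,   // abort if slower than 1 KiB/s ...
        CURLOPT_LOW_SPEED_TIME  => 10,     // ... for 10 consecutive seconds
        CURLOPT_WRITEFUNCTION   => makeCappedWriter($out, $maxBytes),
    ]);
    $ok = curl_exec($ch) !== false
        && curl_getinfo($ch, CURLINFO_RESPONSE_CODE) === 200;
    fclose($out);
    if (!$ok) {
        unlink($destPath); // drop partial or oversized data
    }
    return $ok;
}
```

The low-speed options are what address the "tiny bits of information" scenario from the question: a server that trickles data is cut off even before the overall timeout expires.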

As long as the data is untrusted, you need to handle it with caution. That means you implement a process of your own that runs different security checks on the file before you mark it as "safe".

What are the security considerations when resizing an image that you don't trust with PHP's GD2 library?

Then do not pass untrusted data to the image library. See the step above: bring the data into a safe state first.

The file will be stored on the server machine, and will be offered for download. I know I can't trust the MIME-Type header. Is there anything else I should be aware of?

I think you're still at the point above: how to get from untrusted to safe. Sure, you can't trust the Content-Type header, but it's still good to understand what it says.

You want to protect against the Unrestricted File Upload vulnerability (OWASP).

Check the filename. If you store the data on your server, give it a safe temporary name that cannot be guessed in advance and that is not accessible via the web.
Check the data associated with the file, e.g. the URL information of the source of that file. Handle encoding properly.
Drop anything that does not meet your expectations: formulate your preconditions and check them strictly.
Validate the file data before you continue, for example by using a virus checker.
Validate the image data before you continue. This includes the file headers (magic numbers) as well as checking that the file size and file content are valid. You should use a library that specializes in the job, e.g. an image file format malformation checker. This is specialized software, so if it is a core part of your business, invest in it properly. A lot of free software for handling image files exists; I mention this only for information, since you can't blindly trust any recommendation anyway and need to dig into the topic yourself.
If you plan to resize the image yourself, you need to be doubly careful, because in addition to hosting the data you plan to process it. So work out what you will do with the data first, to locate potential problem areas.
Do logging and monitoring.
Have a plan for the case that everything goes wrong.
Consider repeating the process for already existing files, so that if you change your procedure you can automatically apply the new checks to uploads made in the past as well.
Create a separate system for each type of work, one that can be cleaned after the work has been done: one system to do the download, one system to obtain the metadata, and so on. After each action, restore the system from an image. If a single component fails, it won't be left behind in an exploited state. Additionally, if you detect a failure, you can take your whole system out of service until you have found the flaw.
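Two of the points above (an unguessable storage name and a magic-number check) can be sketched as follows. The helper names `randomStoragePath` and `sniffImageType` are hypothetical, and the signature table covers only PNG, JPEG and GIF as an example. As noted in the comments on this question, a magic-number check is a cheap pre-filter, not a guarantee that the file is well-formed.

```php
<?php
/**
 * Builds a storage path with an unguessable name. The remote file name is
 * ignored entirely; $storageDir is assumed to lie outside the web root.
 */
function randomStoragePath(string $storageDir): string
{
    return rtrim($storageDir, '/') . '/' . bin2hex(random_bytes(16));
}

/** Returns 'png', 'jpeg', 'gif', or null, judging only by the leading bytes. */
function sniffImageType(string $head): ?string
{
    $signatures = [
        ["\x89PNG\r\n\x1a\n", 'png'],
        ["\xFF\xD8\xFF",      'jpeg'],
        ['GIF87a',            'gif'],
        ['GIF89a',            'gif'],
    ];
    foreach ($signatures as [$magic, $type]) {
        if (strncmp($head, $magic, strlen($magic)) === 0) {
            return $type;
        }
    }
    return null;
}
```

Reading the first 16 bytes of the stored file and passing them to `sniffImageType` is enough for these three formats; anything that returns null would then be handled as "not an image".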

All of this depends a bit on how far you want to go, but I think you get the idea. Create a process that works for you and know where improvements can be added, but first create an infrastructure that is modular enough to deal with error cases and that encapsulates the process well enough to deal with any outcome.

You could delegate critical parts to a system that you don't need to care about, e.g. by separating processing from hosting. Additionally, when you host the images, the web server must not be clever. The more stupid a system is, the less exploitable it is (normally).
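One way to keep the serving side "stupid" from PHP is to hand every stored file out as an opaque download, so that neither the server nor the browser tries to interpret it. This is only a sketch: `downloadHeaders` and `sendDownload` are made-up names, and the stored file is assumed to live outside the web root.

```php
<?php
/** Builds response headers that make browsers treat the file as an opaque download. */
function downloadHeaders(string $displayName, int $size): array
{
    // Keep only a conservative character set so the name cannot break the header.
    $safe = preg_replace('/[^A-Za-z0-9._-]/', '_', $displayName);
    if ($safe === null || $safe === '') {
        $safe = 'download';
    }
    return [
        'Content-Type: application/octet-stream',
        'Content-Disposition: attachment; filename="' . $safe . '"',
        'Content-Length: ' . $size,
        'X-Content-Type-Options: nosniff', // ask browsers not to guess a type
    ];
}

/** Streams a stored file to the client without ever including or executing it. */
function sendDownload(string $storedPath, string $displayName): void
{
    foreach (downloadHeaders($displayName, (int) filesize($storedPath)) as $line) {
        header($line);
    }
    readfile($storedPath);
}
```

Because the bytes only pass through `readfile`, an uploaded `.php` or `.html` file is never run by the server, and the attachment disposition keeps the browser from rendering it inline.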

If hosting is not part of your business, why not hand it over to Amazon S3 or a similar store? Your domain can be preserved via DNS settings.

Keep the libraries you use to verify images up to date (which implies that you know which libraries are used and in which versions; e.g. the PHP exif extension makes use of mbstring, and so on, so track the whole dependency tree down). Make sure you're in a position to report flaws to the library maintainers in a useful way, e.g. with logging and by storing upload data so that problems can be reproduced.

Learn which exploits for images have existed in the past and which systems, components and libraries (example, see disclaimer there) were affected.

Also read up on the common ways of exploiting something, to get the basics together (I'm sure you are aware of them, but it's always good to re-read some of this):

Secure file upload in PHP web applications (Alla Bezroutchko; June 13, 2007; PDF)


score 1 · Answer 2 · edited May 23 '17 at 12:02

What you're describing basically comes down to an input validation problem; you don't trust what your application is reading in as input and processing.

To address this, download the resource in question and then attempt to determine its true file type. There are multiple ways to attempt this, but basically you will want to use either custom code or a library to parse the file and look for the telltale signs of a certain type. There is a good SO discussion on how to do this in PHP here - How can I determine a file's true extension/type programatically? - I would check the second answer, which lists some PHP-specific functions for this. When your application receives a file, it should perform true file typing like this and then compare the result to the MIME type specified by the remote server; if they match, accept the file, and if they do not, drop it.

I would also suggest using a whitelist of allowable file types (a list of everything your service will support, and then ONLY accept files of those types). If you have a very general-purpose service, then you should at least keep a blacklist of disallowed file types (a list of everything your service absolutely will not support, dropping those immediately based on the outcome of your MIME type comparison). Again, the use of these depends entirely on your use cases.
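The detect, compare and whitelist steps could look roughly like this, assuming PHP's fileinfo extension is installed. The function names are my own for this sketch, and the list of allowed types is whatever your service decides on.

```php
<?php
/** Strips parameters such as "; charset=utf-8" from a Content-Type value and lowercases it. */
function normalizeMime(string $contentType): string
{
    return strtolower(trim(explode(';', $contentType, 2)[0]));
}

/**
 * Accepts a file only if the type detected from its bytes matches the type the
 * remote server declared AND that type is on the whitelist.
 */
function isAcceptable(string $detected, string $declared, array $allowed): bool
{
    $detected = normalizeMime($detected);
    return $detected === normalizeMime($declared)
        && in_array($detected, $allowed, true);
}

/** Detects the MIME type from the file contents (magic numbers) via fileinfo. */
function detectMime(string $path): string
{
    $finfo = new finfo(FILEINFO_MIME_TYPE);
    $type = $finfo->file($path);
    return $type === false ? 'application/octet-stream' : $type;
}
```

Usage would be along the lines of `isAcceptable(detectMime($path), $declaredContentType, ['image/png', 'image/jpeg'])`, where `$declaredContentType` is the header value received from the remote server. A blacklist variant would simply invert the `in_array` test.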

Once you've got a type, the concern becomes whether what the remote server has sent you is a bad file that targets your server (one that contains malicious code, a buffer overflow designed to make the GD2 library blow up and run arbitrary code, etc.). Basically, you are relying on the GD2 library not to contain bugs that would allow such an exploit to succeed. There's not much you can do here, short of running a security audit on the library yourself, and I'm going to assume that's out of scope. Keep up with any reported security bugs in the library and patch as soon as you can; as a consumer of the library, you are really relying on the maintainers to find and remedy security vulnerabilities like this.

Next, the concern is that the remote server has sent you a bad file that targets your users/clients (one that contains malicious code, buffer overflows, viruses, etc.). Here, if there is corrupted data in the image that is really malware, it will most likely either (1) break or exploit GD2 when it is read (see above for that scenario) or (2) be eliminated when the library performs the resize operation, provided GD2 can process it successfully. There is still a chance it will survive the processing, but there's not much you can do about that either. If you're really concerned about this, you can apply a virus scan using an external product designed for that; if you do, I would suggest scanning both (1) after the download and before GD2 processing and (2) on the manipulated file before you serve it out. Personally, I don't think you gain much by doing this, but if you want to provide an additional check / warm fuzzies to your users, it cannot hurt.
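For point (2), a rough sketch of re-encoding through GD, assuming the GD extension is loaded. `fitWithin`, `makeThumbnail` and the pixel budget are illustrative choices of mine. The dimensions are checked before the full decode so that a tiny file claiming enormous dimensions cannot exhaust memory. Transparency handling is left out to keep it short.

```php
<?php
/** Scales ($w, $h) to fit inside a $max x $max box, keeping the aspect ratio. */
function fitWithin(int $w, int $h, int $max): array
{
    $scale = min(1.0, $max / max($w, $h)); // never scale up
    return [max(1, (int) round($w * $scale)), max(1, (int) round($h * $scale))];
}

/** Re-encodes an untrusted image as a PNG thumbnail; returns false on any problem. */
function makeThumbnail(string $srcPath, string $dstPath, int $max = 200, int $maxPixels = 25000000): bool
{
    $info = @getimagesize($srcPath); // inspects the file without decoding the pixels
    if ($info === false || $info[0] < 1 || $info[1] < 1
        || $info[0] * $info[1] > $maxPixels) {
        return false; // not a recognizable image, or too large to decode safely
    }
    $src = @imagecreatefromstring((string) file_get_contents($srcPath));
    if ($src === false) {
        return false; // GD could not decode it
    }
    [$tw, $th] = fitWithin(imagesx($src), imagesy($src), $max);
    $dst = imagecreatetruecolor($tw, $th);
    imagecopyresampled($dst, $src, 0, 0, 0, 0, $tw, $th, imagesx($src), imagesy($src));
    return imagepng($dst, $dstPath); // fresh encode: only the pixel data is carried over
}
```

Note that this does nothing about a bug in GD's decoder itself; it only limits resource use and makes sure that what you serve as the thumbnail was produced by your own encoder rather than by the remote server.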

To address the slow feeding of data to keep a connection open, put a timeout on every connection; unless you are dealing with a specific threat to your use case here, I do not think this is a huge concern.

The PHP functions seem to use a magic number approach as well for detecting the actual MIME type. It's a lot better than extension sniffing, but it gives no guarantees. I can't use a whitelist because it's likely that some proprietary file type will be used that I don't know about. Is it absolutely required to have a blacklist? I can see some issues with HTML files (containing session stealers). Could that be bypassed somehow? – Halcyon Jan 04 '12 at 16:11
Most true file typing does use magic numbers and/or header structure validation. There really isn't a better or more robust way to do it; this is why I suggested comparing what you get there to the supplied MIME type (as a sanity check). At that point, you're really relying on the decoding library (GD2 for images, an HTML parser for HTML files, etc.) not to break or be exploited on a malformed file. As for whitelist/blacklist, you can also choose to sanitize as opposed to accept/reject: HTML-encode all HTML files (and similarly for others) using a tested, reliable library such as one from OWASP to make them a lot safer. – Jan 04 '12 at 16:16

score 0 · Answer 3 · answered Jan 04 '12 at 12:50


1) My primary concern with blindly fetching a file from an untrusted domain would be how to verify that the file is, in fact, what you expected to get. Could the untrusted server trick your script into downloading a harmful file (like a virus), or possibly a script that would allow a backdoor into your system?

2) I haven't read about any security issues with resizing an image with the GD2 library. If it's not an image to begin with, the GD2 functions would fail with an error. I don't think you have much to worry about with this part.

3) I (personally) would never do this without first reviewing every single file that my script downloaded. If you want to partially automate this, you might consider running magic number tests on all the files as a pre-filter. But a human look is the safest way to serve random files. When you finish this project, and before you make it live, try to break / trick / hack it as hard as you can. Get some knowledgeable friends involved to help.

answered Jan 04 '12 at 12:50

WWW


1. I assume this can only happen if I _run_ the file. 3. The magic number check is a sensible check but it's no guarantee. If there is real malicious intent it's very easy to circumvent. – Halcyon Jan 04 '12 at 13:42
1) If you're serving the file to others then your server could run it. 3) That's kind of my point: without some kind of human review by *someone* trusted, there *is* no guarantee. I know you said you have a default trust of your users, but it only takes getting burned once to really hurt you. – WWW Jan 04 '12 at 13:53
1) I think I'm just using `readfile` or maybe `echo file_get_contents()`. How is that _running_ the file? 3) My only concern is the server - I don't strictly care that a file contains a virus that might mess with the user's PC. PICNIC errors are out of scope ;) – Halcyon Jan 04 '12 at 14:03
1) It's not - but you didn't originally say that's how you were doing it. =) I assumed you were leaving it to the web server to serve the file on its own rather than having a script display the raw contents. 3) If you're reading the files the way you just explained AND storing the files that you scrape in a non-web-accessible path, then you should be pretty safe. – WWW Jan 04 '12 at 14:15

score -1 · Answer 4 · answered Jan 04 '12 at 12:35


When it is not an image, do you store the file anyway, regardless of what kind of file it is? So someone can upload a PHP file and browse to it to execute PHP code on your server?

answered Jan 04 '12 at 12:35

Iggy Van Der Wielen


This is not a section to post questions; you need to either post answers here or use the comment section of the question. – Ahoura Ghotbi Jan 04 '12 at 12:37
Where do I comment on the first question? – Iggy Van Der Wielen Jan 04 '12 at 12:46
Where I commented, but I think you don't have enough reputation, so you are not allowed to comment yet. – Ahoura Ghotbi Jan 04 '12 at 13:33
Yes, I will accept any file. Of course I could choose to blacklist known _problematic_ file types such as .js .php .exe but ideally I wouldn't have to do this. This is only possible of course because I _trust_ the users of the system not to have malicious intent. – Halcyon Jan 04 '12 at 13:38
So if I want to answer the question but need more info from you, that is not possible? – Iggy Van Der Wielen Jan 04 '12 at 13:57
