In our company we handle a lot of user uploads, for example images and videos. I was wondering: how do you guys "deal with that" in terms of safety? Is it possible for an image to contain malicious content? Of course, there are "unwanted" pixels, like porn, but that's not what I mean here. I mean images that "break" machines while they are being decoded. I have already read "How can a virus exist in an image?".
Basically I was planning to do this:
- Create a DMZ
- Store the assets in a bucket (we use GCP) that lives inside the DMZ
- Run "malicious code" detection on the file
- If it turns out to be fine, move the asset into the "real" landscape (outside the DMZ)
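The gate in steps 3 and 4 can be sketched as below. This is a minimal sketch, not a GCP implementation: local directories stand in for the DMZ bucket and the "real" bucket, and `promote_if_clean` and the `checks` list are hypothetical names. The design choice worth keeping is that it fails closed: a check that crashes counts as "not clean", so the file stays in quarantine.

```python
import shutil
from pathlib import Path
from typing import Callable, Iterable

# A check takes the path of a quarantined file and returns True if it looks safe.
Check = Callable[[Path], bool]


def promote_if_clean(upload: Path, clean_dir: Path, checks: Iterable[Check]) -> bool:
    """Move `upload` out of quarantine only if every check passes.

    Local directories stand in for the DMZ bucket and the "real" bucket;
    with GCS you would copy the object between buckets instead.
    """
    for check in checks:
        try:
            ok = check(upload)
        except Exception:
            # Fail closed: a check that crashes counts as "not clean".
            ok = False
        if not ok:
            # Leave the file in quarantine for inspection or deletion.
            return False
    clean_dir.mkdir(parents=True, exist_ok=True)
    shutil.move(str(upload), str(clean_dir / upload.name))
    return True
```

Each technique (virus scan, MIME sniffing, and so on) then becomes one function appended to `checks`.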
Now the 3rd part... what can I do here?
Applying a virus scanner: no problem here, there are plenty of options. It's a simple approach with a good chance of catching known viruses.
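As one concrete option, a scan step could shell out to ClamAV's `clamscan`. This sketch assumes ClamAV is installed and on `PATH`; the function names are hypothetical. It relies on `clamscan`'s documented exit codes (0 = no virus found, 1 = virus found, 2 = error) and treats anything other than 0 or 1 as a failed scan rather than as clean.

```python
import subprocess
from pathlib import Path

# clamscan exit codes: 0 = no virus found, 1 = virus(es) found, 2 = error.
CLAMSCAN_CLEAN = 0
CLAMSCAN_INFECTED = 1


def classify_clamscan_exit(returncode: int) -> str:
    """Map a clamscan exit code to 'clean', 'infected' or 'error'."""
    if returncode == CLAMSCAN_CLEAN:
        return "clean"
    if returncode == CLAMSCAN_INFECTED:
        return "infected"
    # Any other code (including 2) means the scan failed; never treat it as clean.
    return "error"


def scan_with_clamav(path: Path, timeout: float = 120.0) -> str:
    """Run clamscan on a single file (assumes ClamAV is installed and on PATH)."""
    result = subprocess.run(
        ["clamscan", "--no-summary", str(path)],
        capture_output=True,
        text=True,
        timeout=timeout,
    )
    return classify_clamscan_exit(result.returncode)
```

`clamscan` reloads its signature database on every run, so at higher volumes the `clamd` daemon (queried with `clamdscan`) is usually the better fit.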
Doing MIME-type detection: I detect the MIME type from the first few bytes of the file. For example, if someone sends us "image.jpg" but it is in fact an executable, we would detect this. Right? Is this safe enough? I was thinking about this package.
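The first-bytes check can be sketched with a small table of well-known file signatures ("magic numbers"). The signature list is deliberately tiny, and the MIME labels for executables and the `ALLOWED` policy are illustrative assumptions, not what any particular package returns. Note that this only tells you what the file *starts* as: a file can begin with valid JPEG bytes and still carry other data further in, and it says nothing about whether a decoder will survive the rest of the file.

```python
from pathlib import Path
from typing import Optional

# Leading "magic" bytes for a few well-known formats.
SIGNATURES = [
    (b"\xff\xd8\xff", "image/jpeg"),
    (b"\x89PNG\r\n\x1a\n", "image/png"),
    (b"GIF87a", "image/gif"),
    (b"GIF89a", "image/gif"),
    (b"MZ", "application/x-msdownload"),       # Windows PE executable
    (b"\x7fELF", "application/x-executable"),  # ELF (Linux) executable
]

# Hypothetical policy: which sniffed types are acceptable for each claimed extension.
ALLOWED = {
    ".jpg": {"image/jpeg"},
    ".jpeg": {"image/jpeg"},
    ".png": {"image/png"},
    ".gif": {"image/gif"},
}


def sniff_mime(head: bytes) -> Optional[str]:
    """Return the MIME type whose signature matches the first bytes, if any."""
    for magic, mime in SIGNATURES:
        if head.startswith(magic):
            return mime
    return None


def extension_matches_content(filename: str, head: bytes) -> bool:
    """True only if the extension is allowed and the content agrees with it."""
    allowed = ALLOWED.get(Path(filename).suffix.lower())
    if allowed is None:
        return False  # unknown extension: reject rather than guess
    return sniff_mime(head) in allowed
```

With this, an "image.jpg" whose content starts with `MZ` is rejected, because the sniffed type (an executable) is not in the allowed set for `.jpg`.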
What else? What else can I do? How do other big companies handle this? I'm not really looking for answers about orchestration and the like: I know how to set up a DMZ and wire it all together with a few Pub/Sub topics. I'm purely interested in which techniques to apply to really find out whether an incoming asset is "safe".