5

Is it possible to produce compilers that heuristically check for malware behaviour? If it is possible why has not it been implemented? Wouldn't that strongly help preventing the production of such viruses, I mean why wait to stop them once they are out there?

Even if these people use a compiler that does not use the "proposed" built in AV, personal AV could detect that and grade the file as risky (sort of like SSL Certificates)

Carlos
  • 5,405
  • 21
  • 68
  • 114

3 Answers3

11

You're making a lot of assumptions:

  • That the virus writers couldn't disable the built-in AV of any open-source (or even closed-source) compilers. Given how DRM is consistently and quickly broken, this seems unlikely.
  • That the virus writers couldn't simply use an existing pre-AV compiler.
  • That the virus writers couldn't create their own non-AV compiler.
  • That there are no legitimate programs that would trigger the compiler's AV heuristics.
  • That today's compiler writers can accurately predict and model all current and future AV behavior in order to produce a heuristic that is even remotely effective.

Seems to me like it's a non-starter.

Your comment about using non-AV compilers is essentially "code signing", and has been a common practice for years (decades?). The barrier there, however, is distribution of certificates, and coming up with a reasonable list of trusted signers. They're big enough problems that noone's found a way to solve them yet without severely limiting the usefulness of computers.

For even more information closely related to this subject, see this paper by Ken Thompson.

Mark
  • 11,257
  • 11
  • 61
  • 97
  • 3
    Fair points buddy. With the first 4 points - thats why i said - if the final "program" is not scanned then they dont get some sort of encrypted "pass certificate". If that file is released "unchecked" then the certificate will be missing or inaccurate, which could then be flagged `risky` by another AV on the receiving end, resulting in thorough-er scanning. – Carlos Nov 16 '10 at 21:56
  • 3
    Even if you solved the certificate-distribution problem and the who-to-trust problem, you'd still be left with the problem of virus writers disabling the check-for-viruses part of the compiler but not the mark-as-clean part. – Mark Nov 16 '10 at 22:03
  • yeah that's actually another good point, thanks for the constructive criticism though :) – Carlos Nov 16 '10 at 22:06
  • 1
    One possible workaround for much of the preceding concerns is for the compiler to run it's analyses of the code and then embed into the executable a machine readable proof that the code fits some criteria. No certs needed, the user-AV just checks the proof. This would work for a wide variety of attributes that a program can claim to exhibit. If a standard set of attributes became prevalent, malware would generally be conspicuous by ommition. – BCS Nov 18 '10 at 18:38
  • How would you prevent people from writing a compiler (or altering an existing one) that embeds the required proof without running any analysis? – Mark Nov 18 '10 at 18:52
  • @Mark, no need. It would be virtually impossible to generate a valid proof without doing the analysis. BTW, the proof isn't that the original source has some attribute, but that the exact representation of it in the binary does. In effect, the binary it self is part of the proof. – BCS Nov 18 '10 at 22:07
  • Why would it be impossible to generate a valid proof without doing an analysis? Why not analyze notepad.exe and embed that proof into your virus? – Mark Nov 18 '10 at 22:20
  • If you included a proof of correctness for notepad, the AV's proof checker would very quickly detect that something is wrong when the proof starts referring to parts of the binary that don't do what it says they do. For example, the proof would make assertions like: "if condition X is true when code starts executing the basic block at location 0x95872398, then the branch at the end at 0x95872583 will always be followed". The proof checker would start by examining the binary and verifying that the ASM starting at location 0x95872398 and running to 0x95872583 is in fact a basic block. – BCS Nov 19 '10 at 00:45
  • The premise of the system is that there is a wide variety of trivial properties that can be easily verified about binaries and a similar set of identities that can be used to construct complex properties from those trivial bits. The proof ends up being a list the trivial properties that the proof asserts and the proof checker checks and a sequence of steps that the checker applies to the assertions. When all is said an done, if some specific pattern is constructed, then you know that the code you are about to run exhibits some specific property. – BCS Nov 19 '10 at 00:51
  • So why hasn't this PCC principal been put into practice? Everything I can find on it was written by the originators of the idea. – Mark Nov 19 '10 at 15:03
  • Generating proofs of correctness are HARD. As a single data point: I ran across one project that did a proof of correctness of a single 10 KLOC program and it took something like 10 man years. In theory it works and is even secure. In practice, it's actually cheaper (way, way cheaper) to rebuild your system every 6-12 months. – BCS Nov 22 '10 at 02:19
3
  • Existing AV generally works with a black-list approach. (Comparing threat signatures against files.) That would be, by definition, almost useless on an entirely new threat.

  • Every operation you could try to classify would end up blocking a legitimate program; if the operations didn't have a legitimate use, the OS designers would remove them for safety reasons.

jdmichal
  • 10,984
  • 4
  • 43
  • 42
  • I am aware on your first point, as am developing a genetic algorithm AV at the moment for my undergrad project and that is one of the reasons why i decided to do it (static signature comparison being the main method used today for detection which IMO is rather poor). Yet there is heuristics. – Carlos Nov 16 '10 at 22:51
3

There is the classical paper "Reflections on Trusting Trust" by Ken Thompson.

Alexander Gorshenev
  • 2,769
  • 18
  • 33