I work for a webhost, and my job is to find and clean up hacked accounts. The way I find a good 90% of shells/malware/injections is to look for files that are "out of place." For example, eval(base64_decode(.......)), where "......." is a whole bunch of base64'ed text that is almost never anything good. Odd-looking files jump out at me as I grep through files for key strings.
If these files jump out at me as a human, I'm sure I can build some kind of profiler in Python to look for things that are "out of place" statistically and flag them for manual review. To start off, I thought I could compare the lengths of lines in PHP files containing key strings (eval, base64_decode, exec, gunzip, gzinflate, fwrite, preg_replace, etc.) and look for lines that deviate from the average by two standard deviations.
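Here's a minimal sketch of what I mean, just to make the idea concrete (the key-string list, the 2-sigma cutoff, and the function name are all placeholders I'd tune later):

```python
import math

# Placeholder list of key strings commonly seen in injected PHP.
SUSPICIOUS = ("eval", "base64_decode", "exec", "gunzip",
              "gzinflate", "fwrite", "preg_replace")

def flag_long_lines(lines, n_sigma=2.0):
    """Return (line_number, length) pairs for lines that contain a
    suspicious key string AND whose length deviates from the file's
    mean line length by more than n_sigma standard deviations."""
    lengths = [len(line) for line in lines]
    mean = sum(lengths) / len(lengths)
    variance = sum((x - mean) ** 2 for x in lengths) / len(lengths)
    sigma = math.sqrt(variance)
    if sigma == 0:
        # Every line is the same length; nothing deviates.
        return []
    flagged = []
    for i, line in enumerate(lines, 1):
        if any(s in line for s in SUSPICIOUS):
            if abs(len(line) - mean) > n_sigma * sigma:
                flagged.append((i, len(line)))
    return flagged
```

On a file of short, normal lines plus one 500-character eval(base64_decode(...)) line, this flags only the long eval line, which is the behavior I'm after, but I don't know if per-file mean/sigma is statistically sound.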
Line length varies widely, though, and I'm not sure it would be a good statistic to use. Another approach would be to assign weighted rules to certain things (line length over or under a threshold = X points, contains the word "upload" = Y points), but I'm not sure what I can actually do with the scores or how to weight each attribute. My statistics is a little rusty.
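For the weighted-rules idea, the simplest thing I can picture is summing per-rule points and flagging files over a threshold. The rules, weights, and threshold below are made-up guesses, not anything I've validated:

```python
import re

# Hypothetical (rule, weight) pairs -- weights would need tuning
# against a corpus of known-clean and known-hacked files.
RULES = [
    (lambda line: len(line) > 1000, 5),           # very long line
    (lambda line: "upload" in line.lower(), 2),   # contains "upload"
    (lambda line: re.search(r"eval\s*\(\s*base64_decode", line) is not None, 10),
    (lambda line: line.count("\\x") > 10, 4),     # heavy hex escaping
]

def score_line(line):
    """Sum the weights of every rule this line triggers."""
    return sum(weight for rule, weight in RULES if rule(line))

def score_file(lines, review_threshold=10):
    """Total the line scores and decide whether the file
    deserves manual review."""
    total = sum(score_line(line) for line in lines)
    return total, total >= review_threshold
```

What I don't know is whether hand-picked additive weights like this are the right model, or whether I should be fitting the weights from data somehow.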
Could anyone point me in the right direction (guides, tutorials, libraries) for statistical profiling?