
I've got a line that pulls out the number of times the word "severity" appears after the word "vulnerabilities" in a file.

Please don't laugh too hard:

cat <file> | sed '1,/vulnerabilities/d' | grep -c '"severity": 4'

This comes back with a count of "severity": 4 matches in the file. I can't seem to iterate this across the other files.

I have 100 or so files of the form bleeblah-082017, where bleeblah can be words of different lengths. I'm having trouble working out how to iterate the line above over the files to get a result from each one individually.

I would usually have used an awk line to iterate through the list, but I can't seem to find any examples that meld awk and sed.

Would anyone have any ideas on how to perform the task above over many files and return a result per file?

Thanks, Davey

I have a file that has a bunch of entries such as:
{ "count": 6,
"plugin_family": "Misc.", "plugin_id": 7467253, "plugin_name": "Blah", "severity": 4, "severity_index": 1, "vuln_index": 13

I'd like to extract the number of times "severity": 4 appears after the word "vulnerabilities" in each file. For the file in question the output would be 10.

Some more of the input file:

  "notes": null,
    "remediations": {
        "num_cves": 20,
        "num_hosts": 6,
        "num_impacted_hosts": 2,
        "num_remediated_cves": 6,
        "remediations": [
            {
                "hosts": 2,
                "remediation": "Apache HTTP Server httpOnly Cookie Information Disclosure: Upgrade to Apache version 2.0.65 / 2.2.22 or later.",
                "value": "f950f3ddf554d7ea2bda868d54e2b639",
                "vulns": 4
            },
            {
                "hosts": 2,
                "remediation": "Oracle Application Express (Apex) CVE-2012-1708: Upgrade Application Express to at least version 4.1.1.",
                "value": "2c07a93fee3b201a9c380e59fa102ccc",
                "vulns": 2
            }
        ]
    },
    "vulnerabilities": [
        {
            "count": 6,
            "plugin_family": "Misc.",
            "plugin_id": 71049,
            "plugin_name": "SSH Weak MAC Algorithms Enabled",
            "severity": 1,
            "severity_index": 0,
            "vuln_index": 15
        },
        {
            "count": 6,
            "plugin_family": "Misc.",
            "plugin_id": 70658,
            "plugin_name": "SSH Server CBC Mode Ciphers Enabled",
            "severity": 1,
            "severity_index": 1,
            "vuln_index": 13
        },
        {
            "count": 2,
            "plugin_family": "Web Servers",
            "plugin_id": 64713,
            "plugin_name": "Oracle Application Express (Apex) CVE-2012-1708",
            "severity": 2,
            "severity_index": 2,
            "vuln_index": 12
        },

Each of these files is from a vulnerability scan extracted from my scanner API. Essentially the word "severity" is all over the place in the different sections (hosts, vulns, etc.). I want to extract from each scan file the number of times the pattern appears after the word "vulnerabilities" (which only appears once in each file). I'm open to using Perl, Python, or whatever to achieve this; I was just more familiar with shell scripting for manipulating these text-type files in the past.

  • It really isn't entirely clear to me what you're trying to achieve, as your first statement is a little ambiguous. Can I suggest you include a (small) sample input file and the output you are expecting? – Component 10 Aug 16 '17 at 19:43
  • I don't understand your question either. The line is JSON; you would be better off using Perl or Python to parse (and maybe filter) it, or at least a one-liner to make it multiline. You talk about "iterating through a list": does `<file>` contain one of these JSON objects per line? And as for melding awk and sed, you can do any sed operation with `awk` alone. Show an example `<file>` containing the word "vulnerabilities". – try-catch-finally Aug 16 '17 at 20:00
  • Add more context input lines. – RomanPerekhrest Aug 16 '17 at 20:17

3 Answers


Parsing .json data with sed or awk is fraught with potential pitfalls. I recommend using a format-aware tool like jq to query the data you want. In this case, you can do something like

jq '{(input_filename): [.vulnerabilities[].severity]|add}' *.json

This should produce output something like

{
  "bleeblah-201708.json": 4
}
{
  "bleeblah-201709.json": 11
}
Gumnos
  • Oh wait, it is not correct. Check the question again. I was already wondering. – hek2mgl Aug 17 '17 at 00:04
  • I took the sample JSON you provided as "some more of the input file", made it valid JSON (as you only provided an excerpt, it was invalid on its own), and the above query does a per-file tally of the `severity` values across the `vulnerabilities` sections. You ask for a tally where `severity` is 4, but your sample data doesn't include any values with `"severity": 4`. With proper sample input data and proper output data, the `jq` command can be tweaked to query just those entries where `severity` matches 4. – Gumnos Aug 24 '17 at 00:46
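
A tweaked form of the above query that counts only the severity 4 entries per file, as that comment suggests, might look like this (a sketch, not part of the original answer, assuming the same valid top-level JSON):

jq '{(input_filename): [.vulnerabilities[] | select(.severity == 4)] | length}' *.json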

Use jq for parsing JSON on the command line; it is the standard tool for the job. Parsing JSON with text-based tools like sed is very fragile, because it relies on the order of elements and the formatting of the JSON document, neither of which is guaranteed by the JSON standard.

What you are looking for is the following command:

jq '[.vulnerabilities[]|select(.severity==4)]|length' file.json

If you want to run it for multiple files, use find:

find FOLDER -name 'PATTERN.json' -print \
    -exec jq '[.vulnerabilities[]|select(.severity==4)]|length' {} \;

Here \; runs jq once per file, so each file name printed by -print is immediately followed by its count ({} + would batch the files and separate the names from the counts).
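
If you prefer a plain shell loop over find, the same filter works per file with a label on each line; this is a sketch assuming the files match the *-082017 naming from the question:

for f in *-082017; do
    printf '%s: ' "$f"    # label each count with its file name
    jq '[.vulnerabilities[]|select(.severity==4)]|length' "$f"
done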
hek2mgl

I have made the following two example files, assuming that they are representative of what you have. Note the occurrence of the search text both before "vulnerabilities" and after it, with a different number of occurrences after it in each file. From your code I assume that the search string appears at most once on a line, as it is the lines that get counted (see the note after the details list below).

blableh-082017:

 "severity" : 4 
  "severity" : 4 

 vulnerabilities

 "severity" : 4 
  "severity" : 4

bleeblah-082017:

 "severity" : 4 
  "severity" : 4 
 vulnerabilities

 "severity" : 4 
  "severity" : 4 
   "severity" : 4 

Here is my proposal, using find in addition to sed and grep, and using sh to achieve the desired piping inside -exec.

find . -iname "*-082017" -print -exec sh -c 'sed "1,/vulnerabilities/d" "$1" | grep -c "\"severity\" : 4"' sh {} \;

Output (hoping that a name line followed by a count line is OK; otherwise another sed could reformat it for you):

./blableh-082017
2
./bleeblah-082017
3

Details:

  • use find to process multiple files and to print each file name in the output,
    in spite of sed's lack of support for that
  • use basically your code to do the cutting via sed and the counting via grep
  • pass the file name to sed as an argument instead of piping it in from cat
  • use sh within -exec to achieve the piping, handing the file name to the inner
    shell as a positional parameter ("$1") so that the quoting stays safe
    (answer by devnull to How to use pipe within -exec in find)
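
If a line could ever contain the search string more than once, a small variant counts matches instead of lines; this is a sketch beyond the original answer, with $f standing in for one input file (grep -o prints each match on its own line):

sed '1,/vulnerabilities/d' "$f" | grep -o '"severity" : 4' | wc -l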

Environment:

  • GNU sed version 4.2.1
  • GNU bash, version 3.1.23(1)-release (i686-pc-msys)
  • GNU grep 2.5.4
  • find (GNU findutils) 4.4.2
Yunnosch