Python csv cutting off parts of column

Question

I am running into this weird issue.

I should also mention that this worked in the past, so I am also thinking there may be something wrong with the .csv or the specific line itself.

A quick breakdown: I have a script that pulls data from a .csv file of CVE (vulnerability) data. It then uses the cvss module to rescore the findings, and we use the output to measure patching priority and urgency.

(this script is a temporary fix until we implement new tooling)

Here is where it messes up. Here is what my ingest file output looks like right now.

Vulnerability Title,Plugin ID,Original CVSS Score,Default Vector,Original Severity,AWS Score,AWS Vector,AWS Severity,Hosts,Host Type,Percentage Impacted
Cisco IOS IKEv1 Packet Handling Remote Information Disclosure (cisco-sa-20160916-ikev1) (BENIGNCERTAIN),NES-93736,4.6,CVSS2#AV:N/AC:L/Au:N/C:P/I:N/A:N,,,AV:N/AC:L/Au:N/C:P/I:N/A:N,,26,26,
Cisco IOS Software TCP Memory Leak DoS (cisco-sa-20150325-tcpleak),NES-82568,4.9,CVSS2#AV:N/AC:L/Au:N/C:N/I:N/A:C,,,AV:N/AC:L/Au:N/C:N/I:N/A:C,,30,26,
RHEL 5 / 6 / 7 : nss and nss-util (RHSA-2016:2779),NES-94912,4.2,CVSS2#AV:N/AC:M/Au:N/C:C/I:C/A:C/E:F/RL:OF/RC:ND,,,AV:N/AC:M/Au:N/C:C/I:C/A:C/E:F/RL:OF/RC:ND,,5112,23,

Here is the output after my script (which is attached below):

Vulnerability Title,Plugin ID,Original CVSS Score,Default Vector,Original Severity,AWS Score,AWS Vector,AWS Severity,Hosts,Host Type,Percentage Impacted
ium,4.6,AV:A/AC:H/Au:M/C:P/I:N/A:P/CDP:L/TD:H/CR:H/IR:H/AR:H,Medium,26,26,0.2524271844660194
Cisco IOS Software TCP Memory Leak DoS (cisco-sa-20150325-tcpleak),NES-82568,4.9,CVSS2#AV:N/AC:L/Au:N/C:N/I:N/A:C,Medium,4.9,AV:A/AC:H/Au:M/C:N/I:N/A:C/CDP:L/TD:M/CR:H/IR:H/AR:H,Medium,30,26,0.2912621359223301
RHEL 5 / 6 / 7 : nss and nss-util (RHSA-2016:2779),NES-94912,4.2,CVSS2#AV:N/AC:M/Au:N/C:C/I:C/A:C/E:F/RL:OF/RC:ND,Medium,4.2,AV:A/AC:H/Au:M/C:C/I:C/A:C/E:F/RL:OF/RC:ND/CDP:L/TD:M/CR:H/IR:H/AR:H,Medium,5112,23,0.615458704550927

To explain a little further: line 1 starts with 'ium', which is the tail end of the word Medium that comes from the bottom part of my script at line 128 (the part that says #ORIGINAL SCORE). It should say Medium. So basically, if you look at line 2 of my input and compare it to the output, it is cutting out this entire line and adding only half of the word that the script is trying to add. I thought maybe it was because of all the brackets or something, but I am not sure.

Cisco IOS IKEv1 Packet Handling Remote Information Disclosure (cisco-sa-20160916-ikev1) (BENIGNCERTAIN),NES-93736,4.6,CVSS2#AV:N/AC:L/Au:N/C:P/I:N/A:N,

Here is the script that performs this function. It's a bit ugly, I know, and improvement suggestions are welcome, but finding out why it's messing up my file is my priority right now. I have thought about switching to pandas, but that will take a bit of time because I have never used it, so I have no idea how to do this yet.

def rescore_function():
#headers
    print 'Starting Rescore'
    csv_in = open('/tmp/rescore_test.csv', 'rb')
    csv_out = open('/tmp/rescored_vulnerabilities.csv', 'wb')
    writer = csv.writer(csv_out)
    reader = csv.reader(csv_in)
    headers = next(reader, None)
    if headers:
        writer.writerow(headers)

    print 'Creating Target Distrobution'
    for row in csv.reader(csv_in):
    #This is a terrible way of setting up the percentage of hosts impacted for target distrobution. Its ugly and horrible. Host count defines the host impacted, host_type identifies what kind of host it is. Such as Alinux, Rhel5, or Cisco IOS
        host_count = float(row[8])
        host_type = float(row[9])
        alinux_impact = host_count / ALINUX_HOST
        cisco_impact = host_count / CISCO_COUNT
        juniper_impact = host_count / JUNIPER_COUNT
        citrix_impact = host_count / CITRIX_COUNT        
        all_linux= host_count / LINUX_TOTAL
        print 'math set'

#The reason for vul_id is 3 lists combined is simple. alinux_impact NEEDS to be 24, cisco NEEDs to be 26, juniper NEEDS to match 27, because vul_id is the softwares 'vulnerability ID type
#range falls into all_linux. So fillvalue=vul_os[-1]  means if its not 24,26,27, it is "all_linux" which means it compares it to the All linux number.       
        vul_id = [24, 26, 27, 25] + range(24) + range(28,101)
        vul_os = [alinux_impact, cisco_impact, juniper_impact, all_linux]

        append_file = open('/tmp/rescored_vulnerabilities.csv', 'ab')
        append_write = csv.writer(append_file)

#Does the for loop with the fillvalue as mentioned above. Basically Y is the host type (linux, Cisco IOS, etc) and X is the vulnerability type. So it runs through and figures out the TD and rescore methods.
#X equals the percetange of impacted, so the Metric will be based on amount/percentage of X impacted and does a regex search and replace based on that using the CVSS calculations.
        print vul_id
        print vul_os
        for x,y in izip_longest(vul_os, vul_id, fillvalue=vul_os[-1]):
            print x,y
            print host_type
     #VECTOR REGEXP, host_type is which OS/Device type. 23 = RHEL5, 24 = Alinux, 26 = Cisco, 27 = Juniper   
            if host_type == y:
                row[10] = x
                if  x <= 0.25:
                    AC_Metric = 'A:C/CDP:L/TD:L/CR:H/IR:H/AR:H'
                    AP_Metric = 'A:P/CDP:L/TD:L/CR:H/IR:H/AR:H'
                    AN_Metric = 'A:N/CDP:L/TD:L/CR:H/IR:H/AR:H'
                    RCUC_Metric = 'RC:UC/CDP:L/TD:L/CR:H/IR:H/AR:H'
                    RCUR_Metric = 'RC:UR/CDP:L/TD:L/CR:H/IR:H/AR:H'
                    RCC_Metric = 'RC:C/CDP:L/TD:L/CR:H/IR:H/AR:H'
                    RCND_Metric = 'RC:ND/CDP:L/TD:L/CR:H/IR:H/AR:H'
                elif 0.26 <= x <= 0.75:
                    AC_Metric = 'A:C/CDP:L/TD:M/CR:H/IR:H/AR:H'
                    AP_Metric = 'A:P/CDP:L/TD:M/CR:H/IR:H/AR:H'
                    AN_Metric = 'A:N/CDP:L/TD:M/CR:H/IR:H/AR:H'
                    RCUC_Metric = 'RC:UC/CDP:L/TD:M/CR:H/IR:H/AR:H'
                    RCUR_Metric = 'RC:UR/CDP:L/TD:M/CR:H/IR:H/AR:H'
                    RCC_Metric = 'RC:C/CDP:L/TD:M/CR:H/IR:H/AR:H'
                    RCND_Metric = 'RC:ND/CDP:L/TD:M/CR:H/IR:H/AR:H'
                else:
                    AC_Metric = 'A:C/CDP:L/TD:H/CR:H/IR:H/AR:H'
                    AP_Metric = 'A:P/CDP:L/TD:H/CR:H/IR:H/AR:H'
                    AN_Metric = 'A:N/CDP:L/TD:H/CR:H/IR:H/AR:H'
                    RCUC_Metric = 'RC:UC/CDP:L/TD:H/CR:H/IR:H/AR:H'
                    RCUR_Metric = 'RC:UR/CDP:L/TD:H/CR:H/IR:H/AR:H'
                    RCC_Metric = 'RC:C/CDP:L/TD:H/CR:H/IR:H/AR:H'
                    RCND_Metric = 'RC:ND/CDP:L/TD:H/CR:H/IR:H/AR:H'


                text = row[6]
                text = re.sub(r'AV:N','AV:A',text)
                text = re.sub(r'AC:L','AC:H',text)
                text = re.sub(r'AC:M','AC:H',text)
                text = re.sub(r'Au:N','Au:M',text)
                text = re.sub(r'Au:S','Au:M',text)
                text = re.sub(r'A:C$',AC_Metric,text)
                text = re.sub(r'A:P$',AP_Metric,text)
                text = re.sub(r'A:N$',AP_Metric,text)
                text = re.sub(r'RC:UC',RCUC_Metric,text)
                text = re.sub(r'RC:UR',RCUR_Metric,text)
                text = re.sub(r'RC:C',RCC_Metric,text)
                text = re.sub(r'RC:ND',RCND_Metric,text)
                row[6] = text
    #NEW SCORE, uses CVSS module to take the previous vector and find out the the numbered score. It then uses that number to define the severity word.
                try:
                    vector = row[6]
                    c = CVSS2(vector)
                    row[5] = c.scores()[2]
                    vul_score = row[5]
                    if 0 <= vul_score <= 3.9:
                        vuln_word = 'Low'
                    elif 4.0 <= vul_score <=6.9:
                        vuln_word = 'Medium'
                    elif 7.0 <= vul_score <= 9.9:
                        vuln_word = 'High'
                    else:
                        vuln_word = 'Critical'
                    row[7] = vuln_word
                except CVSS2MalformedError:
                    rescored_success = False
                    pass
    #ORIGINAL SCORE, does the same as above for the original vector since NESSUS does not provide the Severity "word". This only finds the word, not the number value.
                default_score = float(row[2])
                if 0 <= default_score <= 3.9:
                    default_severity = 'Low'
                elif 4.0 <= default_score <=6.9:
                    default_severity = 'Medium'
                elif 7.0 <= default_score <= 9.9:
                    default_severity = 'High'
                else:
                    default_severity = 'Critical'
                row[4] = default_severity
                append_write.writerow(row)

Actually, r or rb is OK in Python 2. It's when writing that most Python 2 versions (except the latest release) need `"wb"`, or it inserts blank lines (on Windows). — Jean-François Fabre, Jan 03 '17 at 20:04
It's a bug, check here: http://stackoverflow.com/questions/38808284/portable-way-to-write-csv-file-in-python-2-or-python-3 — Jean-François Fabre, Jan 03 '17 at 20:14

Jean-François Fabre · Accepted Answer · 2017-01-03T20:25:39.943

Your code is quite big, so it is hard to reproduce, but I suspect the problem lies with the write file handles: several buffered handles are open on the same file in write mode at the same time, and their buffers are flushed independently of each other.

First, you open and truncate the file with csv_out = open('/tmp/rescored_vulnerabilities.csv', 'wb').
You write the header.
On each iteration, while that first handle is still open, you open the file again in append mode: append_file = open('/tmp/rescored_vulnerabilities.csv', 'ab').
You never close append_file either!
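A minimal sketch of what I suspect is happening, written for Python 3 and using a made-up two-column file (the function name demo_two_handles and the sample row are hypothetical, not taken from the script). The header sits in the buffer of the first handle while the row is appended and flushed through the second one; when the first handle is finally flushed, it writes the header at offset 0, on top of the start of the row that is already on disk.

```python
import csv


def demo_two_handles(path):
    """Write a header and one row through two handles on the same file."""
    # First handle: truncates the file; its file position is offset 0.
    out = open(path, "w", newline="")
    csv.writer(out).writerow(["Title", "Score"])  # stays in the buffer for now

    # Second handle on the same path, in append mode (as in the question).
    append_file = open(path, "a", newline="")
    csv.writer(append_file).writerow(
        ["A very long vulnerability title", "Medium"]
    )
    append_file.close()  # the row reaches the disk first

    # Closing the first handle flushes the header at offset 0,
    # overwriting the beginning of the row written above.
    out.close()

    with open(path, newline="") as f:
        return f.read()
```

On a Unix-like system I would expect the returned text to start with the header line, followed by only the tail of the data row, which matches the truncated 'ium,...' line in the question.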

I'd advise this:

The first (truncating) open is OK.
Remove append_file = open('/tmp/rescored_vulnerabilities.csv', 'ab').
Replace append_write with writer (it will work: writer points to the same file, which is still open).
Don't forget to close csv_out at the end (or put all the code in a with open(...) as csv_out: block).
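As a rough sketch of that structure (Python 3 syntax, with the rescoring logic reduced to the severity word; the names severity and copy_with_severity are hypothetical, and the column indices follow the CSV layout in the question), a single reader and a single writer inside one with block could look like this:

```python
import csv


def severity(score):
    """Map a CVSS score to a severity word, using the script's bands."""
    if score < 4.0:
        return "Low"
    if score < 7.0:
        return "Medium"
    if score < 10.0:
        return "High"
    return "Critical"


def copy_with_severity(in_path, out_path):
    """Copy the CSV, filling column 4 from the score in column 2."""
    # Both files are closed (and flushed) when the with block exits.
    with open(in_path, newline="") as csv_in, \
            open(out_path, "w", newline="") as csv_out:
        reader = csv.reader(csv_in)
        writer = csv.writer(csv_out)

        headers = next(reader, None)
        if headers:
            writer.writerow(headers)

        # Reuse the same reader and the same writer for every row;
        # no second handle is ever opened on the output file.
        for row in reader:
            row[4] = severity(float(row[2]))
            writer.writerow(row)
```

Because only one handle ever writes to the output file, the header and the rows go through the same buffer and land on disk in the order they were written.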

Note that this issue is Un*x only. On Windows filesystems, it would throw an exception right away because a file cannot be opened twice in write mode (and sometimes that's just as well).

Ah, yeah, it was that append file. I removed it and switched to just using the original writer, and that fixed everything. Thank you so much for the help! I do close the output file a little later in the script when it's done, but I will fix all that as well. — Mallachar, Jan 03 '17 at 20:21
