Edit: Changed so that in output have column name access
(Note the addition of as_index=False
and .reset_index
in [1] See sources 5 and 6
[1] First groupby
on CVE_ID
column and use size
:
counts = myframe.groupby(['CVE_ID','VulnName','ServerOwner'], as_index=False).size().unstack(fill_value=0).reset_index()
ServerOwner CVE_ID VulnName Alice Bob Carol
0 CVE-2017-1111 Java Update 1 1 1 0
1 CVE-2017-1112 Java Update 2 1 1 0
2 CVE-2017-1113 Java Update 3 1 1 1
3 CVE-2017-1114 Adobe 1 1 1 1
4 CVE-2017-1115 Chrome 1 1 0 1
5 CVE-2017-1116 Chrome 2 0 0 1
6 CVE-2017-1117 Chrome 3 0 0 1
[2] Then sum over the Alice, Bob and Carol column to get:
counts['Count'] = counts[['Alice','Bob','Carol']].sum(axis=1)
ServerOwner CVE_ID VulnName Alice Bob Carol Count
0 CVE-2017-1111 Java Update 1 1 1 0 2
1 CVE-2017-1112 Java Update 2 1 1 0 2
2 CVE-2017-1113 Java Update 3 1 1 1 3
3 CVE-2017-1114 Adobe 1 1 1 1 3
4 CVE-2017-1115 Chrome 1 1 0 1 2
5 CVE-2017-1116 Chrome 2 0 0 1 1
6 CVE-2017-1117 Chrome 3 0 0 1 1
[3] Then delete name columns using df.drop
on names
:
counts.drop(['Carol','Bob','Alice'],inplace=True,axis=1)
ServerOwner CVE_ID VulnName Count
0 CVE-2017-1111 Java Update 1 2
1 CVE-2017-1112 Java Update 2 2
2 CVE-2017-1113 Java Update 3 3
3 CVE-2017-1114 Adobe 1 3
4 CVE-2017-1115 Chrome 1 2
5 CVE-2017-1116 Chrome 2 1
6 CVE-2017-1117 Chrome 3 1
[4] Then you use sort_values
on the sum
column:
counts.sort_values(by='Count', ascending=False, inplace=True)
ServerOwner CVE_ID VulnName Count
2 CVE-2017-1113 Java Update 3 3
3 CVE-2017-1114 Adobe 1 3
0 CVE-2017-1111 Java Update 1 2
1 CVE-2017-1112 Java Update 2 2
4 CVE-2017-1115 Chrome 1 2
5 CVE-2017-1116 Chrome 2 1
6 CVE-2017-1117 Chrome 3 1
Combined:
counts = myframe.groupby(['CVE_ID','VulnName','ServerOwner'], as_index=False).size().unstack(fill_value=0).reset_index()
counts['Count'] = counts[['Alice','Bob','Carol']].sum(axis=1)
counts.drop(['Carol','Bob','Alice'],inplace=True,axis=1)
counts.sort_values(by='Count', ascending=False, inplace=True)
print "The dataframe: \n", myframe
print "Top 10 offending CVEs, Vulnerability and Count: \n"
print counts
Top 10 offending CVEs, Vulnerability and Count:
ServerOwner CVE_ID VulnName Count
2 CVE-2017-1113 Java Update 3 3
3 CVE-2017-1114 Adobe 1 3
0 CVE-2017-1111 Java Update 1 2
1 CVE-2017-1112 Java Update 2 2
4 CVE-2017-1115 Chrome 1 2
5 CVE-2017-1116 Chrome 2 1
6 CVE-2017-1117 Chrome 3 1
If needed, Can use reset_index()
to reset index at this point.
Edit: In response to comment about serverOwner
index, you can reset index, drop old index, and rename new index:
counts.reset_index(drop=True, inplace = True)
counts.index.names = ['index']
Gives:
ServerOwner CVE_ID VulnName Count
index
0 CVE-2017-1113 Java Update 3 3
1 CVE-2017-1114 Adobe 1 3
2 CVE-2017-1111 Java Update 1 2
3 CVE-2017-1112 Java Update 2 2
4 CVE-2017-1115 Chrome 1 2
5 CVE-2017-1116 Chrome 2 1
6 CVE-2017-1117 Chrome 3 1
(The ServerOwner
name remains as a remnant of the original groupby
command to detail which column was used.)
Sources for this answer:
[1]
Groupby value counts on the dataframe pandas
dataframe-pandas
[2]
Pandas: sum DataFrame rows for given columns
[3]
Delete column from pandas DataFrame
[4]
python, sort descending dataframe with pandas
[5]
Converting a Pandas GroupBy object to DataFrame
[6]
How to GroupBy a Dataframe in Pandas and keep Columns