Task:
I have a task to match the strings in my first column of csv file to log files, if it exist then put the matched string in the third column otherwise put "undetected"
Contents of my log file -trendx.log Contents of my csv file - sha1_vsdt.csv
Expected Output:
Code:
So far I have used this concept using pandaframe and numpy, just followed somebody's advice
import numpy as np
import pandas as pd
import csv
#Log data into dataframe using genfromtxt
logdata = np.genfromtxt("trendx.log", delimiter=" ",invalid_raise = False,dtype=str, comments=None,usecols=np.arange(0,24))
logframe = pd.DataFrame(logdata)
#Dataframe trimmed to use only SHA1, PRG and IP
df2=(logframe[[10,14,15]]).rename(columns={10:'SHA1', 14: 'PRG',15:'IP'})
#sha1_vsdt data into dataframe using read_csv
df1=pd.read_csv("sha1_vsdt.csv",delimiter=r"|",error_bad_lines=False,engine = 'python',quoting=3)
#Using merge to compare the two CSV
df = pd.merge(df1, df2, left_on='SHA-1', right_on='SHA1', how='left').replace(np.nan, 'undetected', regex=True)
print df[['SHA-1','VSDT','PRG','IP']]
Then I'm having this error:
Warning (from warnings module):
File "C:\Users\Administrator\Desktop\OJT\match.py", line 6
logdata = np.genfromtxt("trendx.log", delimiter=" ",invalid_raise = False,dtype=str, comments=None,usecols=np.arange(0,24))
ConversionWarning: Some errors were detected !
Line #1 - #113 (got 1 columns instead of 24)
Traceback (most recent call last):
File "C:\Users\Administrator\Desktop\OJT\match.py", line 9, in <module>
df2=(logframe[[10,14,15]]).rename(columns={10:'SHA1', 14: 'PRG',15:'IP'})
File "C:\Python27\lib\site-packages\pandas\core\frame.py", line 2682, in __getitem__
return self._getitem_array(key)
File "C:\Python27\lib\site-packages\pandas\core\frame.py", line 2726, in _getitem_array
indexer = self.loc._convert_to_indexer(key, axis=1)
File "C:\Python27\lib\site-packages\pandas\core\indexing.py", line 1327, in _convert_to_indexer
.format(mask=objarr[mask]))
KeyError: '[10 14 15] not in index'