You can use the re module to split each line into a list of tuples or a dict, and then use that to populate a DataFrame:
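    import re

    def parse_logfile(log_file_handle):
        # Match KEY="value" pairs, allowing whitespace before each key
        p = re.compile(r'\s*(.*?)="(.*?)"')
        for line in log_file_handle:
            # One list of (key, value) tuples per line
            yield p.findall(line)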
For the line you posted, this yields
    [('TIMESTAMP', 'Jun 7 2010 15:03:49 NZST'),
     ('ACCESS-TYPE', 'ABC'),
     ('TYPE', 'XYZ'),
     ('PACKET-TYPE', 'St'),
     ('REASON', 'bkz'),
     ('CIRCUIT-ID', 'UIX eth 1/1/11/20'),
     ('REMOTE-ID', 'NBC'),
     ('CALLING-STATION-ID', 'LKP'),
     ('SUB-ID', 'JIK')]
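If a dict per line is more convenient than a list of tuples, each result can be passed through dict(). A minimal sketch using the same pattern on a shortened version of the sample line; the names PAIR_RE and parse_line are hypothetical and not part of the code above. Note that a key repeated within one line would keep only its last value.

```python
import re

# Same pattern as above: optional whitespace, a key, then a double-quoted value.
PAIR_RE = re.compile(r'\s*(.*?)="(.*?)"')

def parse_line(line):
    """Return the KEY="value" pairs of one log line as a dict (hypothetical helper)."""
    return dict(PAIR_RE.findall(line))

sample = 'TIMESTAMP="Jun 7 2010 15:03:49 NZST" ACCESS-TYPE="ABC" CIRCUIT-ID="UIX eth 1/1/11/20"'
record = parse_line(sample)
print(record["CIRCUIT-ID"])  # UIX eth 1/1/11/20
```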
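So in another part of the code you can do something like this:

    import pandas as pd

    with open(log_filename, 'r') as log_file_handle:
        # parse_logfile is a lazy generator, so consume it while the file is open.
        # DataFrame.append was removed in pandas 2.0; build the frame from all rows at once.
        df = pd.DataFrame([dict(line) for line in parse_logfile(log_file_handle)])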
Test data:
TIMESTAMP="Jun 7 2010 15:03:49 NZST" ACCESS-TYPE="ABC" TYPE="XYZ" PACKET-TYPE="St" REASON="bkz" CIRCUIT-ID="UIX eth 1/1/11/20" REMOTE-ID="NBC" CALLING-STATION-ID="LKP" SUB-ID="JIK"
TIMESTAMP="Jun 7 2010 15:03:50 NZST" ACCESS-TYPE1="ABC1" TYPE="XYZ" PACKET-TYPE="St" REASON="bkz" CIRCUIT-ID="UIX eth 1/1/11/20" REMOTE-ID="NBC" CALLING-STATION-ID="LKP" SUB-ID="JIK"
TIMESTAMP="Jun 7 2010 15:03:51 NZST" ACCESS-TYPE="ABC2" TYPE="XYZ" PACKET-TYPE="St" REASON="bkz" CIRCUIT-ID="UIX eth 1/1/11/20" REMOTE-ID="NBC" CALLING-STATION-ID="LKP" SUB-ID="JIK"
Here I changed the timestamps and access types, and the second entry has ACCESS-TYPE1 instead of ACCESS-TYPE.
Result (the exact column order depends on the pandas version):

       ACCESS-TYPE  CALLING-STATION-ID  CIRCUIT-ID         PACKET-TYPE  REASON  REMOTE-ID  SUB-ID  TIMESTAMP                 TYPE  ACCESS-TYPE1
    0  ABC          LKP                 UIX eth 1/1/11/20  St           bkz     NBC        JIK     Jun 7 2010 15:03:49 NZST  XYZ   NaN
    1  NaN          LKP                 UIX eth 1/1/11/20  St           bkz     NBC        JIK     Jun 7 2010 15:03:50 NZST  XYZ   ABC1
    2  ABC2         LKP                 UIX eth 1/1/11/20  St           bkz     NBC        JIK     Jun 7 2010 15:03:51 NZST  XYZ   NaN
If all the lines have the same keys in the same order, the appending should be easy. If this changes throughout the file, this might become more difficult. Can you post more lines?
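One way to find out whether the keys vary is to count the distinct key sequences before building the DataFrame. A rough sketch, assuming the same regex as above; key_layouts is a hypothetical helper, and the three lines are shortened versions of the test data.

```python
import re
from collections import Counter

PAIR_RE = re.compile(r'\s*(.*?)="(.*?)"')

def key_layouts(lines):
    """Count how often each ordered tuple of keys occurs (hypothetical helper)."""
    return Counter(
        tuple(key for key, _ in PAIR_RE.findall(line))
        for line in lines
    )

lines = [
    'TIMESTAMP="Jun 7 2010 15:03:49 NZST" ACCESS-TYPE="ABC" TYPE="XYZ"',
    'TIMESTAMP="Jun 7 2010 15:03:50 NZST" ACCESS-TYPE1="ABC1" TYPE="XYZ"',
    'TIMESTAMP="Jun 7 2010 15:03:51 NZST" ACCESS-TYPE="ABC2" TYPE="XYZ"',
]
layouts = key_layouts(lines)
# Most common layout first; rare layouts are the lines worth inspecting.
for keys, count in layouts.most_common():
    print(count, keys)
```

If this reports a single layout, every line has the same keys in the same order. More than one layout points at the lines that will produce NaN columns.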