I have a log file with more than 10,000 lines; some of them have this structure:
"2023-07-19 13:38:45,220 INFO Type:type: ('rate',), {'value': 123, 'unit': 'Count/Second', 'id': 'ABC123', 'name': 'London'}\n"
I would like to extract this info and put it in a pandas.DataFrame.
This is my initial code:
import ast
import pandas as pd
import re
infile = "./log_file.log"
with open(infile) as f:
    lines = f.readlines()

df = pd.DataFrame(columns=["time", "type", "value", "unit", "id", "name"])
for line in lines:
    if "type" in line:
        # dict payload between the braces
        value = ast.literal_eval(re.search(r"\{.*\}", line).group(0))
        # timestamp: everything before " INFO", minus the trailing space
        value["time"] = line.split("INFO")[0][:-1]
        # literal_eval turns "('rate',)" into a 1-tuple; take its element
        value["type"] = ast.literal_eval(re.search(r"\(.*\)", line).group(0))[0]
        df = df.append(value, ignore_index=True)
so as to get a DataFrame like this:
time type value unit id name
0 2023-07-19 13:38:45,220 rate 123 Count/Second ABC123 London
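For what it's worth, the per-line extraction itself seems to work; here it is isolated on a hard-coded sample line (so it runs standalone, without the log file):

```python
import ast
import re

# Hard-coded sample line in the format shown above
line = ("2023-07-19 13:38:45,220 INFO Type:type: ('rate',), "
        "{'value': 123, 'unit': 'Count/Second', 'id': 'ABC123', 'name': 'London'}\n")

# The dict literal between the braces parses directly with literal_eval
record = ast.literal_eval(re.search(r"\{.*\}", line).group(0))
# Timestamp: everything before " INFO", minus the trailing space
record["time"] = line.split("INFO")[0][:-1]
# literal_eval turns "('rate',)" into a 1-tuple; take its only element
record["type"] = ast.literal_eval(re.search(r"\(.*\)", line).group(0))[0]

print(record["time"], record["type"], record["value"])
# 2023-07-19 13:38:45,220 rate 123
```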
but the for loop takes ages to go through the whole file. Any suggestions on how to optimise it?
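In case it helps frame the question: the direction I've been considering is to accumulate plain dicts in a list and build the DataFrame once at the end, since I understand repeated df.append calls are quadratic. A minimal sketch, with hard-coded sample lines standing in for the real file, would be:

```python
import ast
import re
import pandas as pd

def parse_line(line):
    # Dict payload between the braces
    row = ast.literal_eval(re.search(r"\{.*\}", line).group(0))
    # Timestamp: everything before " INFO", minus the trailing space
    row["time"] = line.split("INFO")[0][:-1]
    # literal_eval turns "('rate',)" into a 1-tuple; take its only element
    row["type"] = ast.literal_eval(re.search(r"\(.*\)", line).group(0))[0]
    return row

# Stand-in for iterating over open(infile); the real code would stream the file
lines = [
    "2023-07-19 13:38:45,220 INFO Type:type: ('rate',), "
    "{'value': 123, 'unit': 'Count/Second', 'id': 'ABC123', 'name': 'London'}\n",
    "2023-07-19 13:38:46,001 DEBUG something else entirely\n",
]

rows = [parse_line(line) for line in lines if "type" in line]
# Build the DataFrame once, instead of appending row by row
df = pd.DataFrame(rows, columns=["time", "type", "value", "unit", "id", "name"])
```

Is building the DataFrame once like this the right direction, or is there something faster still?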