I am writing a script that parses an Apache log file into a pandas DataFrame. I noticed that the request text in the log file sometimes contains encoded characters (I assume UTF-8 values written as %XX escapes), like this:
Bern,%20Bahnhof
For example, one line from the log file:
IP - - timestamp "GET /v1/connections?from=Bern,%20Bahnhof&to=Luzern HTTP/1.1" httpstatus bytes -"
My current code to open the log files:
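import pandas as pd

cols = ['ip', 'l', 'userid', 'timestamp', 'tz', 'fullrequest', 'status', 'bytes', 'referer', 'useragent']
# sep=r'\s+' splits on whitespace; on_bad_lines='skip' (pandas >= 1.3) replaces error_bad_lines=False
df = pd.read_csv(path + file, sep=r'\s+', names=cols, on_bad_lines='skip', encoding='utf8')
df = df.drop(columns=['l', 'userid'])  # drop the two unused columns in one call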
Is there a way to parse the log files into pandas so that escape sequences like %20 are converted back into normal characters?
In the end, the value should look something like this:
Bern, Bahnhof
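To make the goal concrete, here is a minimal sketch of the kind of decoding I am after. It assumes the %XX sequences are URL percent-encoding and that `urllib.parse.unquote` from the Python 3 standard library is a suitable tool; `decode_request` is a hypothetical helper name, and the second sample string is made up for illustration.

```python
from urllib.parse import unquote


def decode_request(text):
    """Return text with %XX escapes decoded as UTF-8; non-strings pass through."""
    # Missing values from pandas arrive as float NaN, so only strings are decoded.
    return unquote(text) if isinstance(text, str) else text


# Sample request strings: the first is modelled on the log line above,
# the second is a made-up example with a multi-byte UTF-8 character.
samples = [
    "GET /v1/connections?from=Bern,%20Bahnhof&to=Luzern HTTP/1.1",
    "GET /v1/connections?from=Z%C3%BCrich&to=Luzern HTTP/1.1",
]
decoded = [decode_request(s) for s in samples]
print(decoded[0])
```

On the DataFrame this could presumably be applied per column, for example with df['fullrequest'].map(decode_request). Note that unquote leaves a literal + unchanged; if the logs encode spaces as +, urllib.parse.unquote_plus may be the better fit.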