I have a log file that I need to load into QlikSense. QlikSense reads the log file line by line, so I need an expression that splits each line into the desired columns.
The log file looks like this (it has about 2.5 million entries):
202.32.92.47 - - [01/Jun/1995:00:00:59 -0600] "GET /~scottp/publish.html" 200 271 - -
ix-or7-27.ix.netcom.com RFC-1413 - [01/Jun/1995:00:02:51 -0600] "GET /~ladd/ostriches.html" 200 205908 - "Mozilla/5.0 (X11; U; Linux i686; es-ES;rv:1.7.5)"
ppp-4.pbmo.net - John Thomas [07/Dec/1995:13:20:28 -0600] "GET /dcs/courses/cai/html/introduction_lesson/index.html HTTP/1.0" 500 - "http://www.wikipedia.org/" "Mozilla/5.0 (X11; U; Linux i686; es-ES;rv:1.7.5)"
ppp-4.pbmo.net - John Thomas [07/Dec/1995:13:20:37 -0600] "GET /dcs/courses/cai/html/index.html HTTP/1.0" 500 4528 - -
lbm2.niddk.nih.gov RFC-1413 John Thomas [07/Dec/1995:13:21:03 -0600] "GET /~ladd/vet_libraries.html" 200 11337 "http://www.wikipedia.org/" -
The structure of each line of this log file is: IP ID NAME DATETIME TIMEZONE METHOD DIR STATUS MB WEB FROM. Below I split the previous log example using || for better visualization:
|| ix-or7-27.ix.netcom.com || RFC-1413 || - || [01/Jun/1995:00:02:51 || -0600] || "GET || /~ladd/ostriches.html" || 200 || 205908 || - || "Mozilla/5.0 (X11; U; Linux i686; es-ES;rv:1.7.5)" ||
|| ppp-4.pbmo.net || - || John Thomas || [07/Dec/1995:13:20:28 || -0600] || "GET || /dcs/courses/cai/html/introduction_lesson/index.html HTTP/1.0" || 500 || - || "http://www.wikipedia.org/" || "Mozilla/5.0 (X11; U; Linux i686; es-ES;rv:1.7.5)" ||
|| ppp-4.pbmo.net || - || John Thomas || [07/Dec/1995:13:20:37 || -0600] || "GET || /dcs/courses/cai/html/index.html HTTP/1.0" || 500 || 4528 || - || - ||
|| lbm2.niddk.nih.gov || RFC-1413 || John Thomas || [07/Dec/1995:13:21:03 || -0600] || "GET || /~ladd/vet_libraries.html" || 200 || 11337 || "http://www.wikipedia.org/" || - ||
So, for example, for the first of these split lines:
IP = ix-or7-27.ix.netcom.com
ID = RFC-1413
NAME = -
DATETIME = 01/Jun/1995 00:02:51
TIMEZONE = -0600
METHOD = GET
DIR = /~ladd/ostriches.html
STATUS = 200
MB = 205908
WEB = -
FROM = Mozilla/5.0 (X11; U; Linux i686; es-ES;rv:1.7.5)
So each field's value can be either text or -. I have tried many ways of loading the file, but I have not managed to get it working.
I have tried splitting each line on the space separator, but this does not work because each line can contain a different number of spaces. I also tried splitting on -, and so on, but I could not get that to work because the data length is variable.
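To illustrate why splitting on spaces fails, here is a minimal sketch. It is written in Python purely for illustration (it is not QlikSense script), and the variable names are hypothetical. It counts the space-separated tokens in three of the sample lines above:

```python
# Three of the sample log lines from above, copied verbatim.
lines = [
    '202.32.92.47 - - [01/Jun/1995:00:00:59 -0600] "GET /~scottp/publish.html" 200 271 - -',
    'ix-or7-27.ix.netcom.com RFC-1413 - [01/Jun/1995:00:02:51 -0600] '
    '"GET /~ladd/ostriches.html" 200 205908 - '
    '"Mozilla/5.0 (X11; U; Linux i686; es-ES;rv:1.7.5)"',
    'ppp-4.pbmo.net - John Thomas [07/Dec/1995:13:20:37 -0600] '
    '"GET /dcs/courses/cai/html/index.html HTTP/1.0" 500 4528 - -',
]

# Split every line on single spaces and count the resulting tokens.
token_counts = [len(line.split(" ")) for line in lines]
print(token_counts)
```

The counts differ from line to line because NAME, DIR and FROM can themselves contain spaces, so a token's position does not identify the field it belongs to.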
I thought that a regex (a pattern) might solve my problem, but I have no experience with patterns and do not know how to write one.
EDIT 1:
If the solution to my problem is a regex pattern, it should do the following:
- First parameter: catch all up to space
- Second parameter: catch all up to space
- Third parameter: catch all up to [
- Fourth parameter: catch all up to space
- Fifth parameter: catch all up to ]
- Sixth parameter: catch all up to space
- Seventh parameter: catch all up to space
- Eighth parameter: catch all up to space
- Ninth parameter: catch all up to space
- Tenth parameter: catch all inside "" or -
- Eleventh parameter: catch all inside "" or -
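The rules listed above could be sketched as a single pattern with one named group per field. This is only an illustrative sketch in Python, not QlikSense script, so the pattern would have to be adapted to whatever regex support is available when loading the data. The names `LOG_PATTERN` and `parse_line` are hypothetical. It assumes that every line has exactly the eleven fields shown in the samples, that DIR keeps a trailing `HTTP/1.0` when present (as in the || example), and that the surrounding quotes of WEB and FROM should be dropped.

```python
import re

# One named group per field, in the order IP ID NAME DATETIME TIMEZONE
# METHOD DIR STATUS MB WEB FROM. Assumes single spaces between fields.
LOG_PATTERN = re.compile(
    r'^(?P<IP>\S+) (?P<ID>\S+) (?P<NAME>[^\[]+?) '       # NAME runs up to " ["
    r'\[(?P<DATETIME>[^\s\]]+) (?P<TIMEZONE>[^\]]+)\] '  # inside [ ... ]
    r'"(?P<METHOD>\S+) (?P<DIR>[^"]*)" '                 # inside the request quotes
    r'(?P<STATUS>\S+) (?P<MB>\S+) '
    r'(?P<WEB>"[^"]*"|-) (?P<FROM>"[^"]*"|-)$'           # quoted text or a lone -
)


def parse_line(line):
    """Return a dict of the eleven fields, or None if the line does not match."""
    match = LOG_PATTERN.match(line.strip())
    if match is None:
        return None
    fields = match.groupdict()
    # Turn "01/Jun/1995:00:02:51" into "01/Jun/1995 00:02:51".
    fields["DATETIME"] = fields["DATETIME"].replace(":", " ", 1)
    # Drop the surrounding quotes; a lone "-" is left unchanged.
    for key in ("WEB", "FROM"):
        fields[key] = fields[key].strip('"')
    return fields


sample = (
    'ix-or7-27.ix.netcom.com RFC-1413 - [01/Jun/1995:00:02:51 -0600] '
    '"GET /~ladd/ostriches.html" 200 205908 - '
    '"Mozilla/5.0 (X11; U; Linux i686; es-ES;rv:1.7.5)"'
)
for name, value in parse_line(sample).items():
    print(name, "=", value)
```

Lines that do not fit the assumed layout return `None` rather than a partial result, so with about 2.5 million entries the unmatched lines can be collected and inspected separately.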
Any idea how I could achieve this?
Thank you.