I'm using regular expressions to parse logs. Currently I read the file into a list of strings and iterate over it: if a line does not start with a timestamp, I append it to the entry I'm building; if it does, I finish the previous entry and start a new one with that line. Once I have a complete log entry, I use another regular expression to parse it.
Scanning file
try {
    List<String> lines = Files.readAllLines(filepath);
    Pattern pattern = Pattern.compile("\\d{4}-\\d{2}-\\d{2} \\d{2}:\\d{2}:\\d{2},\\d{3}");
    Matcher matcher;
    String currentEntry = "";
    for (String line : lines) {
        matcher = pattern.matcher(line);
        // If this is a new entry, then wrap up the previous one and start again
        if (matcher.lookingAt()) {
            // If the previous entry was not empty
            if (!StringUtils.trimWhitespace(currentEntry).isEmpty()) {
                entries.add(new LogEntry(currentEntry));
            }
            // Clear the current entry
            currentEntry = "";
        }
        if (!currentEntry.trim().isEmpty())
            currentEntry += "\n";
        currentEntry += line;
    }
    // At the end, if we have one leftover entry, add it
    if (!currentEntry.isEmpty()) {
        entries.add(new LogEntry(currentEntry));
    }
} catch (Exception ex) {
    return null;
}
Parsing entry
private static final String timestampRgx = "(?<timestamp>\\d{4}-\\d{2}-\\d{2} \\d{2}:\\d{2}:\\d{2},\\d{3})";
private static final String levelRgx = "(?<level>(?>INFO|ERROR|WARN|TRACE|DEBUG|FATAL))";
private static final String classRgx = "\\[(?<class>[^]]+)\\]";
private static final String threadRgx = "\\[(?<thread>[^]]+)\\]";
private static final String textRgx = "(?<text>.*)";
private static final Pattern PatternFullLog = Pattern.compile(timestampRgx + " " + levelRgx + "\\s+" + classRgx + "-" + threadRgx + "\\s+" + textRgx + "$", Pattern.DOTALL);

public LogEntry(String logText) {
    Matcher matcher = PatternFullLog.matcher(logText);
    // Check the result of find(); calling group() after a failed match throws IllegalStateException
    if (!matcher.find()) {
        throw new IllegalArgumentException("Not a recognized log entry: " + logText);
    }
    String dateStr = matcher.group("timestamp");
    timestamp = new DateLogLevel();
    timestamp.parseLogDate(dateStr);
    String levelStr = matcher.group("level");
    loglevel = LOG_LEVEL.valueOf(levelStr);
    String fullClassStr = matcher.group("class");
    String[] classNameArray = fullClassStr.split("\\.");
    // Assumes the class name has at least three dot-separated segments
    framework = classNameArray[2];
    classname = classNameArray[classNameArray.length - 1];
    threadname = matcher.group("thread");
    logtext = matcher.group("text");
    notes = "";
}
What I want to figure out
What I really want to do is read the whole file into a single string and then run one regular expression over it, so that each match yields one complete log entry. My plan was to reuse the expression from the constructor, but make the log-text group end at either the next log line or EOF, like this:
final String timestampRgx = "(?<timestamp>\\d{4}-\\d{2}-\\d{2} \\d{2}:\\d{2}:\\d{2},\\d{3})";
final String levelRgx = "(?<level>(?>INFO|ERROR|WARN|TRACE|DEBUG|FATAL))";
final String classRgx = "\\[(?<class>[^]]+)\\]";
final String threadRgx = "\\[(?<thread>[^]]+)\\]";
final String textRgx = "(?<text>.*[^(\Z|\\d{4}\-\\d{2}\-\\d{2} \\d{2}:\\d{2}:\\d{2},\\d{3})"; // change to handle multiple lines
private static Pattern PatternFullLog = Pattern.compile(timestampRgx + " " + levelRgx + "\\s+" + classRgx + "-" + threadRgx + "\\s+" + textRgx + "$", Pattern.DOTALL);
try {
    // Read file into string
    String lines = readFile(filepath);
    // Match the full-entry pattern against the whole file, not just the timestamp
    Matcher matcher = PatternFullLog.matcher(lines);
    while (matcher.find()) {
        String dateStr = matcher.group("timestamp");
        DateLogLevel timestamp = new DateLogLevel();
        timestamp.parseLogDate(dateStr);
        String levelStr = matcher.group("level");
        LOG_LEVEL loglevel = LOG_LEVEL.valueOf(levelStr);
        String fullClassStr = matcher.group("class");
        String[] classNameArray = fullClassStr.split("\\.");
        String framework = classNameArray[2];
        String classname = classNameArray[classNameArray.length - 1];
        String threadname = matcher.group("thread");
        String logtext = matcher.group("text");
        entries.add(
            new LogEntry(
                timestamp,
                loglevel,
                framework,
                threadname,
                logtext,
                "" /* Notes are empty when importing new file */));
    }
} catch (Exception ex) {
    return null;
}
The problem is that I can't get the last group (textRgx) to match across multiple lines and stop at either the next timestamp or the end of the file. Does anyone have any thoughts?
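For reference, here is a minimal sketch of one way the text group could be bounded, assuming a lazy `.*?` followed by a lookahead is acceptable. A character class such as `[^(...)]` only excludes single characters, so it cannot express "stop before a timestamp"; a lookahead like `(?=\R<timestamp>|\s*\z)` can. The names `LogSplitSketch`, `Entry`, `parse`, `TS` and `ENTRY` are hypothetical and only for illustration; the named groups are the ones from the question. The sketch also assumes that no message body contains a line that itself starts with a timestamp.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch class; not part of the code in the question.
class LogSplitSketch {
    // Same timestamp shape as in the question.
    private static final String TS =
        "\\d{4}-\\d{2}-\\d{2} \\d{2}:\\d{2}:\\d{2},\\d{3}";

    // MULTILINE lets ^ match at the start of each line; DOTALL lets . cross line breaks.
    // The lazy text group stops as soon as the lookahead sees either a line break
    // followed by a timestamp, or only whitespace before the end of input.
    private static final Pattern ENTRY = Pattern.compile(
        "^(?<timestamp>" + TS + ")"
            + " (?<level>INFO|ERROR|WARN|TRACE|DEBUG|FATAL)"
            + "\\s+\\[(?<class>[^\\]]+)\\]-\\[(?<thread>[^\\]]+)\\]"
            // Spaces/tabs only here, so an empty message cannot swallow the line break
            + "[ \\t]*(?<text>.*?)"
            + "(?=\\R" + TS + "|\\s*\\z)",
        Pattern.DOTALL | Pattern.MULTILINE);

    // Plain holder for the captured groups (illustrative only).
    static final class Entry {
        final String timestamp, level, className, thread, text;

        Entry(Matcher m) {
            timestamp = m.group("timestamp");
            level = m.group("level");
            className = m.group("class");
            thread = m.group("thread");
            text = m.group("text");
        }
    }

    // Runs the single pattern over the whole file content, one match per entry.
    static List<Entry> parse(String content) {
        List<Entry> entries = new ArrayList<>();
        Matcher m = ENTRY.matcher(content);
        while (m.find()) {
            entries.add(new Entry(m));
        }
        return entries;
    }

    public static void main(String[] args) {
        String sample =
            "2017-03-14 22:43:14,405 FATAL [org.example.web.Ctx]-[main-1] Refreshing root\n"
            + "2017-03-14 22:43:14,476 INFO [org.example.beans.Reader]-[main-1] Here is a multiline\n"
            + "log entry with another entry after\n"
            + "2017-03-14 22:43:14,477 INFO [org.example.beans.Reader]-[main-1] Here is a multiline\n"
            + "log entry with no entries after\n";
        for (Entry e : parse(sample)) {
            System.out.println(e.level + " | " + e.text.replace("\n", "\\n"));
        }
    }
}
```

The lookahead does not consume the next timestamp, so the following `find()` call can start the next entry there. Whether this is faster than the line-by-line loop would need measuring on real log files.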
Sample Log Entries
2017-03-14 22:43:14,405 FATAL [org.springframework.web.context.support.XmlWebApplicationContext]-[localhost-startStop-1] Refreshing Root WebApplicationContext: startup date [Tue Mar 14 22:43:14 UTC 2017]; root of context hierarchy
2017-03-14 22:43:14,476 INFO [org.springframework.beans.factory.xml.XmlBeanDefinitionReader]-[localhost-startStop-1] Loading XML bean definitions from Serv
2017-03-14 22:43:14,476 INFO [org.springframework.beans.factory.xml.XmlBeanDefinitionReader]-[localhost-startStop-1] Here is a multiline
log entry with another entry after
2017-03-14 22:43:14,476 INFO [org.springframework.beans.factory.xml.XmlBeanDefinitionReader]-[localhost-startStop-1] Here is a multiline
log entry with no entries after