
I have used http://www.regexe.com/ to test a regex I've created to extract the date and time from syslog, and it shows that the regex is in fact correct, highlighting both the date and the time. However, when I try this in Perl I get back just the time, not the date.

So, for example, from the string Dec  9 12:45:36 osboxes NetworkManager[739]: <info> address 192.168.10.129 I get back only 12:45:36.

Here's my script:

use strict;
use warnings;

my $keywords = 'keywords.txt';
open(my $kw, '<:encoding(UTF-8)', $keywords)
    or die "Could not open file '$keywords' $!";    # Open the file; die if it cannot be opened.
chomp (my @keywordsarray = <$kw>); # Read all lines into an array and strip the trailing newlines
close($kw); # Close the file

my $syslog = 'syslog';
open(my $sl, '<:encoding(UTF-8)', $syslog)
    or die "Could not open file '$syslog' $!";    # Open the file; die if it cannot be opened.
chomp (my @syslogarray = <$sl>); # Read all lines into an array and strip the trailing newlines
close($sl); # Close the file

foreach my $line (@syslogarray)
{
    foreach my $keyword (@keywordsarray)
    {
        if ($line =~ m/\Q$keyword\E/)
        {
            if ((my $date) = $line =~  m/[A-z]+\s{2}\d{1,}\s((\d{2}[:]){2}\d{2})/)
            {
                print "**". $keyword. "**". $date. "\n";
            }
        }
    }
}
Simon

2 Answers


You can simply wrap the whole pattern in a capturing group.

if ((my $date) = $line =~  m/([A-Z]+\s{2}\d+\s(?:\d{2}:){2}\d{2})/i)
                             ^                                  ^

See IDEONE demo

When you write (my $date) = ..., the match runs in list context and returns the text captured by each capturing group; $date receives the contents of the first one. In your pattern the first group encloses only the time, so that is all you get. All you need is a pair of unescaped parentheses around the part of the pattern that should match the text you want returned.
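To see the difference side by side, here is a minimal sketch that runs both patterns against one line. The sample line is an assumption based on the question (with two spaces after the month, since the pattern requires \s{2}), and the variable names are made up for illustration.

```perl
use strict;
use warnings;

# Hypothetical sample line; the two spaces after "Dec" are assumed,
# because the pattern requires \s{2} before the day.
my $line = 'Dec  9 12:45:36 osboxes NetworkManager[739]: <info> address 192.168.10.129';

# Original pattern: the only capturing groups sit around the time,
# so group 1 is the time and the date is matched but never captured.
my ($first_orig) = $line =~ m/[A-z]+\s{2}\d{1,}\s((\d{2}[:]){2}\d{2})/;

# Whole pattern wrapped in one capturing group; the inner group is non-capturing.
my ($first_fixed) = $line =~ m/([A-Z]+\s{2}\d+\s(?:\d{2}:){2}\d{2})/i;

print "original: $first_orig\n";
print "fixed:    $first_fixed\n";
```

The first print shows only the time, the second shows the date and the time together.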

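Note that [A-z] matches more than letters (see [A-z] and [a-zA-Z] difference): the range also covers the characters [, \, ], ^, _ and ` that sit between Z and a in ASCII. It is better rewritten as [A-Za-z], or as [A-Z] with the /i modifier (as I suggested above).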
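A quick way to check this yourself is to test the six ASCII characters that lie between Z and a against both classes. This is only an illustrative sketch; the array names are made up.

```perl
use strict;
use warnings;

# Characters between 'Z' (0x5A) and 'a' (0x61) in ASCII: [ \ ] ^ _ `
my @punct = ('[', '\\', ']', '^', '_', '`');

my @by_wide_range = grep { /^[A-z]$/ }    @punct;   # accepted by the wide range
my @by_letters    = grep { /^[A-Za-z]$/ } @punct;   # accepted by letters only

printf "[A-z] accepts %d of these characters, [A-Za-z] accepts %d\n",
    scalar @by_wide_range, scalar @by_letters;
```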

Also, \d{1,} is equal to \d+ (the + quantifier means 1 or more occurrences, same as {1,}). You can use the latter variant since it is more concise and more readable.

There is no point in placing : into a character class [:]; a colon does not have to be escaped in a regex pattern (unless it is the regex delimiter, and here it is not).
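If you want to convince yourself that these simplifications do not change what is matched, a small comparison like the following may help. The test strings are made up, and the patterns cover only the day and time part for brevity.

```perl
use strict;
use warnings;

# Day and time written both ways; the sample strings are invented for illustration.
my $verbose = qr/^\d{1,}\s(?:\d{2}[:]){2}\d{2}$/;
my $concise = qr/^\d+\s(?:\d{2}:){2}\d{2}$/;

my @samples = ('9 12:45:36', '10 01:02:03', '9 12-45-36', ' 12:45:36');

for my $s (@samples) {
    my $v = $s =~ $verbose ? 'match' : 'no match';
    my $c = $s =~ $concise ? 'match' : 'no match';
    printf "%-12s verbose=%s concise=%s\n", $s, $v, $c;
}
```

Both columns should agree for every input, because the two patterns describe the same set of strings.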

Wiktor Stribiżew
  • Glad it worked for you. Please also consider upvoting if my answer proved helpful to you (see [How to upvote on Stack Overflow?](http://meta.stackexchange.com/questions/173399/how-to-upvote-on-stack-overflow)) since now you have the privilege. – Wiktor Stribiżew Dec 09 '15 at 23:11

You have to put a capturing group around the date:

/(?i)([a-z]+\s{2}\d{1,})\s((?:\d{2}:){2}\d{2})/

Formatted:

 (?i)
 ( [a-z]+ \s{2} \d{1,} )       # (1), Date
 \s 
 (                             # (2 start), Time
      (?: \d{2} : ){2}
      \d{2} 
 )                             # (2 end)

Then add another variable to the list:

if (my ($date, $time) = $line =~ /([a-z]+\s{2}\d{1,})\s((?:\d{2}:){2}\d{2})/i)
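As a rough sketch of how the two-group version could be used, the snippet below wraps the match in a small helper. The name split_timestamp and the sample line are hypothetical (the line assumes two spaces after the month, which the pattern requires); the variables are declared with my so the code compiles under use strict.

```perl
use strict;
use warnings;

# Hypothetical helper: returns (date, time) in list context,
# or an empty list when the line does not match.
sub split_timestamp {
    my ($line) = @_;
    return $line =~ /([a-z]+\s{2}\d{1,})\s((?:\d{2}:){2}\d{2})/i;
}

# Assumed sample line, modelled on the one in the question.
my $sample = 'Dec  9 12:45:36 osboxes NetworkManager[739]: <info> address 192.168.10.129';

if (my ($date, $time) = split_timestamp($sample)) {
    print "date=[$date] time=[$time]\n";
}
```

Because the list assignment is evaluated in scalar context by the if, a failed match yields zero elements and the block is skipped.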