processLine('23/05/2017 07:10:58 [6] 00-Always: ACTION=QUERY&Text=iphone%205%20has%20no%20network%2A&Summary=Context&SpellCheck=true&QuerySummary=false&Sort=AutnRank%2BRelevance&Synonym=true&TotalResults=true&MaxResults=10&PrintFields=drereference%2Cdretitle%2Ccontenttype%2Cautnrank%2COPTUS%5FFILTER1%2COPTUS%5FFILTER2%2COPTUS%5FFILTER3%2CCANONICAL%5FURL&Start=1&Predict=false&FieldText=%28MATCH%7BMy%20Optus%20Community%7D%3AOPTUS%5FFILTER1%3AOPTUS%5FFILTER2%3AOPTUS%5FFILTER3%20NOT%20MATCH%7Bsmb%7D%3ACONTEXT%20NOT%20MATCH%7BCustom%5FPromotions%7D%3ADREDBNAME%29%2BOR%2B%28%28MATCH%7BCustom%5FPromotions%7D%3ADREDBNAME%29%2BAND%2B%28BIASVAL%7Biphone%205%20has%20no%20network%2A%2C100%7D%3APromotion%5FKeywords%29%2BAND%2B%28MATCH%7Biphone%205%20has%20no%20network%2A%7D%3APromotion%5FKeywords%29%29&Combine=Simple&Characters=250 (127.0.0.1)');
if (defined $query && defined $ip && $query =~ m!/?a.*?=(\w+)([?&].*(?<=[?&])Text=([^?&]*))?!) 
{
        $events++;
        my $action = $1;
        my $terms = uri_unescape($3) || "";
}

I want to extract iphone%205%20has%20no%20network%2A from Text=iphone%205%20has%20no%20network%2A and store it in $3. I tested the regex and cannot find an issue with it, yet it prints $3 as Context.

The expectation is that $3 outputs the value iphonehasnonetwork.

When I pass,

processLine('25/05/2017 14:48:10 [9] 00-Always: action=Query&text=samsung&databasematch=Help_Support&ResponseFormat=json&_=1495687690880 (127.0.0.1)');

It prints $3 as QuerySamsung. This is the expected result.

I am new to Perl and am looking to modify this regex to sort this issue out. I have already stripped down my Perl script and narrowed the root problem down to this regex. The regex looks fine to me after testing its individual components on regex101.com.

Himan
  • 1
    Why not use [CGI](http://perldoc.perl.org/CGI.html) to parse the query parameters? – tadman Jun 08 '17 at 06:07
  • The Perl script is written to generate stats from the StatsServer of HPE IDOL. I have no option but to do it this way. – Himan Jun 08 '17 at 06:09
  • 2
    We're talking about Perl here. Of course there's options. – tadman Jun 08 '17 at 06:33
  • `if ($text =~ /\bText=\b(.*?)\bSummary\b/) { $result = $1; $result =~ tr/%20A&/ /; $result =~ s/ //g; print $result; }` that is not very complex and it strips exactly `iphone5hasnonetwork` – Gerhard Jun 08 '17 at 07:00
  • When I test the regex, the output of $3 is 'iphone%205%20has%20no%20network%2A'. Do you just want to replace the URI characters? – Corex Cain Jun 08 '17 at 07:06

2 Answers

1

You forgot to add the ignore-case modifier to your regexp:

$query =~ m!/?a.*?=(\w+)([?&].*(?<=[?&])Text=([^?&]*))?!i

Note the i at the end.

Read more here: https://perldoc.perl.org/perlre.html#Modifiers
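To see what the modifier changes, here is a minimal sketch that runs the question's regex with and without `i`. It assumes `$query` holds only the part of the log line starting at `ACTION=`, which the question does not show. The helper name `capture_action_and_text` and the shortened query string are made up for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: apply the question's regex to a query string and
# return the captured action ($1) and the raw Text value ($3).
# $ignore_case switches the /i modifier on or off.
sub capture_action_and_text {
    my ($query, $ignore_case) = @_;
    my $re = $ignore_case
        ? qr!/?a.*?=(\w+)([?&].*(?<=[?&])Text=([^?&]*))?!i
        : qr!/?a.*?=(\w+)([?&].*(?<=[?&])Text=([^?&]*))?!;
    return unless $query =~ $re;
    return ($1, $3);
}

# Shortened version of the first log line's query (assumed input shape).
my $q = 'ACTION=QUERY&Text=iphone%205%20has%20no%20network%2A&Summary=Context';
for my $flag (0, 1) {
    my ($action, $text) = capture_action_and_text($q, $flag);
    printf "ignore_case=%d action=%s text=%s\n",
        $flag, $action // '(undef)', $text // '(undef)';
}
```

Under that assumption, the case-sensitive pattern cannot match the uppercase `A` in `ACTION`, so it starts at the first lowercase `a` (in `has`) and captures `Context` as the action, leaving `$3` undefined. That would account for the `Context` in the output. With `i`, the match starts at `ACTION` and `$3` holds the still-encoded Text value.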

Kosh
0

I'm not sure what other problems your regex has, but off the bat I see:

1) capture groups are numbered by the opening parenthesis, so I think you want $4, not $3
2) 'Text' might match 'FieldText' later in the string

You really ought to just parse the URI correctly by splitting all the arguments (&) and then splitting the key-value pairs (=).
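A minimal sketch of that splitting approach, assuming the query string has already been cut out of the log line. `parse_query` and `percent_decode` are hypothetical names. `percent_decode` is a dependency-free stand-in for `uri_unescape` from the question, and it does not turn `+` into a space.

```perl
use strict;
use warnings;

# Stand-in for uri_unescape: turn %XX escapes back into the characters
# they encode.
sub percent_decode {
    my ($s) = @_;
    $s =~ s/%([0-9A-Fa-f]{2})/chr(hex($1))/ge;
    return $s;
}

# Split the query on '&', then split each pair on its first '='.
# Keys are lower-cased so 'Text' and 'text' land in the same slot.
sub parse_query {
    my ($query) = @_;
    my %param;
    for my $pair (split /&/, $query) {
        my ($key, $value) = split /=/, $pair, 2;
        next unless defined $key && length $key;
        $param{lc $key} = percent_decode($value // '');
    }
    return \%param;
}

my $p = parse_query('ACTION=QUERY&Text=iphone%205%20has%20no%20network%2A&FieldText=x');
print "action=$p->{action} text=$p->{text}\n";
```

Because each key is compared as a whole, `FieldText` ends up under its own key and cannot be mistaken for `Text`.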

GWP