I have a document which looks something like:

sort=SIZE:NumberDecreasing
FieldText=(((EQUAL{226742}:LocationId)) AND ()) 

FieldText=(((EQUAL{226742}:LocationId)) AND ((EQUAL{1}:LOD AND NOTEQUAL{1}:SCR AND EMPTY{}:RPDCITYID AND NOTEQUAL{1}:Industrial))) 

FieldText=( NOT EQUAL{1}:ISSCHEME AND EQUAL{215629}:LocationId) 

sort=DEALDATE:decreasing

From each line I would like to extract the word before a colon (or, where {} brackets come before the colon, the word before those brackets), then the colon, then the word after the colon. Ideally these pairs should be the only things left in the file, each on its own line.

Output would then look like:

SIZE:NumberDecreasing
EQUAL:LocationId 
EQUAL:LocationId
EQUAL:LOD
NOTEQUAL:SCR
EMPTY:RPDCITYID
NOTEQUAL:Industrial
EQUAL:ISSCHEME
EQUAL:LocationId    
DEALDATE:decreasing

The closest I have come so far is:

Find: ^.?+ {[0-9]}:([a-zA-Z]+)
Replace with: ...\1:\2...

The intent is to run it several times and later replace ... with \n, after which I can remove the repeated newlines.

Context: this is for a log analysis I am performing. I have already removed the datestamps and reduced the elements of each query down to the sort and FieldText parameters.

I do not have the regular UNIX tools, as I am working in a Windows environment.

The original log looks like:

03/11/2011 16:25:44 [9] ACTION=Query&summary=Context&print=none&printFields=DISPLAYNAME%2CRECORDTYPE%2CSTREET%2CTOWN%2CCOUNTY%2CPOSTCODE%2CLATITUDE%2CLONGITUDE&DatabaseMatch=Autocomplete&sort=RECORDTYPE%3Areversealphabetical%2BDRETITLE%3Aincreasing&maxresults=200&FieldText=%28WILD%7Bbournemou%2A%7D%3ADisplayName%20NOT%20MATCH%7BScheme%7D%3ARecordType%29 (10.55.81.151)
03/11/2011 16:25:45 [9] Returning 23 matches
03/11/2011 16:25:45 [9] Query complete
03/11/2011 16:25:46 [8] ACTION=GetQueryTagValues&documentCount=True&databaseMatch=Deal&minScore=70&weighfieldtext=false&FieldName=TotalSizeSizeInSquareMetres%2CAnnualRental%2CDealType%2CYield&start=1&FieldText=%28MATCH%7BBournemouth%7D%3ATown%29 (10.55.81.151)
03/11/2011 16:25:46 [12] ACTION=Query&databaseMatch=Deal&maxResults=50&minScore=70&sort=DEALDATE%3Adecreasing&weighfieldtext=false&totalResults=true&PrintFields=LocationId%2CLatitude%2CLongitude%2CDealId%2CFloorOrUnitNumber%2CAddressAlias%2A%2CEGAddressAliasID%2COriginalBuildingName%2CSubBuilding%2CBuildingName%2CBuildingNumber%2CDependentStreet%2CStreet%2CDependentLocality%2CLocality%2CTown%2CCounty%2CPostcode%2CSchemeName%2CBuildingId%2CFullAddress%2CDealType%2CDealDate%2CSalesPrice%2CYield%2CRent%2CTotalSizeSizeInSquareMetres%2CMappingPropertyUsetype&start=1&FieldText=%28MATCH%7BBournemouth%7D%3ATown%29 (10.55.81.151)
03/11/2011 16:25:46 [8] GetQueryTagValues complete
03/11/2011 16:25:47 [12] Returning 50 matches
03/11/2011 16:25:47 [12] Query complete
03/11/2011 16:25:51 [13] ACTION=Query&print=all&databaseMatch=locationidsearch&sort=RELEVANCE%2BPOSTCODE%3Aincreasing&maxResults=10&start=1&totalResults=true&minscore=70&weighfieldtext=false&FieldText=%28%20NOT%20LESS%7B50%7D%3AOFFICE%5FPERCENT%20AND%20EXISTS%7B%7D%3AOFFICE%5FPERCENT%20NOT%20EQUAL%7B1%7D%3AISSCHEME%29&Text=%28Brazennose%3AFullAddress%2BAND%2BHouse%3AFullAddress%29&synonym=True (10.55.81.151)
03/11/2011 16:25:51 [13] Returning 3 matches
03/11/2011 16:25:51 [13] Query complete

The purpose of the whole exercise is to find out which fields are being queried and sorted on, and how we are querying and sorting on them. To this end, it would also be useful for the output to be distinct, although that is not essential.

penguat
  • If I can't do this easily, I may end up writing a program to do it... What do you think? – penguat Nov 09 '11 at 16:00
  • I think a program is the way to go. This would be trivial in Perl. – Borodin Nov 11 '11 at 12:46
  • Would taking in the initial log format and reducing that to the end output be equally trivial in Perl? I might need to see about getting Perl for Windows. – penguat Nov 15 '11 at 12:07
  • Yes it would - it is exactly the sort of thing that Perl was originally designed to do. – Borodin Nov 17 '11 at 05:41

1 Answer


The Perl program below is complete, and includes your sample data in the source. It produces exactly the output you describe, including reporting NOT EQUAL{1}:ISSCHEME as EQUAL:ISSCHEME because of the intermediate space.

use strict;
use warnings;

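# For each input line, print every "operator:field" pair found on it.
# The optional {digits} argument between the operator and the colon is
# matched but not captured, so it is left out of the output.
while (<DATA>) {
  print "$1:$2\n" while /(\w+)  (?: \{\d*\} )? : (\w+)/xg;
}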

__DATA__
sort=SIZE:NumberDecreasing
FieldText=(((EQUAL{226742}:LocationId)) AND ()) 

FieldText=(((EQUAL{226742}:LocationId)) AND ((EQUAL{1}:LOD AND NOTEQUAL{1}:SCR AND EMPTY{}:RPDCITYID AND NOTEQUAL{1}:Industrial))) 

FieldText=( NOT EQUAL{1}:ISSCHEME AND EQUAL{215629}:LocationId) 

sort=DEALDATE:decreasing

OUTPUT

  SIZE:NumberDecreasing
  EQUAL:LocationId
  EQUAL:LocationId
  EQUAL:LOD
  NOTEQUAL:SCR
  EMPTY:RPDCITYID
  NOTEQUAL:Industrial
  EQUAL:ISSCHEME
  EQUAL:LocationId
  DEALDATE:decreasing
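
The comments ask whether the original log format could be handled the same way. What follows is only a sketch of one way that might look, not a tested solution for the full log. It assumes that every request line contains a query string starting with ACTION= and ending at the next space, and that only the sort and FieldText parameters are of interest. The subroutine names pairs_from_log_line and distinct_pairs are made up for this illustration, and the sample lines are abridged from the log in the question (some parameters dropped for brevity). The brace pattern is loosened from \d* to [^}]* because the raw log has non-numeric arguments such as MATCH{Bournemouth}.

```perl
use strict;
use warnings;

# Return the "operator:field" pairs found in one raw log line.
sub pairs_from_log_line {
  my ($line) = @_;
  my @pairs;

  # Assumption: the query string runs from ACTION= to the next space.
  # Lines without one ("Returning 23 matches" etc.) yield nothing.
  my ($query) = $line =~ /\b(ACTION=\S+)/ or return;

  for my $param (split /&/, $query) {
    my ($name, $value) = split /=/, $param, 2;
    next unless defined $value;
    next unless lc($name) eq 'sort' || lc($name) eq 'fieldtext';

    # Undo the URL encoding (%3A is ":", %7B is "{", and so on).
    $value =~ s/%([0-9A-Fa-f]{2})/chr(hex($1))/ge;

    # Same idea as the pattern in the program above, except that the
    # braces may hold any text, not only digits.
    push @pairs, "$1:$2" while $value =~ /(\w+) (?: \{[^}]*\} )? : (\w+)/xg;
  }
  return @pairs;
}

# Return the pairs from all lines, each one only once, in first-seen order.
sub distinct_pairs {
  my %seen;
  return grep { !$seen{$_}++ } map { pairs_from_log_line($_) } @_;
}

# Abridged sample lines; a real run would read the log file instead.
my @sample_log = (
  '03/11/2011 16:25:44 [9] ACTION=Query&sort=RECORDTYPE%3Areversealphabetical%2BDRETITLE%3Aincreasing&maxresults=200&FieldText=%28WILD%7Bbournemou%2A%7D%3ADisplayName%20NOT%20MATCH%7BScheme%7D%3ARecordType%29 (10.55.81.151)',
  '03/11/2011 16:25:45 [9] Returning 23 matches',
  '03/11/2011 16:25:46 [8] ACTION=GetQueryTagValues&documentCount=True&start=1&FieldText=%28MATCH%7BBournemouth%7D%3ATown%29 (10.55.81.151)',
  '03/11/2011 16:25:46 [12] ACTION=Query&databaseMatch=Deal&sort=DEALDATE%3Adecreasing&start=1&FieldText=%28MATCH%7BBournemouth%7D%3ATown%29 (10.55.81.151)',
);

print "$_\n" for distinct_pairs(@sample_log);
```

Restricting the match to the decoded sort and FieldText values keeps the colons in the timestamps, and the separate Text parameter, out of the results. To process a real log, the sample array could be replaced by a loop over <> so that the file name is given on the command line.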
Borodin