Notepad++ regular expression to extract XML messages from a log file

Question

I have a log file with content like the sample below. I'm trying to extract the XML segments that match a few item numbers, say 6654721, 6654722 and 6654725. The expected output is the complete XML segment for each of those three item numbers. I tried the regular expression (<Record>.*? </Record>), which finds each XML segment exactly. Then I tried to add a filter, like (<Record>.*?(6654721|6654722|6654725).*?</Record>), but this does not work as expected. Can someone help me with this? Thanks in advance for your response.

 2017-04-20 some log file
 2017-04-20 some log file
 2017-04-20 some log file
 <Record>
     <itemname>Lego Fire Rescue</itemname>
     <itemnumber>6654721</itemnumber>
     <availableinv>19</availableinv>
     <ageplus>3</ageplus>
     <storeId>19</storeId> 
 </Record>
 2017-04-20 some log file
 2017-04-20 some log file
 2017-04-20 some log file
 <Record>
     <itemname>Lego Fire Rescue</itemname>
     <itemnumber>6654722</itemnumber>
     <availableinv>19</availableinv>
     <ageplus>3</ageplus>
     <storeId>19</storeId> 
 </Record>
 2017-04-20 some log file
 2017-04-20 some log file
 2017-04-20 some log file
 <Record>
     <itemname>Lego Fire Rescue</itemname>
     <itemnumber>6654723</itemnumber>
     <availableinv>19</availableinv>
     <ageplus>3</ageplus>
     <storeId>19</storeId> 
 </Record>
 2017-04-20 some log file
 2017-04-20 some log file
 2017-04-20 some log file
 <Record>
     <itemname>Lego Fire Rescue</itemname>
     <itemnumber>6654725</itemnumber>
     <availableinv>19</availableinv>
     <ageplus>3</ageplus>
     <storeId>19</storeId> 
 </Record>

Toto · Answer 1 · 2017-04-21T09:23:40.073

This regex does the job:

<Record[^>]*>(?:(?!</Record>).)*\b(?:6654721|6654722|6654725)\b.*?</Record>

Explanation:

<Record[^>]*>       : '<Record>' with optional attributes
(?:                 : start non capture group
    (?!             : start negative lookahead, make sure what follows is not
        </Record>   : literally '</Record>'
    )               : end lookahead
    .               : any character
)*                  : repeat the non capture group; at each position we are sure '</Record>' does not follow
\b                  : word boundary
(?:                 : non capture group
    6654721         : 6654721
    |               : OR
    6654722         : 6654722
    |               : OR
    6654725         : 6654725
)                   : end group
\b                  : word boundary
.*?                 : 0 or more of any character, non greedy
</Record>           : literally '</Record>'
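
For readers who want to try the pattern outside Notepad++, here is a minimal Python sketch. It assumes `.` is allowed to match line breaks (the ". matches newline" option in Notepad++; `re.DOTALL` in Python), since each record spans several lines. The shortened sample log and the names `make_record`, `TEMPERED` and `NAIVE` are made up for illustration and are not part of the original answer. It also shows why the filter from the question goes wrong: its lazy `.*?` is free to run past a `</Record>` and into the next record.

```python
import re

ITEMS = ("6654721", "6654722", "6654725")
ALTERNATION = "|".join(ITEMS)

# Pattern from the answer: "(?:(?!</Record>).)*" consumes one character at a
# time, but only while "</Record>" does not start at the current position,
# so a match can never extend beyond the record it started in.
TEMPERED = re.compile(
    rf"<Record[^>]*>(?:(?!</Record>).)*\b(?:{ALTERNATION})\b.*?</Record>",
    re.DOTALL,  # let "." match newlines
)

# Filter from the question (made non-capturing), kept for comparison.
NAIVE = re.compile(rf"<Record>.*?(?:{ALTERNATION}).*?</Record>", re.DOTALL)


def make_record(itemnumber: str) -> str:
    """Build a shortened record in the shape of the sample log (illustrative)."""
    return (
        "<Record>\n"
        "    <itemname>Lego Fire Rescue</itemname>\n"
        f"    <itemnumber>{itemnumber}</itemnumber>\n"
        "</Record>"
    )


SAMPLE_LOG = "\n".join([
    "2017-04-20 some log file",
    make_record("6654721"),
    "2017-04-20 some log file",
    make_record("6654723"),  # not in ITEMS, must be skipped
    "2017-04-20 some log file",
    make_record("6654725"),
])

# Neither pattern has capturing groups, so findall returns whole matches.
tempered_matches = TEMPERED.findall(SAMPLE_LOG)
naive_matches = NAIVE.findall(SAMPLE_LOG)

for record in tempered_matches:
    print(record)
    print("-" * 20)
```

On this sample, the pattern from the answer returns the two wanted records, each complete with all of its lines. The pattern from the question also returns two matches, but the second one starts at the unwanted 6654723 record and runs through to the end of the 6654725 record.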

edited Apr 21 '17 at 09:23

answered Apr 20 '17 at 10:09

Toto

Great, this works perfectly, exactly what I was looking for. Much appreciated. Could you help me make it work when some of the tags have an attribute like . I still want to see the same result. – Ponns Apr 21 '17 at 02:51
If this answer has solved your problem, you should accept it. – Imanuel Apr 21 '17 at 06:44
sorry for the delayed response. your edit works perfect. Thanks much for the solution. – Ponns Apr 24 '17 at 15:50
One more question: I tried to copy all matching lines using Search-->Bookmark-->Copy Bookmarked Lines, as suggested in the thread http://stackoverflow.com/questions/2298962/how-to-copy-marked-text-in-notepad, but it copied only the first line of each match instead of all lines of each match. Is there a way to copy all lines of each match? – Ponns Apr 24 '17 at 15:59
@Ponns: Sorry, I don't see a solution for that. You may find some help on Npp community: https://notepad-plus-plus.org/community/ – Toto Apr 25 '17 at 07:46
I'll followup on the npp forum. Thank you for your help, really appreciated. – Ponns Apr 26 '17 at 01:49