Context:

After running the following command on my server:

zgrep "ResCode-5005" /loggers1/PCRF*/_01_03_2022 > analisis.txt

I get a text file with thousands of lines like this example:

loggers1/PCRF1_17868/PCRF12_01_03_2022_00_15_39.log:[C]|01-03-2022:00:18:20:183401|140404464875264|TRACKING: CCR processing Compleated for SubId-5281181XXXXX, REQNO-1, REQTYPE-3, SId-mscp01.herpgwXX.epc.mncXXX.mccXXX.XXXXX.org;25b8510c;621dbaab;3341100102036XX-27cf0XXX, RATTYPE-1004, ResCode-5005 |processCCR|ProcessingUnit.cpp|423

(X represents incrementing numbers)

Problem:

The output is filled with unnecessary data. The only parts I need from each line are the MSISDN and the IMSI, comma-separated, like this:

5281181XXXXX,3341100102036XX

Steps I tried:

zgrep "ResCode-5005" /loggers1/PCRF*/_01_03_2022| grep -o -P '(?<=SubId-).*?(?=, REQ)' > analisis1.txt

This gave me the first part of the solution

5281181XXXXX

However, when I tried to get the second string located between '334110' and "-"

zgrep "ResCode-5005" /loggers1/PCRF*/_01_03_2022| grep -o -P '(?<=SubId-).*?(?=, REQ)' | grep -o -P '(?<=334110).*?(?=-)' > analisis1.txt

it doesn't work.

Any input will be appreciated.

Felipe La Rotta

1 Answer

To get both 5281181XXXXX and the second string, located between '334110' and "-", you can use a pattern like:

\b(?:SubId-|334110)\K[^,\s-]+

The pattern matches:

  • \b A word boundary to prevent a partial word match
  • (?: Non capture group to match as a whole
    • SubId- Match literally
    • | Or
    • 334110 Match literally
  • ) Close the non capture group
  • \K Forget what is matched so far
  • [^,\s-]+ Match 1+ occurrences of any char except a comma, a whitespace char, or -

See the matches in this regex demo.

That will match:

5281181XXXXX
0102036XX
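
To check the pattern locally before running it over the real logs, here is a minimal sketch. It assumes GNU grep built with PCRE support (the -P option); the sample line is the one from the question, and the variable names `line` and `matches` are only illustrative.

```shell
# Sample log line copied from the question (the X placeholders are kept as-is).
line='loggers1/PCRF1_17868/PCRF12_01_03_2022_00_15_39.log:[C]|01-03-2022:00:18:20:183401|140404464875264|TRACKING: CCR processing Compleated for SubId-5281181XXXXX, REQNO-1, REQTYPE-3, SId-mscp01.herpgwXX.epc.mncXXX.mccXXX.XXXXX.org;25b8510c;621dbaab;3341100102036XX-27cf0XXX, RATTYPE-1004, ResCode-5005 |processCCR|ProcessingUnit.cpp|423'

# -o prints only the matched text, one match per output line.
# -P enables Perl-compatible regexes, which \K requires.
matches=$(printf '%s\n' "$line" | grep -oP '\b(?:SubId-|334110)\K[^,\s-]+')

printf '%s\n' "$matches"
```

Because of -o, each match lands on its own output line, so one log line produces two output lines.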

The command could look like

zgrep "ResCode-5005" /loggers1/PCRF*/_01_03_2022 | grep -oP '\b(?:SubId-|334110)\K[^,\s-]+' > analisis1.txt
The fourth bird
  • Thank you for the excellent answer. I wonder how I could concatenate the output of the regex to get both the 5281181XXXXX and 3341100102036XX portions on one line, comma-separated. I saw examples using scripts, but maybe there is a more elegant way inside the regex black magic book. – Felipe La Rotta Mar 02 '22 at 16:28
  • @FelipeLaRotta From what I can see, for example on [this page](https://stackoverflow.com/questions/15580144/how-to-concatenate-multiple-lines-of-output-to-one-line), you could pipe the output to `| tr '\n' ','`, so I think the command would be `zgrep "ResCode-5005" /loggers1/PCRF*/_01_03_2022 | grep -oP '\b(?:SubId-|334110)\K[^,\s-]+' | tr '\n' ','  > analisis1.txt`. Or you can pipe it to awk: `| awk '{print}' ORS=', '` – The fourth bird Mar 02 '22 at 20:41
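
Note that `tr '\n' ','` turns every newline into a comma, so all matches in the file end up on one long line. If the goal is one MSISDN,IMSI pair per log line, one possible sketch is to pair the matches up with `paste`. It assumes GNU grep with -P and that every input line yields exactly two matches (a SubId and a 334110... value); if a line produces only one, the pairing drifts. The pattern is a variation on the one in the answer: `\K` appears only in the SubId branch, so the 334110 prefix stays in the output. The variable names are illustrative, and the sample line from the question is fed in twice only to show the per-line pairing.

```shell
# Sample log line copied from the question (the X placeholders are kept as-is).
line='loggers1/PCRF1_17868/PCRF12_01_03_2022_00_15_39.log:[C]|01-03-2022:00:18:20:183401|140404464875264|TRACKING: CCR processing Compleated for SubId-5281181XXXXX, REQNO-1, REQTYPE-3, SId-mscp01.herpgwXX.epc.mncXXX.mccXXX.XXXXX.org;25b8510c;621dbaab;3341100102036XX-27cf0XXX, RATTYPE-1004, ResCode-5005 |processCCR|ProcessingUnit.cpp|423'

# First branch: drop the "SubId-" prefix with \K and keep what follows.
# Second branch: no \K, so the match keeps the leading 334110.
# paste -d, - - reads two lines of stdin at a time and joins them with a comma.
pairs=$(printf '%s\n' "$line" "$line" \
  | grep -oP '\bSubId-\K[^,\s-]+|\b334110[^,\s-]+' \
  | paste -d, - -)

printf '%s\n' "$pairs"
```

On the real logs, the same `grep -oP ... | paste -d, - -` stage would replace the `tr` stage at the end of the zgrep pipeline.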