
This is my first question here.

I have a log file made up of multiple similarly structured blocks ("hits"):

Region: AR
OnlineID: Atl_Tuc
---Start---
FIFA 18 Legacy Edition
---END---

Region: FR
OnlineID: jubtrrzz
---Start---
FIFA 19
Undertale
Pro Evolution Soccer™ 2018
---END---

Region: US
OnlineID: Cu128yi
---Start---
KINGDOM HEARTS HD 1.5 +2.5 ReMIX
---END---

Region: RO
OnlineID: Se116
---Start---
Real Farm
EA SPORTS™ FIFA 20
LittleBigPlanet™ 3
---END---

Region: US
OnlineID: CAJ5Y
---Start---
Madden NFL 18: G.O.A.T. Super Bowl Edition
---END---

I want to find all hits that contain fifa (fifa as a plain string). Fifa is only an example; in general I need to find all hits that contain a given string.

The closest I have come is this regex: (?s)(?=^\r\n)(.*?)(fifa)(.*?)(?=\r\n\r\n)

But when I use it, it selects every hit, including hits without fifa, until it reaches a hit that does contain fifa, so a single match sometimes spans more than one hit.

The second problem is that I can't use .* inside (fifa), because it causes a wrong selection.

What can I do now?

The right output should be like this:

Region: AR
OnlineID: Atl_Tuc
---Start---
FIFA 18 Legacy Edition
---END---

Region: FR
OnlineID: jubtrrzz
---Start---
FIFA 19
Undertale
Pro Evolution Soccer™ 2018
---END---

Region: RO
OnlineID: Se116
---Start---
Real Farm
EA SPORTS™ FIFA 20
LittleBigPlanet™ 3
---END---
morez890

2 Answers


You can use

(?si)(?:^(?<!.)|\R{2})\K(?:(?!\R{2}).)*?\bfifa\b.*?(?=\R{2}|\z)

See the regex demo

Details

  • (?si) - s makes . match line break chars (same as ". matches newline" ON), and i turns case insensitive matching ON
  • (?:^(?<!.)|\R{2}) - matches start of a file or two line break sequences
  • \K - omits the matched line breaks
  • (?:(?!\R{2}).)*? - any char, 0 or more occurrences but as few as possible, not starting a double line break sequence
  • \bfifa\b - whole word fifa
  • .*? - any 0+ chars as few as possible
  • (?=\R{2}|\z) - up to the double line break or end of file.
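
For readers who want to try the same idea outside a PCRE-style engine, here is a minimal sketch in Python. It is an adaptation, not the regex above: Python's built-in re module has no \K or \R, so this version assumes the text uses plain \n line breaks, replaces \K with a lookbehind, and uses \A and \Z for the start and end of the text. The sample data and the name PARAGRAPH_WITH_FIFA are made up for illustration.

```python
import re

# Sample data shortened from the question's log (assumes "\n" line breaks).
LOG = (
    "Region: AR\nOnlineID: Atl_Tuc\n---Start---\nFIFA 18 Legacy Edition\n---END---\n"
    "\n"
    "Region: US\nOnlineID: Cu128yi\n---Start---\nKINGDOM HEARTS HD 1.5 +2.5 ReMIX\n---END---\n"
    "\n"
    "Region: RO\nOnlineID: Se116\n---Start---\nReal Farm\nEA SPORTS™ FIFA 20\n---END---"
)

PARAGRAPH_WITH_FIFA = re.compile(
    r"(?:\A|(?<=\n\n))"   # start of text, or right after an empty line
    r"(?:(?!\n\n).)*?"    # any chars, as few as possible, never crossing an empty line
    r"\bfifa\b"           # whole word fifa
    r".*?"                # rest of the paragraph, as few chars as possible
    r"(?=\n\n|\Z)",       # up to the next empty line or the end of the text
    re.DOTALL | re.IGNORECASE,
)

# The pattern has no capturing groups, so findall returns the full matches.
hits = PARAGRAPH_WITH_FIFA.findall(LOG)
for hit in hits:
    print(hit, end="\n\n")
```

The lookbehind plays the role of \K here: the empty line before a paragraph is checked but never becomes part of the match.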

Now, if you want to match a paragraph that has fifa followed by 20 on the same line, use

(?si)(?:^(?<!.)|\R{2})\K(?:(?!\R{2}).)*?(?-s:\bfifa\b.*\b20\b).*?(?=\R{2}|\z)

The (?-s:\bfifa\b.*\b20\b) is a modifier group where . stops matching line breaks, and it matches a whole word fifa, then any 0+ chars other than line break chars, as many as possible, and then a 20 as a whole word.

See this regex demo.
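
The "fifa, then 20 on the same line" condition can be sketched in Python as well. This is an adaptation under assumptions rather than the regex above: it assumes \n line breaks, uses [^\n]* in place of the (?-s:...) modifier group to keep the fifa ... 20 part on one line, and swaps \K, \R and \z for a lookbehind, \n and \Z. The shortened sample data and the name FIFA_THEN_20 are illustrative only.

```python
import re

# Shortened sample paragraphs based on the question's log (assumes "\n" line breaks).
LOG = (
    "Region: AR\n---Start---\nFIFA 18 Legacy Edition\n---END---\n\n"
    "Region: FR\n---Start---\nFIFA 19\nPro Evolution Soccer™ 2018\n---END---\n\n"
    "Region: RO\n---Start---\nReal Farm\nEA SPORTS™ FIFA 20\n---END---"
)

FIFA_THEN_20 = re.compile(
    r"(?:\A|(?<=\n\n))(?:(?!\n\n).)*?"  # paragraph start, then lazily up to fifa
    r"\bfifa\b[^\n]*\b20\b"             # fifa and a whole-word 20 on the same line
    r".*?(?=\n\n|\Z)",                  # rest of the paragraph
    re.DOTALL | re.IGNORECASE,
)

hits = FIFA_THEN_20.findall(LOG)
for hit in hits:
    print(hit)
```

Note that 2018 in the FR paragraph does not count, because \b20\b only matches 20 as a whole word.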

Wiktor Stribiżew
  • It is exactly what I was searching for. Can I also use a different start or end for this regex? I mean using a string instead of two line break sequences ( `(?:^(?<!.)|\R{2})` ) as the start or end of the selection. – morez890 Nov 15 '20 at 01:17
  • @morez890 Suit yourself. I do not have access to your data. – Wiktor Stribiżew Nov 15 '20 at 01:20
  • Thanks for the solution, I had been searching the internet for this for about a month. – morez890 Nov 15 '20 at 01:32
  • In some cases there is an extra empty line near the start of the hit: ` Language: pt-BR Region: FR OnlineID: jubtrrzz ---Start--- FIFA 19 Undertale Pro Evolution Soccer™ 2018 ---END--- ` Is there anything I can do about this? I want to select the Language line + the empty line + the main hit as one whole selection. [image for this](https://imgur.com/8Iyh9wU) – morez890 Nov 15 '20 at 01:58
  • @morez890 You already have it. The delimiters are *double linebreaks*. – Wiktor Stribiżew Nov 15 '20 at 02:00
  • No, some hits contain a double line break inside them, and both parts should end up together in one selection. I added an image to the previous comment. I mean another regex that can select: "Start + a line of text + an empty line + my hit + end of selection" – morez890 Nov 15 '20 at 02:04
  • @morez890 I am not sure you are now speaking of the same regex or a new one. If you need to improve the above one, add `^Language:\h*[a-zA-Z-]+\R` to the first group, see [this demo](https://regex101.com/r/S8Kg6x/4). – Wiktor Stribiżew Nov 15 '20 at 02:09
  • Yeah, this is what I need, but how can I use `.*` there, as in `^Language:.*\R`? This line might also contain other characters or numbers. – morez890 Nov 15 '20 at 02:23
  • `^Language:(?-s:.*)\R` – Wiktor Stribiżew Nov 15 '20 at 02:27
  • Thank you very much, all solutions worked well. Do you have any idea why some regexes, like the one we are talking about, can't be used in Windows cmd? Cmd (a batch file) just skips the regex and it does not work; the same happens with many other regexes. – morez890 Nov 15 '20 at 02:46
  • @morez890 Not sure what you mean, but you can [work around](https://stackoverflow.com/questions/34524390/regex-to-match-a-variable-in-batch-scripting) some things in bat files. – Wiktor Stribiżew Nov 15 '20 at 11:33

It would be better not to use regex for this entire problem. I would use something simpler to cut the log file into pieces, 1 piece per paragraph.

Then use a regex to see if each paragraph is a "hit" or not.

Here is some Python code:

# read the file contents into a string (the with block closes the file afterwards)
with open('/input/log/file/path/here', 'r') as log_file:
    log_text = log_file.read().strip()

# split the string into separate paragraphs
paragraphs = log_text.split('\n\n')

# filter the paragraphs to the ones you want
filtered_paragraphs = filter(is_wanted, paragraphs)

# recombine the filtered paragraphs into a new log string
new_log_text = '\n\n'.join(filtered_paragraphs)

# output new log text into new file
with open('/output/log/file/path/here', 'w') as out_file:
    out_file.write(new_log_text)

And of course you will need to define the is_wanted function before running the code above:

import re

def is_wanted(paragraph):
    # discard first three and last line to get paragraph content
    p_content = '\n'.join(paragraph.split('\n')[3:-1])
    # input any regex pattern here, such as 'FIFA'
    # (pass it into the function as a variable if you need it to be customizable)
    # re.IGNORECASE makes the search also match 'fifa' or 'Fifa'
    return bool(re.search(r'FIFA', p_content, re.IGNORECASE))
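
The split-then-filter idea above can also be tried on an in-memory string. The sketch below is a variation under assumptions, not the exact code above: it splits on runs of empty lines with either \n or \r\n line breaks (the question's regex suggests the log may use \r\n), and it searches the whole paragraph instead of dropping the header lines. The helper names split_paragraphs and filter_hits and the sample data are hypothetical.

```python
import re

def split_paragraphs(text):
    """Split text on runs of two or more line breaks (\\n or \\r\\n)."""
    return [p for p in re.split(r"(?:\r?\n){2,}", text.strip()) if p]

def filter_hits(text, needle):
    """Return the paragraphs that contain needle, ignoring case."""
    # re.escape treats needle as a literal string, not as a regex
    pattern = re.compile(re.escape(needle), re.IGNORECASE)
    return [p for p in split_paragraphs(text) if pattern.search(p)]

# Sample with Windows-style line breaks, shortened from the question's log.
sample = (
    "Region: AR\r\nOnlineID: Atl_Tuc\r\n---Start---\r\n"
    "FIFA 18 Legacy Edition\r\n---END---\r\n\r\n"
    "Region: US\r\nOnlineID: CAJ5Y\r\n---Start---\r\n"
    "Madden NFL 18: G.O.A.T. Super Bowl Edition\r\n---END---\r\n"
)

hits = filter_hits(sample, "fifa")
print("\n\n".join(hits))
```

Because this version searches the whole paragraph, an OnlineID that happens to contain the search string would also count as a hit; slicing off the header lines as in is_wanted avoids that.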
mareoraft
  • Thanks, I don't know whether it works or not, but I want to use a regex in a batch file; that's why I asked for a regex. – morez890 Nov 15 '20 at 01:33
  • No worries! I have verified that it works, but I understand that you may not want to use Python for your particular situation. Have a great day and welcome to Stack Overflow! – mareoraft Nov 15 '20 at 01:35