
I want to learn to write more kinds of regex, so I have been trying to make the following work.

Here is my expression: https://regex101.com/r/VzspFy/4/

In the test strings, the first three are fine: patterns like those must be matched. The problem is the last one, which I don't want to be matched, so I tried this:

https://regex101.com/r/9HVKTK/2

and this:

https://regex101.com/r/9HVKTK/1

But no luck!

The main idea is:

`aaa ... bbb ccc` -> must match
`ccc ... (aaa|ddd|eee) ... bbb ccc` -> should not match

How can I make this work, or is there a better implementation?

halfer
Jorman Franzini
  • Try [`^(?:(?!\b(eng|ita)\b).)*\K\beng\W+\w+\W+sub\s?ita\b`](https://regex101.com/r/VzspFy/6) – Wiktor Stribiżew Mar 22 '18 at 12:42
  • @LievenKeersmaekers I feel that OP needs to match a substring once if there is no specific substring before it. – Wiktor Stribiżew Mar 22 '18 at 12:47
  • So... basically, you want to match 'Eng xyz - Sub Ita', where `xyz` is some random three letters, right? – Robo Mop Mar 22 '18 at 13:09
  • (Aside: note that Stack Overflow is not a chatroom, so "thx", "wanna" and "lol" are not appropriate here. Use real words please - a technical standard of writing is preferred). – halfer Mar 22 '18 at 14:36
  • @halfer Sorry about my language; my English is not very good, so I try to explain myself as well as I can. I am always trying to learn. – Jorman Franzini Mar 22 '18 at 18:48
  • @WiktorStribiżew Your solution seems to work, although to me it looks like an alien language. I'll add some more text to check every case I can think of. The approach is very unfamiliar to me; I have to read your solution carefully, because I can't figure out why such short code works. I think my approach was overcomplicated. – Jorman Franzini Mar 22 '18 at 18:51
  • @CoffeehouseCoder Basically yes, but there must also be no "ita" before the 'Eng xyz - Sub Ita'. – Jorman Franzini Mar 22 '18 at 18:58
  • @WiktorStribiżew Can I take advantage of your experience? I found a pattern that doesn't match but should. I tried it here: https://regex101.com/r/VzspFy/10. If you look at the last line, the pattern ENG.Sub.ITA doesn't match. I tried adding [A-Za-z_.-]? or [A-Za-z_.-]+ after the \K, but it doesn't work. Meanwhile I'm looking for a solution, but I don't know why it seems so complicated; in my mind it is simple, it just needs to handle a dot as well. – Jorman Franzini Mar 22 '18 at 22:40
  • I went to sleep unusually early yesterday and could not see your comments. So, 1) my solution from top comment does not work. 2) I thought you need to match `eng...sub...ita` if there were no `eng` or `ita` whole words before them. Do you want to match that chain of words only if `eng` or `ita` do not appear after start of string or after `]`? Try [`(?:^|])(?:(?!\b(?:eng|ita)\b)[^]])*\K\beng(?:\W+\w+)?\W+sub\W+ita\b`](https://regex101.com/r/VzspFy/15) – Wiktor Stribiżew Mar 23 '18 at 07:50
  • @WiktorStribiżew Works great, thanks. The only problem now is that \K is not supported in .NET; it only works in Perl-compatible engines. – Jorman Franzini Mar 23 '18 at 14:23
  • @JormanFranzini In .NET, it is even easier. – Wiktor Stribiżew Mar 23 '18 at 14:42

2 Answers


You may use

var rx = new Regex(@"(?:^|])(?:(?!\b(?:eng|ita)\b)[^]])*\b(eng(?:\W+\w+)?\W+sub\W+ita)\b", RegexOptions.Compiled | RegexOptions.IgnoreCase);

See the regex demo. The value you need is in Group 1.
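Since .NET has no `\K`, this pattern keeps the prefix in the overall match and puts the wanted part into a capturing group. The following minimal sketch (a C# top-level program; the input is a shortened version of one of the question's titles) shows the difference between `Match.Value` and `Groups[1].Value` for the pattern above:

```csharp
using System;
using System.Text.RegularExpressions;

// Same pattern as above; Group 1 plays the role that \K plays in PCRE.
var rx = new Regex(@"(?:^|])(?:(?!\b(?:eng|ita)\b)[^]])*\b(eng(?:\W+\w+)?\W+sub\W+ita)\b",
    RegexOptions.IgnoreCase);

var m = rx.Match("Lucifer S03e15 [XviD - Eng Mp3 - Sub Ita Eng] DLRip");

// The whole match includes everything consumed before "Eng"...
string whole = m.Value;            // "Lucifer S03e15 [XviD - Eng Mp3 - Sub Ita"
// ...while Group 1 contains only the chain you want.
string wanted = m.Groups[1].Value; // "Eng Mp3 - Sub Ita"

Console.WriteLine($"Value:   {whole}");
Console.WriteLine($"Group 1: {wanted}");
```

The text before `Eng` is consumed by `(?:^|])` and the tempered token, so it ends up in `Value`; only Group 1 holds the chain itself.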

Pattern details

  • `(?:^|])` - either the start of the string or `]` (add `| RegexOptions.Multiline` if your input is a multiline string, but I suppose these are all standalone strings)
  • `(?:(?!\b(?:eng|ita)\b)[^]])*` - any char but `]`, as many as possible, that does not start a whole word `eng` or `ita` (see tempered greedy token to understand this construct better)
  • `\b` - a word boundary
  • `(eng(?:\W+\w+)?\W+sub\W+ita)` - Group 1:
    • `eng` - a literal substring
    • `(?:\W+\w+)?` - an optional sequence of 1+ non-word chars followed by 1+ word chars (in effect, an optional word)
    • `\W+` - 1+ non-word chars
    • `sub` - a literal substring
    • `\W+` - 1+ non-word chars
    • `ita` - a literal substring
  • `\b` - a word boundary
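To see what the tempered greedy token does on its own, here is a small sketch that anchors just that part at the start of a few sample strings. The strings are shortened, illustrative fragments of the question's titles, not the full test set. The token consumes characters until the next whole word `eng` or `ita`, or until a `]`:

```csharp
using System;
using System.Text.RegularExpressions;

// Only the tempered greedy token from the pattern above, anchored at the start.
// It never consumes ']' and stops where a whole word "eng" or "ita" begins.
var token = new Regex(@"^(?:(?!\b(?:eng|ita)\b)[^]])*", RegexOptions.IgnoreCase);

string a = token.Match("XviD - Eng Mp3 - Sub Ita").Value;     // stops before "Eng"
string b = token.Match("H264 - Ita Eng Ac3 - Sub Ita").Value; // stops before "Ita"
string c = token.Match("Mux 1080p] Eng").Value;               // stops at ']'

Console.WriteLine($"[{a}] [{b}] [{c}]");
```

In the full pattern, `\beng` has to follow right where this token stops. For `a` that works, because the first such word is `Eng`. For `b` the first such word is `Ita`, and backtracking to a shorter prefix does not help because no earlier position starts a whole word `eng`. That is how the pattern rejects `Ita Eng ... - Sub Ita` while accepting `Eng ... - Sub Ita`.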

See the C# demo:

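using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Test titles from the question.
var strs = new List<string> { 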
        "Lucifer S03e15 [XviD - Eng Mp3 - Sub Ita Eng] DLRip By Pir8 [CURA] Fede e Religioni",
        "Lucifer S03e15 [XviD - Eng Mp3 - Sub Ita Eng] DLRip By Pir8 [CURA] Fede e Religioni",
        "Lucifer S03e01-08 [XviD - Eng Mp3 - Sub Ita Eng] DLRip By Pir8 [CURA] Fede e Religioni SEASON PREMIERE",
        "Young Sheldon S01e13 [SATRip 720p - H264 - Eng Ac3 - Sub Ita] HDTV by AVS",
        "Young Sheldon S01e08 [Mux 1080p - H264 - Ita Eng Ac3 - Sub Ita Eng] WEBMux Morpheus",
        "Young Sheldon S01e08 [Mux 1080p - H264 - Ita Eng Ac3 - Sub Ita Eng] WEBMux Morpheus",
        "Young Sheldon S01e14 [SATRip 720p - H264 - Eng Ac3 - Sub Ita] HDTV by AVS",
        "Lucifer S03e15 [XviD - Eng Mp3 - Sub Ita Eng] DLRip By Pir8 [CURA] Fede e Religioni",
        "Lucifer S03e16 [XviD - Eng Mp3 - Sub Ita Eng] DLRip By Pir8 [CURA] Fede e Religioni",
        "Lucifer S02e01-13 [XviD - Eng Mp3 - Sub Ita] DLRip by Pir8 [CURA] Fede e Religioni FULL ",
        "Absentia S01e01-10 [Mux 1080p - H264 - Ita Eng Ac3 - Sub Ita Eng] By Morpheus The.Breadwinner.2017.ENG.Sub.ITA.HDRip.XviD-[WEB]"
    };
// Group 1 captures the "eng ... sub ... ita" chain; the prefix before it is matched but not captured.
var rx = new Regex(@"(?:^|])(?:(?!\b(?:eng|ita)\b)[^]])*\b(eng(?:\W+\w+)?\W+sub\W+ita)\b", RegexOptions.Compiled | RegexOptions.IgnoreCase);
foreach (var s in strs)
{
    Console.WriteLine(s);
    var result = rx.Match(s);
    // Print only the captured chain, not the whole match.
    if (result.Success)
        Console.WriteLine("Matched: {0}", result.Groups[1].Value);
    else
        Console.WriteLine("No match!");
    Console.WriteLine("==========================================");
}

Output:

Lucifer S03e15 [XviD - Eng Mp3 - Sub Ita Eng] DLRip By Pir8 [CURA] Fede e Religioni
Matched: Eng Mp3 - Sub Ita
==========================================
Lucifer S03e15 [XviD - Eng Mp3 - Sub Ita Eng] DLRip By Pir8 [CURA] Fede e Religioni
Matched: Eng Mp3 - Sub Ita
==========================================
Lucifer S03e01-08 [XviD - Eng Mp3 - Sub Ita Eng] DLRip By Pir8 [CURA] Fede e Religioni SEASON PREMIERE
Matched: Eng Mp3 - Sub Ita
==========================================
Young Sheldon S01e13 [SATRip 720p - H264 - Eng Ac3 - Sub Ita] HDTV by AVS
Matched: Eng Ac3 - Sub Ita
==========================================
Young Sheldon S01e08 [Mux 1080p - H264 - Ita Eng Ac3 - Sub Ita Eng] WEBMux Morpheus
No match!
==========================================
Young Sheldon S01e08 [Mux 1080p - H264 - Ita Eng Ac3 - Sub Ita Eng] WEBMux Morpheus
No match!
==========================================
Young Sheldon S01e14 [SATRip 720p - H264 - Eng Ac3 - Sub Ita] HDTV by AVS
Matched: Eng Ac3 - Sub Ita
==========================================
Lucifer S03e15 [XviD - Eng Mp3 - Sub Ita Eng] DLRip By Pir8 [CURA] Fede e Religioni
Matched: Eng Mp3 - Sub Ita
==========================================
Lucifer S03e16 [XviD - Eng Mp3 - Sub Ita Eng] DLRip By Pir8 [CURA] Fede e Religioni
Matched: Eng Mp3 - Sub Ita
==========================================
Lucifer S02e01-13 [XviD - Eng Mp3 - Sub Ita] DLRip by Pir8 [CURA] Fede e Religioni FULL 
Matched: Eng Mp3 - Sub Ita
==========================================
Absentia S01e01-10 [Mux 1080p - H264 - Ita Eng Ac3 - Sub Ita Eng] By Morpheus The.Breadwinner.2017.ENG.Sub.ITA.HDRip.XviD-[WEB]
Matched: ENG.Sub.ITA
==========================================
Wiktor Stribiżew
  • Impressive! It also works with the last one I added. I'll look for more examples to test it, but it seems very good. I have to read the logic of the expression carefully, because to me it is a kind of magic! – Jorman Franzini Mar 23 '18 at 17:39
  • Hi, your solution seems to be the one that works best with all patterns. I'll test it again in the future; maybe it will need some adjustment, but I'm studying regex and I'm starting to understand some of it. Not like you, but some. – Jorman Franzini Mar 27 '18 at 11:38
  • @JormanFranzini :) Why not like me? I am sure you are becoming a pro if you understand this pattern. – Wiktor Stribiżew Mar 27 '18 at 11:43

Here's a relatively simple regex for your problem:

(?:(?<=[-]\s)(?:ITA\s)?\w{3}\s\w{3}\s[-]\s\w{3}\s\w{3}\s\w{3}\b)|(?:Eng\.sub\.ita)

which you can test out here.

REGEX:

`(?<=[-]\s)` is a positive lookbehind: it makes sure the match is preceded by a dash and a whitespace character, without including them in the match.

`(?:ITA\s)?` is an optional non-capturing group: if "ITA" and a whitespace character come first, they are included in the match as well.

`\w{3}` matches exactly three word characters (letters, digits or underscores, in any combination).

`\s` matches a single whitespace character (usually a space here), and

`[-]` is just a fancy way of matching a single `-`.

`|(?:Eng\.sub\.ita)` adds an alternative that matches the literal text `Eng.sub.ita`, so titles like `ENG.Sub.ITA` are matched too (this relies on case-insensitive matching).
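The pieces described above can be tried separately. This sketch uses C# to match the other answer; the input strings are shortened, illustrative versions of the question's titles. It checks the lookbehind and the optional `ITA` group, with and without `RegexOptions.IgnoreCase`:

```csharp
using System;
using System.Text.RegularExpressions;

// (?<=[-]\s) only requires "- " before the match; those characters are not part of the value.
string plain = Regex.Match("XviD - Eng Mp3", @"(?<=[-]\s)\w{3}").Value;

// (?:ITA\s)? matches only the uppercase literal "ITA" unless IgnoreCase is set.
string caseSensitive = Regex.Match("H264 - Ita Eng Ac3", @"(?<=[-]\s)(?:ITA\s)?\w{3}").Value;
string caseInsensitive = Regex.Match("H264 - Ita Eng Ac3", @"(?<=[-]\s)(?:ITA\s)?\w{3}",
    RegexOptions.IgnoreCase).Value;

Console.WriteLine(plain);           // Eng
Console.WriteLine(caseSensitive);   // Ita
Console.WriteLine(caseInsensitive); // Ita Eng
```

Without `IgnoreCase`, `(?:ITA\s)?` does not match `Ita`, so the group is skipped and `\w{3}` takes `Ita` instead; the same applies to the `Eng\.sub\.ita` alternative.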

Note:

If the show's title contains something like `- red SEO - two one` followed by one more three-letter word (that is, a dash, two three-letter words, another dash and three three-letter words, all separated by single whitespace characters), then the title itself will be matched too.

However, a show title is unlikely to contain such a sequence, so this should rarely be a problem.

Robo Mop
  • This works too, thank you, but it doesn't match this one: Santa Clarita Diet S02e01 [Mux - XviD - Ita Eng Mp3 - Sub Ita Eng] NFMux By Pir8 [CURA]Mostri Nella Storia.... Is it possible to make it work with some extra characters at the start? – Jorman Franzini Mar 23 '18 at 14:25
  • @JormanFranzini What do you mean it doesn't work? It does [over here on regex101](https://regex101.com/r/VzspFy/16) – Robo Mop Mar 23 '18 at 14:36
  • @JormanFranzini Do you also want the eng part at the end of `... Sub Ita Eng`? – Robo Mop Mar 23 '18 at 14:36
  • Hi, sorry, I wrote that wrong: I meant that it does match! My bad! But it shouldn't match, because this is Ita Eng ... Sub Ita Eng. I think it matches because of XviD - Ita Eng ...; the - makes the difference, I think. Meanwhile I'm studying all the code that was written here; so much to learn! – Jorman Franzini Mar 23 '18 at 17:28
  • @JormanFranzini Don't worry about it. I have updated my code to fit your new conditions. Try it out :) – Robo Mop Mar 23 '18 at 17:36
  • @JormanFranzini This new regex should match your needs now. Be sure to check it out. Also, if you feel that this answer has helped you, be sure to mark it as helpful, for further reference. – Robo Mop Mar 23 '18 at 17:50
  • Thanks for your time and patience, but it doesn't seem to match all results. I don't know why yet, but one title that is the same as another doesn't match. – Jorman Franzini Mar 27 '18 at 11:42