
Not sure if this is something that should be a bounty. I just want to understand regex better.

I checked the responses in the threads "Regex to match pattern.one skip newlines and characters until pattern.two" and "Regex to match if given text is not found and match as little as possible", and read about Tempered Greedy Token Solutions and Explicit Greedy Alternation Solutions on RexEgg, but admittedly the explanations baffled me.

I spent the last day fiddling mainly with re.sub (and with findall) because re.sub's behaviour is odd to me.


Problem 1:

Given the strings below, made up of segments each followed by /, how would I write a SINGLE regex (using only re.sub or re.findall) that uses alternating capture groups, which must use [\S]+/, to get the desired output?

>>> string_1 = 'variety.com/2017/biz/news/tax-march-donald-trump-protest-1202031487/'
>>> string_2 = 'variety.com/2017/biz/the/life/of/madam/green/news/tax-march-donald-trump-protest-1202031487/'
>>> string_3 = 'variety.com/2017/biz/the/life/of/news/tax-march-donald-trump-protest-1202031487/the/days/of/our/lives'

Desired Output Given the Conditions(!!)

tax-march-donald-trump-protest-

CONDITIONS: The regex must use alternating capture groups built on ([\S]+) or ([\S]+?)/ to capture the other segments, but ignore any capture that doesn't contain -

I'M WELL AWARE that it would be better to use re.findall('([\-]*(?:[^/]+?\-)+)[\d]+', string) or something similar, but I want to know whether I can use [\S]+, ([\S]+) or ([\S]+?)/ and tell the regex to ignore a capture if it contains / or doesn't contain -, while still using an alternating capture group.

I KNOW I don't need to use [\S]+ or ([\S]+) but I want to see if there is an extra directive I can use to make the regex reject some characters those two would normally capture.
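
For reference, here is a minimal sketch of what the lazy ([\S]+?)/ capture returns on its own, next to one possible "extra directive": a lookahead placed in front of the group. The lookahead version is an assumption for illustration, not something proposed in the thread. It only lets a match start where a - occurs before the next / or whitespace, so segments without a dash are skipped rather than captured. The names `unguarded` and `guarded` are hypothetical.

```python
import re

strings = [
    'variety.com/2017/biz/news/tax-march-donald-trump-protest-1202031487/',
    'variety.com/2017/biz/the/life/of/madam/green/news/tax-march-donald-trump-protest-1202031487/',
    'variety.com/2017/biz/the/life/of/news/tax-march-donald-trump-protest-1202031487/the/days/of/our/lives',
]

# Unguarded: the lazy group stops at the first '/', so every
# slash-terminated segment is captured, with or without a dash.
unguarded = re.compile(r'([\S]+?)/')

# Guarded (illustrative assumption): the lookahead (?=[^/\s]*-) succeeds
# only if a '-' appears before the next '/' or whitespace character.
guarded = re.compile(r'(?=[^/\s]*-)([\S]+?)/')

unguarded_results = [unguarded.findall(s) for s in strings]
guarded_results = [guarded.findall(s) for s in strings]

print(unguarded_results[0])
# ['variety.com', '2017', 'biz', 'news', 'tax-march-donald-trump-protest-1202031487']

for result in guarded_results:
    print(result)
# ['tax-march-donald-trump-protest-1202031487'] for each of the three strings
```

Note that the guarded capture still includes the trailing digits, so it keeps only the dashed segment but does not by itself produce the exact desired output.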

FailSafe
  • Comments are not for extended discussion; this conversation has been [moved to chat](https://chat.stackoverflow.com/rooms/190726/discussion-on-question-by-failsafe-forkingpython-regex-re-sub-and-re-findall). – Samuel Liew Mar 26 '19 at 22:04

2 Answers


Posted per request:

(?:(?!/)[\S])*-(?:(?!/)[\S])*

https://regex101.com/r/azrwjO/1

Explained

 (?:                           # Optional group
      (?! / )                       # Not a forward slash ahead
      [\S]                          # Not whitespace class
 )*                            # End group, do 0 to many times
 -                             # A dash must exist
 (?:                           # Optional group,  same as above
      (?! / )
      [\S] 
 )*
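
A quick way to try the pattern above from Python is sketched below, run against the three strings from the question. The first pattern is the one posted in this answer. The second, `trimmed`, is a hypothetical variant added only for illustration: it wraps the first tempered group and the dash in a capture group and requires digits afterwards, so that re.findall returns the part up to and including the last dash.

```python
import re

strings = [
    'variety.com/2017/biz/news/tax-march-donald-trump-protest-1202031487/',
    'variety.com/2017/biz/the/life/of/madam/green/news/tax-march-donald-trump-protest-1202031487/',
    'variety.com/2017/biz/the/life/of/news/tax-march-donald-trump-protest-1202031487/the/days/of/our/lives',
]

# Pattern from the answer: each [\S] is "tempered" by (?!/), so it can
# never consume a forward slash; a '-' must appear somewhere in the run.
tempered = re.compile(r'(?:(?!/)[\S])*-(?:(?!/)[\S])*')

# Hypothetical variant (not from the answer): capture up to and including
# the last '-' that is followed by digits.
trimmed = re.compile(r'((?:(?!/)[\S])*-)\d+')

tempered_results = [tempered.findall(s) for s in strings]
trimmed_results = [trimmed.findall(s) for s in strings]

print(tempered_results[0])  # ['tax-march-donald-trump-protest-1202031487']
print(trimmed_results[0])   # ['tax-march-donald-trump-protest-']
```

Because the pattern has no capture group, re.findall returns the whole match; once a group is added, as in the variant, it returns the group contents instead.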

You could use

/([-a-z]+)-\d+

and take the first capturing group, see a demo on regex101.com.
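
As a rough sketch of how that might look in Python (the variable names are hypothetical): re.search finds the first match and group(1) gives the captured part. One thing to be aware of is that the dash before the digits sits outside the group, so the capture has no trailing dash. Moving the dash inside the group, shown as `with_dash` below, is a small variation assumed here to get closer to the desired output.

```python
import re

string_1 = 'variety.com/2017/biz/news/tax-march-donald-trump-protest-1202031487/'

# Pattern from the answer: a slash, then lowercase letters and dashes,
# then a dash and digits. Only the letters-and-dashes part is captured.
pattern = re.compile(r'/([-a-z]+)-\d+')
match = pattern.search(string_1)
slug = match.group(1) if match else None
print(slug)  # tax-march-donald-trump-protest

# Variation (assumption): keep the final dash inside the capture group.
with_dash = re.compile(r'/([-a-z]+-)\d+')
match_with_dash = with_dash.search(string_1)
slug_with_dash = match_with_dash.group(1) if match_with_dash else None
print(slug_with_dash)  # tax-march-donald-trump-protest-
```

Since [-a-z] excludes digits, dots and slashes, earlier segments such as 2017 or news can never satisfy the trailing -\d+ and are skipped.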

Jan
  • Thanks. I know I can use that, but I really want to force the use of "([\S]+?)/" and force it to exclude anything captured that doesn't contain "-" using a single regex statement. I know I don't even need to use "[\S]+?", but I want it there to see if I can use an extra directive in regex to force it to drop some captures that [\S]+ would normally find. But yes, I want to force it to use "([\S]+?)/" – FailSafe Mar 26 '19 at 18:13