Not sure if this is something that should be a bounty. I just want to understand regex better.
I checked the responses in the threads "Regex to match pattern.one skip newlines and characters until pattern.two" and "Regex to match if given text is not found and match as little as possible", and read about Tempered Greedy Token Solutions and Explicit Greedy Alternation Solutions on RexEgg, but admittedly the explanations baffled me.
I spent the last day fiddling mainly with re.sub (and with findall) because re.sub's behaviour is odd to me.
Problem 1:
Given the strings below, which consist of characters followed by /, how would I write a SINGLE regex (using only either re.sub or re.findall) with alternating capture groups, which must use [\S]+/, to get the desired output?
>>> string_1 = 'variety.com/2017/biz/news/tax-march-donald-trump-protest-1202031487/'
>>> string_2 = 'variety.com/2017/biz/the/life/of/madam/green/news/tax-march-donald-trump-protest-1202031487/'
>>> string_3 = 'variety.com/2017/biz/the/life/of/news/tax-march-donald-trump-protest-1202031487/the/days/of/our/lives'
Desired output, given the conditions (!!):
tax-march-donald-trump-protest-
CONDITIONS: Must use alternating capture groups, which must use ([\S]+) or ([\S]+?)/ to capture the other groups but ignore them if they don't contain -
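For reference, here is a small check of what the two required building blocks capture on their own when passed to re.findall on string_1. It only illustrates the greedy versus lazy behaviour that makes the conditions awkward; the names greedy and lazy are made up for this sketch.

```python
import re

string_1 = 'variety.com/2017/biz/news/tax-march-donald-trump-protest-1202031487/'

# Greedy: [\S]+ also matches '/', so it runs to the end of the string
# and backtracks only far enough to leave the final '/' unconsumed.
greedy = re.findall(r'([\S]+)/', string_1)
print(greedy)
# ['variety.com/2017/biz/news/tax-march-donald-trump-protest-1202031487']

# Lazy: [\S]+? stops at the first '/' it can, so each path segment
# is captured separately, with or without a hyphen in it.
lazy = re.findall(r'([\S]+?)/', string_1)
print(lazy)
# ['variety.com', '2017', 'biz', 'news', 'tax-march-donald-trump-protest-1202031487']
```

Neither form on its own rejects segments that lack a -, and the greedy form does not stop at / at all.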
I'M WELL AWARE that it would be better to use re.findall('([\-]*(?:[^/]+?\-)+)[\d]+', string) or something similar, but I want to know if I can use [\S]+ or ([\S]+) or ([\S]+?)/ and tell regex that, if those are captured, it should ignore the result if it contains / or doesn't contain -, while still using an alternating capture group.
I KNOW I don't need to use [\S]+ or ([\S]+), but I want to see if there is an extra directive I can use to make the regex reject some characters those two would normally capture.
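One possible reading of "an extra directive" is a negative lookahead placed in front of \S, which is the idea behind the tempered greedy token described on RexEgg. The sketch below is a minimal illustration under that assumption, not necessarily the intended answer; the names SEGMENT_CHAR, tempered and alternating are hypothetical. The second pattern adds an alternation whose left branch consumes hyphen-free segments without capturing them, so re.findall returns empty strings for those and they have to be filtered out.

```python
import re

strings = [
    'variety.com/2017/biz/news/tax-march-donald-trump-protest-1202031487/',
    'variety.com/2017/biz/the/life/of/madam/green/news/tax-march-donald-trump-protest-1202031487/',
    'variety.com/2017/biz/the/life/of/news/tax-march-donald-trump-protest-1202031487/the/days/of/our/lives',
]

# (?!/)\S matches one non-whitespace character, but only if it is not '/'.
# The lookahead is the "extra directive" that tempers what \S may consume.
SEGMENT_CHAR = r'(?:(?!/)\S)'

# Capture a slash-free run that ends in '-', provided digits follow it.
tempered = re.compile(rf'({SEGMENT_CHAR}+-)\d+')

# Alternation variant: the left branch eats whole segments that contain
# neither '-' nor '/', the right branch captures the hyphenated part.
alternating = re.compile(rf'(?:(?![-/])\S)+/|({SEGMENT_CHAR}+-)\d+')

for s in strings:
    print(tempered.findall(s))
    # Drop the empty strings produced by the non-capturing left branch.
    print([m for m in alternating.findall(s) if m])
# Each print shows ['tax-march-donald-trump-protest-'] for all three strings.
```

The filtering step is outside the regex, so the alternation variant only partly satisfies the "single regex" condition; the tempered pattern alone does not need it.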