^([a-z-]+-on-sale(?:,[a-z-]+-on-sale){0,})[\/]$

This regex is used in an .htaccess file and matches a path such as this one:

tools-on-sale,candy-on-sale,food-on-sale/

I've been wondering whether it's possible to capture a subsection of a repeated group. I want to match the same pattern, but leave the "-on-sale" part out of what the repeated group captures. I already know how to do this for the first part of the regex:

^(([a-z-]+)-on-sale(?:,[a-z-]+-on-sale){0,})[\/]$

That way I have "tools" isolated in its own capture group, but I can't seem to do the same with the second part. Is this even doable with a regex?

Sefam

2 Answers


If I understand you correctly, you want a list of what's on sale.

You've already figured out how to capture the first item, tools.
But you need all of the items from a single match.

The good news is that Dot-Net can do this, because it keeps every
iteration of a group in a capture collection, like this:

 # ^((?:(?:^|(?<!^),)(?<sale_item>[a-z-]+)-on-sale)+)[\/]$

 ^     
 (                             # (1 start)
      (?:
           (?:
                ^ 
             |  (?<! ^ )
                , 
           )
           (?<sale_item> [a-z-]+ )       # (2)
           -on-sale 
      )+
 )                             # (1 end)
 [\/] $

Here sale_item holds a list of every value the group captured.

The bad news is that on most other regular expression engines,
including the PCRE engine behind Apache and PHP,
the overall match will be the same, but the sale_item capture buffer
is overwritten on each iteration of the quantified group.
So sale_item will contain only the last item, "food".
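
To see the overwrite concretely, here is a minimal sketch in Python. It assumes Python's re module behaves like PCRE on this point (it also keeps only the last iteration of a repeated group). The variable names are made up for the illustration, and the named group is rewritten in Python's (?P<name>...) syntax.

```python
import re

# The Dot-Net pattern from above, with the named group in Python syntax.
PATTERN = re.compile(r"^((?:(?:^|(?<!^),)(?P<sale_item>[a-z-]+)-on-sale)+)/$")

path = "tools-on-sale,candy-on-sale,food-on-sale/"
match = PATTERN.match(path)

# Group 1 holds the whole list, but the repeated group keeps only
# what it captured on its final iteration.
whole = match.group(1)
last_item = match.group("sale_item")

# Workaround outside the regex: split the validated match and strip the suffix.
suffix = "-on-sale"
items = [part[: -len(suffix)] for part in whole.split(",")]

print(whole)      # tools-on-sale,candy-on-sale,food-on-sale
print(last_item)  # food
print(items)      # ['tools', 'candy', 'food']
```

The split at the end is roughly the "process the URL in code" route: the regex only validates the shape, and the items are recovered afterwards.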

  • Not sure I understand, and no, I'm using Apache and PHP, not Dot-Net. The "on-sale" (It's not actually what it is, it's an example) part was originally added to "help with SEO", and I'm trying to get entirely rid of the "-on-sale" part to do a 301 redirect to a new URL that doesn't contain "-on-sale". The problem is that currently, I'm capturing it and I don't want it. I still want to match it, but I want to be able to capture without it. Otherwise I'm stuck processing the URL in PHP, and do a 301 redirect manually. – Sefam Nov 30 '15 at 05:43
  • @Sefam - `I'm capturing it and I don't want it. I still want to match it, but I want to be able to capture without it` You have answered your own question. The problem is you are using a quantified group to match successive occurrences. Capture groups are singular. If I use `(hello)+` on `hellohellohello` I don't get `['hello','hello','hello']`, I get `['hello']`. You can do aggregation with Dot-Net only. Otherwise you'd have to make a unique group in a finite expression. See @Mariano's answer. –  Dec 01 '15 at 04:21

There is no short way to achieve this. However, you could define a maximum number of items to expect, and create one optional group for each.

For 1 to 3 items:

^([a-z-]+)-on-sale(?:(,[a-z-]+)-on-sale(?:(,[a-z-]+)-on-sale)?)?/$

Request url

http://foo.bar/tools-on-sale,candy-on-sale,food-on-sale/

htaccess

RewriteRule ^([a-z-]+)-on-sale(?:(,[a-z-]+)-on-sale(?:(,[a-z-]+)-on-sale)?)?/$ http://foo.bar/$1$2$3 [L]

*Thanks to @sln for suggesting an improvement

Output url

http://foo.bar/tools,candy,food
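
One way to sanity-check this rule outside Apache is to emulate it. The following is a rough Python sketch, not mod_rewrite itself: build_pattern and rewrite are hypothetical helper names, and it assumes Python's re treats this pattern the same way PCRE does.

```python
import re

def build_pattern(max_items: int) -> str:
    """Nest one optional group per extra item, building from the inside out."""
    tail = ""
    for _ in range(max_items - 1):
        tail = rf"(?:(,[a-z-]+)-on-sale{tail})?"
    return rf"^([a-z-]+)-on-sale{tail}/$"

def rewrite(path: str, max_items: int = 3):
    """Emulate the $1$2$3 substitution; unmatched groups become empty strings."""
    m = re.match(build_pattern(max_items), path)
    if m is None:
        return None  # the rule would not apply to this path
    return "http://foo.bar/" + "".join(g or "" for g in m.groups())

print(rewrite("tools-on-sale,candy-on-sale,food-on-sale/"))
print(rewrite("tools-on-sale,candy-on-sale/"))
print(rewrite("tools-on-sale/"))
```

With max_items set to 3, build_pattern produces the same regex as the rule above. Because each comma is captured together with its item, a path with fewer items leaves no stray delimiter behind.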

However, if you need a delimiter other than a comma in the output URL, the delimiter has to go into the substitution instead (e.g. $1-$2-$3), and that generates empty tokens whenever there are fewer than 3 items. E.g.:

http://foo.bar/tools--

If you must avoid that, you need one rule for each number of items:

RewriteRule ^([a-z-]+)-on-sale,([a-z-]+)-on-sale,([a-z-]+)-on-sale/$ http://foo.bar/$1-$2-$3 [L]
RewriteRule ^([a-z-]+)-on-sale,([a-z-]+)-on-sale/$ http://foo.bar/$1-$2 [L]
RewriteRule ^([a-z-]+)-on-sale/$ http://foo.bar/$1 [L]
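
Writing these rules by hand gets tedious as the maximum grows, so they could be generated instead. This is only a sketch: per_count_rules is a hypothetical helper, and the delimiter and base URL defaults are taken from the example above. It stops at 9 items because mod_rewrite backreferences only go up to $9.

```python
def per_count_rules(max_items: int, delimiter: str = "-",
                    base: str = "http://foo.bar/") -> list:
    """Return one RewriteRule line per item count, from max_items down to 1."""
    if not 1 <= max_items <= 9:
        raise ValueError("mod_rewrite only offers backreferences $1 to $9")
    rules = []
    for n in range(max_items, 0, -1):
        # n copies of the item group, separated by literal commas
        pattern = "^" + ",".join(["([a-z-]+)-on-sale"] * n) + "/$"
        # $1, $2, ... joined by the chosen output delimiter
        target = base + delimiter.join(f"${i}" for i in range(1, n + 1))
        rules.append(f"RewriteRule {pattern} {target} [L]")
    return rules

for rule in per_count_rules(3):
    print(rule)
```

For max_items=3 this prints the three rules listed above.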
Mariano
  • You could probably use one regex and not get empty tokens by putting the comma inside the capture group: `RewriteRule ^([a-z-]+)-on-sale(?:(,[a-z-]+)-on-sale(?:(,[a-z-]+)-on-sale)?)?/$ http://foo.bar/$1$2$3 [L]` –  Dec 01 '15 at 06:07
  • @sln Nice! I didn't see it. Thank you. – Mariano Dec 01 '15 at 06:21