Hello, I am a newbie currently trying to learn regex by experimenting with various patterns. I tried to create a regex pattern for this URL but failed. It is an Amazon pagination link.

http://www.amazon.in/s/lp_6563520031_pg_2?rh=n%3A5866078031%2Cn%3A%215866079031%2Cn%3A6563520031&page=2s&ie=UTF8&qid=1446802571

Or

http://www.amazon.in/Tena-Wet-Wipe-Pulls-White/dp/B001O1G242/ref=sr_1_46?s=industrial&ie=UTF8&qid=1446802608&sr=1-46

I just want to check the URL for these two things:

  1. Whether the URL has a dp or product directory

  2. Whether the URL has a page query-string parameter containing any digit

I tried to create a regex pattern but failed. If the first condition is not met, the pattern should still match on the second (and vice versa).

Here's the regex pattern I made:

.*\/(dp|product)\/ | .*page

Here is my regex101 link: https://regex101.com/r/zD2gP5/1#python

Remi Guan
Utkarsh Dixit

2 Answers

Since you just want to check if a string contains some pattern, you can use

\/(?:dp|product)\/|[&?]page=

See regex demo

In Python, just check with re.search:

import re

p = re.compile(r'/(?:dp|product)/|[&?]page=')
test_str = "http://w...content-available-to-author-only...n.in/s/lp_6563520031_pg_2?rh=n%3A5866078031%2Cn%3A%215866079031%2Cn%3A6563520031&page=2s&ie=UTF8&qid=14468025716"
# re.search scans the whole string, so the pattern may match anywhere in the URL
if p.search(test_str):
    print("Found!")

Also, in Python regex patterns there is no need to escape forward slashes (/): patterns are ordinary strings, so / is not a delimiter.
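
The point about slashes can be checked quickly. The sketch below is only an illustration: it uses the product URL from the question with its query string dropped, and compares the escaped and unescaped forms of the first alternative.

```python
import re

# Product URL from the question, shortened by dropping the query string.
url = "http://www.amazon.in/Tena-Wet-Wipe-Pulls-White/dp/B001O1G242/ref=sr_1_46"

# Escaping "/" is allowed but unnecessary: both patterns match the same text.
escaped = re.search(r'\/(?:dp|product)\/', url)
unescaped = re.search(r'/(?:dp|product)/', url)

print(escaped.group(), unescaped.group())
print(escaped.span() == unescaped.span())
```

The escaped form is mainly useful when the same pattern is pasted into a tool or language that uses / as a regex delimiter.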

The regex matches two alternative subpatterns (\/(?:dp|product)\/ and [&?]page=):

  • / - a forward slash
  • (?:dp|product) - either dp or product (a non-capturing group, so the matched text is not stored in a capture group)
  • / - a slash
  • | - or...
  • [&?] - either a & or ? (we check the start of a query string parameter)
  • page= - literal sequence of symbols page=.
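
To see the two alternatives at work, here is a minimal sketch that runs the pattern against both URLs from the question plus one URL that should not match. The helper name is_product_or_page_url and the third URL are made up for illustration.

```python
import re

# Same pattern as above: a /dp/ or /product/ path segment, or a "page=" query parameter.
PATTERN = re.compile(r'/(?:dp|product)/|[&?]page=')

def is_product_or_page_url(url):
    """Return True if the URL has a dp/product directory or a page parameter."""
    return PATTERN.search(url) is not None

urls = [
    # Pagination link from the question (matches via "&page=")
    "http://www.amazon.in/s/lp_6563520031_pg_2?rh=n%3A5866078031%2Cn%3A%215866079031%2Cn%3A6563520031&page=2s&ie=UTF8&qid=1446802571",
    # Product link from the question (matches via "/dp/")
    "http://www.amazon.in/Tena-Wet-Wipe-Pulls-White/dp/B001O1G242/ref=sr_1_46?s=industrial&ie=UTF8&qid=1446802608&sr=1-46",
    # Hypothetical URL with neither feature
    "http://www.amazon.in/gp/help?ie=UTF8",
]

for url in urls:
    print(is_product_or_page_url(url), url[:50])
```

Note that this pattern only checks that a page parameter is present. If the value must begin with a digit, one possible variant is [&?]page=\d in place of [&?]page=.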
Wiktor Stribiżew
\/(dp|product)\/|page=(?=[^&]*\d)[^&]+

This would be my approach; please test it and let me know if you have any questions about it.
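
As a rough illustration of how the second alternative behaves, the sketch below isolates it: the lookahead (?=[^&]*\d) requires at least one digit in the value of page before the next &. The sample query strings are made up for the demonstration.

```python
import re

# Second alternative only: "page=" whose value (up to the next "&") contains a digit.
page_with_digit = re.compile(r'page=(?=[^&]*\d)[^&]+')

samples = {
    "?ie=UTF8&page=2s&qid=1": True,     # value "2s" contains a digit
    "?ie=UTF8&page=next&qid=1": False,  # the digit in qid=1 lies past the "&"
    "?page=&qid=1": False,              # empty value, no digit
}

for query, expected in samples.items():
    found = page_with_digit.search(query) is not None
    print(query, found)
    assert found == expected
```

Because this alternative is not anchored with [&?], it would also match inside a longer parameter name such as homepage=1; prefixing it with [&?] avoids that.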

Utkarsh Dixit
zolo
  • Try running this one on regex101.com. Your previous pattern was /(dp|product)/|page=(?=[^&]*\d)[^&]+; you just forgot to escape the slashes. But thanks. – Utkarsh Dixit Dec 25 '15 at 17:50