-2

I'd like to extract the id parameter from URLs like this:

input:

x = "https://play.google.com/store/apps/details?id=com.alibaba.aliexpresshd&hl=en"

get_id(x)

output:

com.alibaba.aliexpresshd

What is the best way to do this with re in Python?

def get_id(toParse):
    return re.search('id=(WHAT TO WRITE HERE?)', toParse).groups()[0]

I have only found solutions for the case where the id contains exactly one dot.

Павел Иванов

3 Answers

1

You could try:

r'\?id=([a-zA-Z\.]+)'

Used as your regex, it looks like this:

import re

def get_id(toParse):
    regex = r'\?id=([a-zA-Z\.]+)'
    # findall returns the captured group of every match; take the first one
    x = re.findall(regex, toParse)[0]
    return x

Regex -

By adding r before the pattern, we make it a raw string, so Python leaves the backslashes alone and we don't have to double each one. This is better explained here.

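? has a special meaning in regex, so to match a literal question mark we precede it with a backslash: \?
id= matches the literal id= part of the URL
([a-zA-Z\.]+) is capture group 1 of the regex (group 0 is the whole match), and it matches the id in the URL. Note that it accepts only letters and dots, so an id containing digits or underscores would be cut short.

Note - I have used re.findall for this because, when the pattern has a single capture group, it returns a list of the captured strings; [0] picks the first one, which is the extracted text. If nothing matches, the list is empty and [0] raises an IndexError.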
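To make the group numbering concrete, here is a small sketch that runs the pattern above against the URL from the question. The variable names (PATTERN, url, whole, captured, found) are made up for illustration.

```python
import re

PATTERN = r'\?id=([a-zA-Z\.]+)'
url = "https://play.google.com/store/apps/details?id=com.alibaba.aliexpresshd&hl=en"

match = re.search(PATTERN, url)
whole = match.group(0)     # the entire matched text, including "?id="
captured = match.group(1)  # only the part inside the parentheses

# With one capture group, findall returns a list of the captured strings.
found = re.findall(PATTERN, url)

print(whole)     # ?id=com.alibaba.aliexpresshd
print(captured)  # com.alibaba.aliexpresshd
print(found)     # ['com.alibaba.aliexpresshd']
```

So re.findall(...)[0] and re.search(...).group(1) give the same string here. The difference is only in how a failed match shows up (an empty list versus None).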

I recommend you take a look at rexegg.com for a full list of regex syntax.

Robo Mop
-1

Actually, you do not need to put anything "special" there.

Since you know that the bundle id sits between id= and &, you can just capture whatever is in between and get your result from a capture group, like this:
id=(.+)&

So the code would look like this:

def get_id(toParse):
    return re.search('id=(.+)&', toParse).groups()[0]

Note: in Python, groups() returns only the capture groups, so groups()[0] is the first capture. With group(), index 0 is the full match and index 1 is the first capture, as in most regex engines.
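One thing to watch with id=(.+)& is that .+ is greedy, so it runs up to the last & in the string, and the pattern finds nothing at all when no & follows the id. The sketch below shows both cases. The extra gl=US parameter and the com.example.app id are invented for the demonstration, and the [^&]+ variant is an alternative I am suggesting, not part of the answer above.

```python
import re

# Hypothetical URL with two parameters after the id.
url_two_params = (
    "https://play.google.com/store/apps/details"
    "?id=com.alibaba.aliexpresshd&hl=en&gl=US"
)

# Greedy: captures everything up to the LAST ampersand.
greedy = re.search(r'id=(.+)&', url_two_params).group(1)

# Alternative: stop at the first ampersand by excluding it from the class.
bounded = re.search(r'id=([^&]+)', url_two_params).group(1)

# With no trailing parameter, the original pattern does not match at all.
url_no_amp = "https://play.google.com/store/apps/details?id=com.example.app"
no_match = re.search(r'id=(.+)&', url_no_amp)

print(greedy)    # com.alibaba.aliexpresshd&hl=en
print(bounded)   # com.alibaba.aliexpresshd
print(no_match)  # None
```

For the exact URL in the question (a single & after the id) the original pattern works fine; the problem only appears with other URL shapes.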

See demo here

Asunez
-1

This regex should get what you want: it captures everything between id= and either the next parameter (.*? being non-greedy) or the end of the string.

id=(.*?)(&|$)

If you only need the id itself, it will be in the first group.
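Plugged into the get_id function from the question, this could look roughly like the sketch below. Returning None when there is no id is my own assumption about the desired behaviour, and the com.example.app and example.com URLs are made up for the demonstration.

```python
import re
from typing import Optional


def get_id(to_parse: str) -> Optional[str]:
    """Return the value of the id parameter, or None if there is none."""
    # (.*?) is non-greedy, so it stops at the first "&" or at end of string.
    match = re.search(r'id=(.*?)(&|$)', to_parse)
    if match is None:
        return None
    return match.group(1)  # group 1 is the id; group 2 is "&" or ""


x = "https://play.google.com/store/apps/details?id=com.alibaba.aliexpresshd&hl=en"
print(get_id(x))  # com.alibaba.aliexpresshd

# Also works when the id is the last parameter (hypothetical URL).
print(get_id("https://play.google.com/store/apps/details?id=com.example.app"))
```

Because the pattern starts with a bare id=, it would also match inside a longer parameter name that ends in "id", so it is only safe if you know the URLs don't contain one before the real id.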

Adriano
Lance Toth