
I want to extract only the text that appears after '/g/' and before the first '+' or '?' in each URL. My current pattern captures too much:

import re

urls = ["https://www.google.com/es/g/Dmitry+Kharchenko?searchterm=isometrico",
       "https://www.google.com/es/g/Irina+Strelnikova?searchterm=isom%C3%A9trico",
       "https://www.google.com/es/g/ParabolStudio?searchterm=auto"]

for i in urls:
    print(re.findall(r'g/(.*)[\+|\??]', i))


Current result:

['Dmitry+Kharchenko']
['Irina+Strelnikova']
['ParabolStudio']

Desired result:

'Dmitry'
'Irina'
'ParabolStudio'
Raymont

1 Answer


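You need the non-greedy pattern .*?, which stops at the first + or ? it encounters. The greedy .* runs to the last + or ?, which is why your capture includes everything up to the final ?. To match + or ? with a character class, [+?] is enough: neither character needs escaping inside a class, and a | there is a literal pipe, not alternation: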

for i in urls:
    print(re.findall(r'g/(.*?)[+?]', i))

# ['Dmitry']
# ['Irina']
# ['ParabolStudio']
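
As an alternative to the non-greedy quantifier, a negated character class gives the same result on these URLs. This is only a sketch: the helper name first_name is hypothetical, and it assumes the segment you want always follows a literal '/g/'. One difference from the pattern above is that it still matches when the URL contains neither + nor ? after the name.

```python
import re

# Same sample URLs as in the question.
urls = [
    "https://www.google.com/es/g/Dmitry+Kharchenko?searchterm=isometrico",
    "https://www.google.com/es/g/Irina+Strelnikova?searchterm=isom%C3%A9trico",
    "https://www.google.com/es/g/ParabolStudio?searchterm=auto",
]

# [^+?]* consumes characters up to, but not including, the first '+' or '?'.
# It also matches when neither delimiter is present (it runs to the end).
pattern = re.compile(r"/g/([^+?]*)")


def first_name(url):
    """Hypothetical helper: return the captured segment, or None if '/g/' is absent."""
    match = pattern.search(url)
    return match.group(1) if match else None


names = [first_name(u) for u in urls]
print(names)
# ['Dmitry', 'Irina', 'ParabolStudio']
```

Using re.search with a single group returns one string per URL instead of the one-element lists that re.findall produces, which may be more convenient if each URL holds exactly one name.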
Psidom