
I am a beginner with regular expression pattern matching in Python. Please help me solve this problem.

I want to extract some text from a given string. Please see the example below.

String : "keyword//match1/match2/more_text_with_/_more_and_more_/_texts"

I need to extract "match1" and "match2"

I wrote the following Python code to do that:

import re
astr = 'keyword//match1/match2/more_text_with_/_more_and_more_/_texts'
match = re.search('keyword//(.*)/(.*)/.*', astr)

print("match1 : ", match.group(1))
print("match2 : ", match.group(2))

The result is:

match1 :  match1/match2/more_text_with_
match2 :  _more_and_more_

I read about "How Regex Engine Works" here: https://www.regular-expressions.info/engine.html

I understand why this result occurs, but I have no idea how to write a regular expression that extracts the text I need.

Please help me with this.

Thank you very much,

juanpa.arrivillaga
Dilanka Rathnayake

4 Answers


.* is greedy and matches as many characters as possible. Instead, you could use .*?, which is non-greedy (lazy) and matches as few characters as possible.

import re

astr = 'keyword//match1/match2/more_text_with_/_more_and_more_/_texts'
match = re.search(r'keyword//(.*?)/(.*?)/.*?', astr)

print("match1 : ", match.group(1))
print("match2 : ", match.group(2))
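To make the greedy/lazy difference concrete, here is a small side-by-side sketch using the question's string and both patterns (the variable names greedy and lazy are just illustrative). It also shows that the trailing .*? in the lazy pattern matches an empty string, so the overall match ends right after the second captured segment.

```python
import re

astr = 'keyword//match1/match2/more_text_with_/_more_and_more_/_texts'

# Greedy: each (.*) first grabs as much as it can, then backtracks only
# far enough for the remaining '/' characters in the pattern to match.
greedy = re.search(r'keyword//(.*)/(.*)/.*', astr)

# Lazy: each (.*?) grows one character at a time and stops at the
# first '/' that lets the rest of the pattern match.
lazy = re.search(r'keyword//(.*?)/(.*?)/.*?', astr)

print(greedy.groups())  # ('match1/match2/more_text_with_', '_more_and_more_')
print(lazy.groups())    # ('match1', 'match2')

# The trailing .*? is satisfied by the empty string, so the whole
# match stops right after the second '/'.
print(lazy.group(0))    # keyword//match1/match2/
```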

Another way, without the non-greedy .*? (see the answer posted by @NegativeChameleon):

match = re.search(r'keyword//([^\/]*)/([^\/]*)/', astr)

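[^\/]* matches zero or more characters that are not '/', so it can be as greedy as it likes without ever running past the next slash. Inside a character class the backslash before / is not needed in Python; [^/]* means the same thing.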

import re

astr = 'keyword//match1/match2/more_text_with_/_more_and_more_/_texts'
match = re.search(r'keyword//([^\/]*)/([^\/]*)/', astr)

print("match1:", match.group(1))
print("match2:", match.group(2))
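A minimal sketch of the same negated-class pattern (written as [^/]*, without the unneeded backslash) wrapped in a helper, to show a couple of edge cases. The name extract_two is hypothetical, used only for this illustration. Because re.search returns None when nothing matches, the helper checks for that before calling groups().

```python
import re

# Hypothetical helper, not part of the original answer.
def extract_two(s):
    """Return the first two '/'-separated segments after 'keyword//', or None."""
    m = re.search(r'keyword//([^/]*)/([^/]*)/', s)
    return m.groups() if m else None

print(extract_two('keyword//match1/match2/more_text_with_/_more_and_more_/_texts'))
# ('match1', 'match2')
print(extract_two('keyword///match2/'))       # ('', 'match2'): [^/]* allows an empty segment
print(extract_two('keyword//match1/match2'))  # None: the pattern needs a '/' after match2
```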
Booboo

Look at this link.

import re
astr = 'keyword//match1/match2/more_text_with_/_more_and_more_/_texts'
match = re.search('keyword//([^/]*)/([^/]*)/.*/', astr)

print("match1 : ", match.group(1))
print("match2 : ", match.group(2))

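output (from Python 2, where print(...) prints a tuple; Python 3 prints match1 :  match1 and match2 :  match2):

('match1 : ', 'match1')
('match2 : ', 'match2')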

Based on the link.
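The trailing .*/ in this pattern is not needed to capture the two segments, and it adds a requirement: there must be at least one more '/' somewhere after match2/. A quick sketch comparing the pattern with and without that tail, on a made-up input that has no further slash:

```python
import re

with_tail = r'keyword//([^/]*)/([^/]*)/.*/'
without_tail = r'keyword//([^/]*)/([^/]*)/'

s = 'keyword//match1/match2/tail'  # made-up input: no '/' after "match2/"

print(re.search(with_tail, s))              # None: .*/ finds no later '/'
print(re.search(without_tail, s).groups())  # ('match1', 'match2')
```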

Shahab Rahnama
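This would also work:

re.findall(r'(.+?)/', s.replace('keyword//', ''))[0:2]

It removes the 'keyword//' prefix, collects every run of one or more characters that ends in '/', and keeps the first two.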

examples:

s = 'keyword//match1/match2/more_text_with_/_more_and_more_/_texts'

output:

['match1', 'match2']

s = 'keyword//ma13*41$?tch1/mad4#$(#01tch2/more_text_with_/_more_and_more_/_texts'

output:

['ma13*41$?tch1', 'mad4#$(#01tch2']
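One trade-off of this approach, shown in a small sketch (first_two_segments is a hypothetical helper name): str.replace silently does nothing when 'keyword//' is absent, so the prefix is never checked, and because .+? needs at least one character, an empty segment gets merged into the next one. The anchored re.search pattern from the other answers rejects the unprefixed input instead.

```python
import re

# Hypothetical helper wrapping the one-liner above.
def first_two_segments(s):
    return re.findall(r'(.+?)/', s.replace('keyword//', ''))[0:2]

print(first_two_segments('keyword//match1/match2/more_text_with_/_more_and_more_/_texts'))
# ['match1', 'match2']
print(first_two_segments('no_prefix/a/b/c'))
# ['no_prefix', 'a']: nothing checks that 'keyword//' was present
print(first_two_segments('keyword///b/'))
# ['/b']: the empty first segment is merged into the next one

# For comparison, the anchored pattern simply fails on the unprefixed input.
print(re.search(r'keyword//([^/]*)/([^/]*)/', 'no_prefix/a/b/c'))
# None
```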
Derek Eden