5

I want to get the string before the last occurrence of a given substring.

My string is:

path = D:/me/vol101/Prod/cent/2019_04_23_01/image/AVEN_000_3400_img_pic_p1001-1010/pxy/AVEN_000_3400_img-mp4_to_MOV_v1001-1010.mov

My substring is 1001-1010, which occurs twice. All I want is the string before its last occurrence.

Note: My substring is dynamic, with different padding, but it always consists of numbers only.

I want:

D:/me/vol101/Prod/cent/2019_04_23_01/image/AVEN_000_3400_img_pic_p1001-1010/pxy/AVEN_000_3400_img-mp4_to_MOV_v

I have done this using regex and slicing:

>>> p = 'D:/me/vol101/Prod/cent/2019_04_23_01/image/AVEN_000_3400_img_pic_p1001-1010/pxy/AVEN_000_3400_img-mp4_to_MOV_v1001-1010.mov'
>>> import re
>>> q = re.findall(r"\d*-\d*", p)
>>> q[-1].join(p.split(q[-1])[:-1])
'D:/me/vol101/Prod/cent/2019_04_23_01/image/AVEN_000_3400_img_pic_p1001-1010/pxy/AVEN_000_3400_img-mp4_to_MOV_v'
>>> 

Is there a better way to do this using regex alone?

Please note that I have already tried many approaches, e.g.:

  1. regular expression to match everything until the last occurrence of /
  2. Regex Last occurrence?

I got the answer by using regex with slicing, but I want to achieve it using regex alone.

Mohideen bin Mohammed

3 Answers

8

Why use regex? Just use the built-in string methods:

path = "D:/me/vol101/Prod/cent/2019_04_23_01/image/AVEN_000_3400_img_pic_p1001-1010/pxy/AVEN_000_3400_img-mp4_to_MOV_v1001-1010.mov"
index = path.rfind("1001-1010")
print(path[:index])
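
One caveat worth keeping in mind: str.rfind returns -1 when the substring is not present, so path[:index] would then silently drop the last character. A minimal sketch of a guarded version follows; the helper name before_last is hypothetical and not part of the answer above.

```python
def before_last(text, sub):
    """Return the part of text before the last occurrence of sub.

    before_last is a hypothetical helper name used only for illustration.
    """
    index = text.rfind(sub)
    if index == -1:
        # rfind signals "not found" with -1; return the text unchanged
        return text
    return text[:index]


path = "D:/me/vol101/Prod/cent/2019_04_23_01/image/AVEN_000_3400_img_pic_p1001-1010/pxy/AVEN_000_3400_img-mp4_to_MOV_v1001-1010.mov"
print(before_last(path, "1001-1010"))
```

Returning the text unchanged on a miss is only one possible choice; raising an error may suit other callers better.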
  • Looks good, nice example without regex. But the problem is that I don't know the substring in advance; it will be some number with padding. – Mohideen bin Mohammed Apr 25 '19 at 14:55
  • Yes, this approach works only with a static pattern; if the keyword is unknown and a pattern search is required, then regex is a must. – anubhava Apr 25 '19 at 14:58
3

You can use a simple greedy match and a capture group:

(.*)1001-1010

Your match is in capture group #1.

Since .* is greedy by nature, it matches the longest possible string before the last occurrence of your keyword 1001-1010.
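
As a rough sketch of how this pattern could be used from Python, the snippet below compares the greedy .* with the lazy .*? on the path from the question. The variable names are illustrative, and re.escape is an added precaution (an assumption, not part of the pattern above) in case the keyword contains regex metacharacters.

```python
import re

path = "D:/me/vol101/Prod/cent/2019_04_23_01/image/AVEN_000_3400_img_pic_p1001-1010/pxy/AVEN_000_3400_img-mp4_to_MOV_v1001-1010.mov"
keyword = "1001-1010"

# Greedy: .* grabs as much as possible, so the keyword matched is the LAST one.
greedy = re.match("(.*)" + re.escape(keyword), path).group(1)

# Lazy: .*? grabs as little as possible, so the keyword matched is the FIRST one.
lazy = re.match("(.*?)" + re.escape(keyword), path).group(1)

print(greedy)  # ends with "..._to_MOV_v"
print(lazy)    # ends with "..._img_pic_p"
```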



As per the comments below, if the keyword is not a static string, you may use this regex:

r'(.*\D)\d+-\d+'

Python Code:

>>> p = 'D:/me/vol101/Prod/cent/2019_04_23_01/image/AVEN_000_3400_img_pic_p1001-1010/pxy/AVEN_000_3400_img-mp4_to_MOV_v1001-1010.mov'
>>> import re
>>> print(re.findall(r'(.*\D)\d+-\d+', p))
['D:/me/vol101/Prod/cent/2019_04_23_01/image/AVEN_000_3400_img_pic_p1001-1010/pxy/AVEN_000_3400_img-mp4_to_MOV_v']
anubhava
0

Thanks @anubhava.

My first regex was:

.*(\d*-\d*)\/

Now I have corrected mine to:

.*(\d*-\d*)

or

(.*)(\d*-\d*)

which gives me the whole match, including the trailing number:

>>> q = re.search(r'.+(\d*-\d*)', p)
>>> q.group()
'D:/me/vol101/Prod/cent/2019_04_23_01/image/AVEN_000_3400_img_pic_p1001-1010/pxy/AVEN_000_3400_img-mp4_to_MOV_v1001-1010'
>>> 

(.*\D)\d+-\d+

This gives me exactly what I want:

>>> q = re.search(r'(.*\D)\d+-\d+', p)
>>> q.groups()
('D:/me/vol101/Prod/cent/2019_04_23_01/image/AVEN_000_3400_img_pic_p1001-1010/pxy/AVEN_000_3400_img-mp4_to_MOV_v',)
>>> 
Mohideen bin Mohammed