Regular Expression Python Trace back

Question

I have a string something like this:

opt/custom/building/BuildingInput/address/BuildingUnderwritingInput/Name

I need to catch all the words having 'Input' and delete them from the path. So my final string will be:

opt/custom/building/address/Name

I tried something like this, but it didn't work:

import re
x = "opt/custom/building/BuildingInput/address/BuildingUnderwritingInput/Name"
print(re.sub(r'Input/', r'/', x.rstrip()))

And it gave me

opt/custom/building/Building/address/BuildingUnderwriting/Name

The "Building" of "BuildingInput" and the "BuildingUnderwriting" of "BuildingUnderwritingInput" are retained here. I want the whole words "BuildingInput" and "BuildingUnderwritingInput" to be removed. Any help? Alternatively, can anyone tell me how to trace back from an occurrence of "Input" to the nearest preceding "/", so that I can match the whole words "BuildingInput" and "BuildingUnderwritingInput"?

score 1 · Accepted Answer · edited May 02 '23 at 10:00

Use this regex to remove all words ending with Input within slashes (/):

(/)[^/]+Input(?=/)

For your case:

import re
x = "opt/custom/building/BuildingInput/address/BuildingUnderwritingInput/Name"
print(re.sub(r'(/)[^/]+Input(?=/)', r'', x.rstrip()))
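
To see what the lookahead contributes, here is a minimal sketch that compares the pattern with a variant that consumes the trailing slash instead. The variable names are only illustrative, and the capturing group around the first slash is dropped because the replacement never refers to it.

```python
import re

x = "opt/custom/building/BuildingInput/address/BuildingUnderwritingInput/Name"

# Consuming the trailing slash deletes both separators around the segment,
# so the neighbouring segments get glued together.
consumed = re.sub(r'/[^/]+Input/', '', x)

# The lookahead (?=/) only checks that a slash follows; it stays in the string.
with_lookahead = re.sub(r'/[^/]+Input(?=/)', '', x)

print(consumed)        # opt/custom/buildingaddressName
print(with_lookahead)  # opt/custom/building/address/Name
```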

edited May 02 '23 at 10:00

Abdul Aziz Barkat

answered May 02 '17 at 07:59

degant

Thanks a lot for your quick response. It worked! Can you tell me what (/) and (?=/) did? – Bhaskar Bhuyan May 02 '17 at 08:02
Sure. (/) matches the slash in front of the word. (?=/) is a positive lookahead: it asserts that a slash follows Input without making that slash part of the match. So essentially we match a word ending with Input that is followed by a slash (/), but the trailing slash is not included in the match and therefore isn't deleted. You can read up more about lookarounds [here](http://stackoverflow.com/questions/2973436/regex-lookahead-lookbehind-and-atomic-groups) – degant May 02 '17 at 08:05
No problem :) If this answer helped solve your problem, you might want to mark it as accepted answer. Thanks! – degant May 02 '17 at 08:09

score 0 · Answer 2 · answered May 02 '17 at 08:01

Currently you are only searching for and replacing Input/. You have to match the whole word instead, for example with this regex:

re.sub(r'/\w*Input/', r'/', x.rstrip())
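
One caveat with consuming both slashes: if two matching segments sit directly next to each other, the second one is missed, because its leading slash was already used up by the first match. The sketch below uses a made-up path (not from the question) to show this, next to a lookahead variant that checks the trailing slash without consuming it.

```python
import re

# Hypothetical path with two "Input" segments in a row.
path = "a/FooInput/BarInput/b"

# Both slashes are consumed, so after the first replacement the search resumes
# at "BarInput/b", which no longer has a leading slash to match.
both_consumed = re.sub(r'/\w*Input/', '/', path)

# Only the leading slash is consumed; the trailing one is just checked.
lookahead = re.sub(r'/\w*Input(?=/)', '', path)

print(both_consumed)  # a/BarInput/b
print(lookahead)      # a/b
```

For the path in the question the two patterns give the same result, since the matching segments there are never adjacent.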

answered May 02 '17 at 08:01

Christian König

A Person · Answer 3 · 2017-05-02T08:23:17.387

Remove zero or more characters that are not a slash ([^/]*), followed by Input and a slash:

import re
x = "opt/custom/building/BuildingInput/address/BuildingUnderwritingInput/Name"
print(re.sub(r'[^/]*Input/', r'' , x.rstrip()))

If it is possible that the last element of the path also contains an Input word (without a trailing slash) you can use this instead:

x = "address/BuildingUnderwritingInput"
print(re.sub(r'[^/]*Input(/|$)', r'' , x.rstrip()))

Here either a / or the end of the string ($) may follow Input. However, if the last word is matched, this leaves a trailing slash. If that is a problem, you can remove it separately:

x = "address/BuildingUnderwritingInput"
x = re.sub(r'[^/]*Input(/|$)', r'' , x.rstrip())
print(re.sub(r'/$', r'' , x))
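
As a possible alternative to the two-step regex (this is a different technique, not one of the patterns above), the path can be split on slashes and filtered. This is a minimal sketch: the function name drop_input_segments is hypothetical, and it assumes that "contains an Input word" means the segment ends with Input, as the regexes above do.

```python
def drop_input_segments(path):
    """Return path without the segments whose name ends in 'Input'."""
    # Split into segments, keep the ones that do not end in "Input",
    # and join them back with single slashes.
    kept = [part for part in path.rstrip().split("/") if not part.endswith("Input")]
    return "/".join(kept)

print(drop_input_segments("opt/custom/building/BuildingInput/address/BuildingUnderwritingInput/Name"))
# opt/custom/building/address/Name
print(drop_input_segments("address/BuildingUnderwritingInput"))
# address
```

Because the slashes are only re-inserted between the segments that are kept, no trailing slash is left behind when the last segment is removed.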
