
I want to match file paths located in multiple directories:

The file path can be local, like C:/users/path/image.png, or on another system, like //home/user/web/image.png.

For the first case, I have this regular expression:

[c|C]:[^.]+[.][A-Za-z]{3}

How can I write a single regex that matches both cases?

Kevin
  • I tried: image_path = re.findall(r"(([c|C]:)|(//home))[^.]+[.][A-Za-z]{3}", str(list1)) – Kevin Dec 04 '13 at 22:54

2 Answers


Try

((c|C|//home)[^.]+[.][A-Za-z]{3})
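One caveat with this pattern: the c and C alternatives are not followed by a colon, so any letter c in ordinary text can start a match. Below is a minimal sketch of the difference, using a made-up input string and a variant, (c:|C:|//home), that requires the colon. The variant is an adjustment suggested here, not part of the original answer.

```python
import re

text = "music/track.png"  # hypothetical input that contains no drive letter

# Pattern from the answer: a bare "c" is enough to start a match.
loose = re.findall(r"((c|C|//home)[^.]+[.][A-Za-z]{3})", text)

# Variant: the drive-letter alternatives must be followed by a colon.
strict = re.findall(r"((c:|C:|//home)[^.]+[.][A-Za-z]{3})", text)

print(loose)   # [('c/track.png', 'c')]
print(strict)  # []
```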

If you use findall() and the pattern contains more than one group, each match is returned as a tuple holding the text of every group in the regex. That's the crux of the regex above: the whole expression is wrapped in its own group so that the full path shows up in the return value of findall(). Otherwise only the (c|C|//home) prefix would be returned. See the following code:

import re

smth = "//home/user/web/image.png C:/users/path/image.png c:/web/image.png"
ip = re.findall(r"((c|C|//home)[^.]+[.][A-Za-z]{3})", smth)
print(ip)
# [('//home/user/web/image.png', '//home'), ('C:/users/path/image.png', 'C'), ('c:/web/image.png', 'c')]
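To see why the outer group matters, here is a small sketch that runs re.findall() over the same kind of text with zero, one, and two groups. With no groups it returns the whole matches, with exactly one group it returns only that group's text, and with two or more it returns tuples. The sample string is illustrative.

```python
import re

text = "//home/user/web/image.png C:/users/path/image.png"

# No capturing groups: findall returns the full text of each match.
no_groups = re.findall(r"(?:c|C|//home)[^.]+[.][A-Za-z]{3}", text)

# One capturing group: findall returns just that group, i.e. the prefix.
one_group = re.findall(r"(c|C|//home)[^.]+[.][A-Za-z]{3}", text)

# Two capturing groups: findall returns a tuple per match.
two_groups = re.findall(r"((c|C|//home)[^.]+[.][A-Za-z]{3})", text)

print(no_groups)   # ['//home/user/web/image.png', 'C:/users/path/image.png']
print(one_group)   # ['//home', 'C']
print(two_groups)  # [('//home/user/web/image.png', '//home'), ('C:/users/path/image.png', 'C')]
```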
arturomp
  • This pattern, (([c|C]:)|(//home))[^.]+[.][A-Za-z]{3}, didn't work for me. I got the result [('C:', 'C:', ''),('//home', '', '//home')] – Kevin Dec 04 '13 at 22:38
  • image_path = re.findall(r"(([c|C]:)|(//home))[^.]+[.][A-Za-z]{3}", str(list1)) – Kevin Dec 04 '13 at 22:40
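The output reported in the comment follows from the same findall() rule: that pattern has three capturing groups, so each match comes back as a three-element tuple and the full path is never included. Also note that inside square brackets | is an ordinary character, so [c|C] matches c, C, or a literal pipe; [cC] is what was intended. The sketch below assumes list1 held these two paths (its real contents are not shown). The fix makes the alternation non-capturing with (?:...) and removes every other group, so findall() returns the whole matches.

```python
import re

# Hypothetical stand-in for str(list1); the original list is not shown.
text = str(["C:/users/path/image.png", "//home/user/web/image.png"])

# Pattern from the comment: three capturing groups -> 3-tuples, no full path.
old = re.findall(r"(([c|C]:)|(//home))[^.]+[.][A-Za-z]{3}", text)
print(old)  # [('C:', 'C:', ''), ('//home', '', '//home')]

# Non-capturing alternation and no other groups -> list of full matches.
new = re.findall(r"(?:[cC]:|//home)[^.]+[.][A-Za-z]{3}", text)
print(new)  # ['C:/users/path/image.png', '//home/user/web/image.png']

# [c|C] is a character class, so it also accepts a literal "|".
print(re.match(r"[c|C]:", "|:") is not None)  # True
```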

It's not clear what you're trying to get from the match. Maybe you just want the full string?

((?:(?:[cC]:)|//home)[^\.]+\.[A-Za-z]{3})

An unescaped dot (.) matches any character except a newline. To match a literal dot, escape it as \. (inside a character class such as [^.] the dot is already literal, so escaping it there is optional).
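A quick sketch of the difference, using made-up strings: the unescaped dot accepts an underscore where the dot should be, while \. and [.] only accept a real dot. The last line shows that [^.] and [^\.] behave the same.

```python
import re

# Unescaped ".": any single character except a newline, so "_" is accepted.
print(bool(re.fullmatch(r"image.png", "image_png")))    # True
# Escaped "\." or the class "[.]": only a literal dot is accepted.
print(bool(re.fullmatch(r"image\.png", "image_png")))   # False
print(bool(re.fullmatch(r"image[.]png", "image_png")))  # False
print(bool(re.fullmatch(r"image\.png", "image.png")))   # True

# Inside a character class the dot is already literal: both split on "."
print(re.findall(r"[^.]+", "a.b") == re.findall(r"[^\.]+", "a.b"))  # True
```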

Test runs:

>>> import re
>>> print(re.match(r"((?:(?:[cC]:)|//home)[^\.]+\.[A-Za-z]{3})", "//home/user/web/image.png").groups())
('//home/user/web/image.png',)

>>> print(re.match(r"((?:(?:[cC]:)|//home)[^\.]+\.[A-Za-z]{3})", "C:/users/path/image.png").groups())
('C:/users/path/image.png',)

And one for the usual Windows path syntax:

>>> print(re.match(r"((?:(?:[cC]:)|//home)[^\.]+\.[A-Za-z]{3})", r"C:\users\path\image.png").groups())
('C:\\users\\path\\image.png',)
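Backslash paths need care on Python 3. The non-raw literal "C:\users\path\image.png" does not even compile there, because \u starts a \uXXXX escape. A raw string keeps every backslash as typed, and the doubled backslashes in the tuple output are only how repr() displays them. A minimal sketch:

```python
import re

# Raw string: the backslashes are kept literally.
path = r"C:\users\path\image.png"
assert path == "C:\\users\\path\\image.png"

pattern = r"((?:(?:[cC]:)|//home)[^\.]+\.[A-Za-z]{3})"
m = re.match(pattern, path)
print(m.group(1))  # C:\users\path\image.png
print(m.groups())  # ('C:\\users\\path\\image.png',)
```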

To also support four-letter extensions such as .jpeg, change the quantifier on the extension from {3} to {3,4}.
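The sketch below shows why the quantifier change matters. It uses a hypothetical .jpeg path and the answer's pattern: with {3} the match does not fail, it silently stops after three letters of the extension. With {3,4} the full extension is captured, and three-letter extensions like .png still match.

```python
import re

path = "C:/users/path/photo.jpeg"  # hypothetical example path

three = re.match(r"((?:(?:[cC]:)|//home)[^\.]+\.[A-Za-z]{3})", path)
three_or_four = re.match(r"((?:(?:[cC]:)|//home)[^\.]+\.[A-Za-z]{3,4})", path)

print(three.group(1))          # C:/users/path/photo.jpe  (extension cut short)
print(three_or_four.group(1))  # C:/users/path/photo.jpeg
```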

planestepper