regex for finding file paths

Question

I used the regex (\/.*\.[\w:]+) to find all file paths and directories. But in a line such as "file path /log/file.txt some lines /log/var/file2.txt", which contains two paths, it does not select the paths individually; instead, it selects the whole line. How can I solve this?

I think this is what you want: https://docs.python.org/2/howto/regex.html#greedy-versus-non-greedy — Jonas, May 31 '18 at 06:42
See my answer, @Sriram, if you want to find all paths use re.findall() — Jonas, May 31 '18 at 06:48

score 8 · Accepted Answer · answered May 31 '18 at 06:45


Use the regex (\/.*?\.[\w:]+) to make the match non-greedy. If you want to find multiple matches in the same line, you can use re.findall().

Update: Using this code and the example provided, I get:

import re
re.findall(r'(\/.*?\.[\w:]+)', "file path /log/file.txt some lines /log/var/file2.txt")
['/log/file.txt', '/log/var/file2.txt']
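
For comparison, here is a minimal sketch that runs the greedy pattern from the question and the non-greedy one side by side on the same sample line. The variable names are only illustrative.

```python
import re

line = "file path /log/file.txt some lines /log/var/file2.txt"

# Greedy: .* runs to the end of the line, then backtracks to the last dot.
greedy = re.findall(r'(\/.*\.[\w:]+)', line)

# Non-greedy: .*? stops at the first dot that is followed by [\w:]+.
lazy = re.findall(r'(\/.*?\.[\w:]+)', line)

print(greedy)  # ['/log/file.txt some lines /log/var/file2.txt']
print(lazy)    # ['/log/file.txt', '/log/var/file2.txt']
```

The greedy version returns a single match that starts at the first slash and ends at the last extension, which is why the two paths were not selected individually.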

answered May 31 '18 at 06:45

Jonas


Great! Please accept the answer so that the question is closed if your problem is solved. @Sriram – Jonas May 31 '18 at 06:51
Files don't always have extensions. To catch files without an extension, you can use r'(\/[^\s\n]+)+' – Gal Shahar Jan 20 '21 at 08:38
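
A rough sketch of the suggestion in the comment above, assuming that paths never contain whitespace. The sample line with /etc/hosts is made up for illustration and is not from the original question.

```python
import re

# Hypothetical sample line; /etc/hosts stands in for a file without an extension.
text = "see /etc/hosts and /log/var/file2.txt"

# Pattern from the comment: a slash followed by non-whitespace characters.
from_comment = re.findall(r'(\/[^\s\n]+)+', text)

# Shorter form that behaves the same on this input, since [^\s\n] equals \S.
shorter = re.findall(r'/\S+', text)

print(from_comment)  # ['/etc/hosts', '/log/var/file2.txt']
print(shorter)       # ['/etc/hosts', '/log/var/file2.txt']
```

Note that a pattern this permissive also picks up any other token that starts with a slash, for example the "/or" in "and/or".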

The fourth bird · Answer 2 · 2018-05-31T07:14:41.210

Your regex (\/.*\.[\w:]+) uses .*, which is greedy: it consumes as much as possible, so [\w:]+ only matches after the last dot in the line, the one in file2.txt. You could use the non-greedy .*? instead.

But that would also match a malformed path such as /log////var////.txt.

As an alternative, you might use a repeating non-greedy pattern that matches the directory structure, (?:/[^/]+)+?, followed by a part that matches the filename, /\w+\.\w+:

(?:/[^/]+)+?/\w+\.\w+

import re
s = "file path /log/file.txt some lines /log/var/file2.txt or /log////var////.txt"
print(re.findall(r'(?:/[^/]+)+?/\w+\.\w+', s))

That would result in:

['/log/file.txt', '/log/var/file2.txt']
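
To make the difference between the two approaches concrete, the following sketch runs the non-greedy pattern and the directory-structure pattern on the same sample string, including the malformed path. The names lazy and structured are illustrative only.

```python
import re

s = "file path /log/file.txt some lines /log/var/file2.txt or /log////var////.txt"

# Non-greedy pattern: anything from a slash up to the first ".ext".
lazy = re.compile(r'\/.*?\.[\w:]+')

# Directory-structure pattern: one or more "/name" parts, then "/name.ext".
structured = re.compile(r'(?:/[^/]+)+?/\w+\.\w+')

print(lazy.findall(s))
# ['/log/file.txt', '/log/var/file2.txt', '/log////var////.txt']
print(structured.findall(s))
# ['/log/file.txt', '/log/var/file2.txt']
```

One trade-off of the stricter pattern is that it requires at least one directory before the filename, so a bare "/file.txt" is not matched.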


score 3 · Answer 3 · edited May 31 '18 at 07:48


You can use Python's re module, something like this:

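import re
msg = "file path /log/file.txt some lines /log/var/file2.txt"
# Raw string; digits are added to the class so "file2.txt" is matched in full,
# and the trailing [\s]? is dropped so matches carry no trailing space.
matches = re.findall(r"(/[a-zA-Z0-9./]*)", msg)
print(matches)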

Ref: https://docs.python.org/2/library/re.html#finding-all-adverbs

edited May 31 '18 at 07:48

Daniel Puiu


answered May 31 '18 at 06:57

Chavali Kalyan

