I used the regex (\/.*\.[\w:]+) to find all file paths and directories. But on a line like "file path /log/file.txt some lines /log/var/file2.txt", which contains two paths, it does not select each path individually. Instead, it selects one long span from the start of the first path to the end of the second. How can I solve this?
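For reference, here is a minimal reproduction of the problem. The question doesn't name a language; Python's re module is assumed because all the answers use it:

```python
import re

line = "file path /log/file.txt some lines /log/var/file2.txt"

# .* is greedy: it runs to the end of the line, then backtracks only as far
# as the LAST dot, so both paths end up inside a single match.
greedy = re.findall(r'(\/.*\.[\w:]+)', line)
print(greedy)  # ['/log/file.txt some lines /log/var/file2.txt']
```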

Asked by Sriram (edited by Brown Bear)
-
I think this is what you want: https://docs.python.org/2/howto/regex.html#greedy-versus-non-greedy – Jonas May 31 '18 at 06:42
-
Thank you. It now selects only the individual paths. – Sriram May 31 '18 at 06:45
-
See my answer, @Sriram: if you want to find all paths, use re.findall(). – Jonas May 31 '18 at 06:48
3 Answers
Use the regex (\/.*?\.[\w:]+) to make the match non-greedy. If you want to find multiple matches in the same line, use re.findall().
Update: Using this code and the example provided, I get:
import re
re.findall(r'(\/.*?\.[\w:]+)', "file path /log/file.txt some lines /log/var/file2.txt")
['/log/file.txt', '/log/var/file2.txt']
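A sketch extending this to several lines at once, assuming the input is a list of strings (the sample lines below are made up for illustration): compile the non-greedy pattern once and call findall() on each line.

```python
import re

# Non-greedy pattern from the answer, compiled once for reuse.
PATH_RE = re.compile(r'\/.*?\.[\w:]+')

# Hypothetical log lines, used only for illustration.
log_lines = [
    "file path /log/file.txt some lines /log/var/file2.txt",
    "no paths here",
    "rotated /var/log/syslog.1",
]

# findall returns every non-overlapping match in a line (an empty list if none).
all_paths = [path for line in log_lines for path in PATH_RE.findall(line)]
print(all_paths)  # ['/log/file.txt', '/log/var/file2.txt', '/var/log/syslog.1']
```

Note that .*? can still cross spaces, so a line such as "see /tmp and notes.txt" yields the single match '/tmp and notes.txt'.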

Answered by Jonas
-
Great! Please accept the answer so that the question is closed if your problem is solved. @Sriram – Jonas May 31 '18 at 06:51
-
Files don't always have extensions; to catch files without an extension you can use r'(\/[^\s\n]+)+' – Gal Shahar Jan 20 '21 at 08:38
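A quick check of the pattern from that comment on a made-up input that mixes paths with and without extensions. Because \s already includes \n, and the first repetition of the group swallows every following non-whitespace character (slashes included), the simpler r'/\S+' gives the same result for this input:

```python
import re

# Hypothetical input, for illustration only.
text = "run /usr/bin/env then read /etc/hosts and /log/file.txt"

# Pattern from the comment: a slash followed by non-whitespace,
# so no file extension is required.
with_group = re.findall(r'(\/[^\s\n]+)+', text)

# Simpler form: '/' plus a run of non-whitespace characters.
simple = re.findall(r'/\S+', text)

print(with_group)  # ['/usr/bin/env', '/etc/hosts', '/log/file.txt']
print(simple)      # ['/usr/bin/env', '/etc/hosts', '/log/file.txt']
```

Either form also keeps trailing punctuation, so "/etc/hosts," would be returned with its comma.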
Your regex (\/.*\.[\w:]+) uses .*, which is greedy: it consumes as much as it can, so [\w:]+ ends up matching only after the last dot, in file2.txt. You could use .*? instead.
But that would also match /log////var////.txt.
As an alternative, you might use a repeating non-greedy pattern that matches the directory structure, (?:/[^/]+)+?, followed by a part that matches the filename, /\w+\.\w+:
import re
s = "file path /log/file.txt some lines /log/var/file2.txt or /log////var////.txt"
print(re.findall(r'(?:/[^/]+)+?/\w+\.\w+', s))
That would result in:
['/log/file.txt', '/log/var/file2.txt']
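The same pattern can be spelled out with re.VERBOSE so each part carries a comment; this is only a restatement of the answer's regex, not a different one:

```python
import re

# Whitespace and '#' comments are ignored in VERBOSE mode.
PATH_RE = re.compile(r"""
    (?:/[^/]+)+?   # one or more directory segments, repeated lazily
    /\w+\.\w+      # final component: /name.ext
""", re.VERBOSE)

s = "file path /log/file.txt some lines /log/var/file2.txt or /log////var////.txt"
print(PATH_RE.findall(s))  # ['/log/file.txt', '/log/var/file2.txt']
```

One caveat: [^/] also matches spaces, so "see /a b/c.txt" yields '/a b/c.txt'.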

Answered by The fourth bird
You can use Python's re module, something like this:
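import re
msg = "file path /log/file.txt some lines /log/var/file2.txt"
# Raw string avoids invalid-escape warnings; 0-9 and _ are included so that
# "file2.txt" is matched whole, and no trailing whitespace is captured.
matches = re.findall(r"(/[a-zA-Z0-9_./]*)", msg)
print(matches)  # ['/log/file.txt', '/log/var/file2.txt']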
Ref: https://docs.python.org/2/library/re.html#finding-all-adverbs
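The same re documentation pairs that recipe with one for finding matches and their positions, built on re.finditer(). A sketch applying it here, assuming path characters are word characters, dots and slashes:

```python
import re

msg = "file path /log/file.txt some lines /log/var/file2.txt"

# Assumed path characters: \w (letters, digits, underscore), '.' and '/'.
spans = [(m.start(), m.end(), m.group(0)) for m in re.finditer(r"/[\w./]*", msg)]

for start, end, path in spans:
    # start/end are the slice of msg that matched.
    print(f"{start}-{end}: {path}")
# 10-23: /log/file.txt
# 35-53: /log/var/file2.txt
```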

Answered by Chavali Kalyan (edited by Daniel Puiu)