
I have a string which contains multiple file paths, some of which contain arbitrary newlines within the path, and I want to parse the string using Python so that only the filenames and extensions remain.

For example:

a/b/c/d/file1.c  
a/b/c/d/e/f/g/h/1/2/3/4/5/foo.c  
dir1/dir2/newlinedir  
/nextlinedir/bar.c

should be parsed to give output:

file1.c
foo.c
bar.c

I am using the following regular expression (the groups for the filename and extension must be separate for later purposes):

import re

path_regex = re.compile(r'.*\/([^\/\.]*)(\.c){0,1}$', re.MULTILINE)
path_regex.sub(r'\g<1>\g<2>', input_string)

This will work on strings with single line paths but not paths that contain newlines. What should I do?
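To make the failure concrete, here is a minimal reproduction. It assumes Python 3.5 or later, where an unmatched optional group in the replacement becomes an empty string. It also assumes the trailing spaces in the sample above are formatting residue rather than part of the data:

```python
import re

# Assumed sample input: the example above without trailing spaces.
# The third path is split by a stray newline.
s = ("a/b/c/d/file1.c\n"
     "a/b/c/d/e/f/g/h/1/2/3/4/5/foo.c\n"
     "dir1/dir2/newlinedir\n"
     "/nextlinedir/bar.c")

# The pattern from the question: group 1 is the name, group 2 the optional ".c".
path_regex = re.compile(r'.*\/([^\/\.]*)(\.c){0,1}$', re.MULTILINE)

# With re.MULTILINE, "$" matches at every line end, so each physical line is
# treated as a complete path, including the fragment "dir1/dir2/newlinedir".
result = path_regex.sub(r'\g<1>\g<2>', s)
print(result)
```

The split path leaves `newlinedir` on a line of its own, ahead of `bar.c`, because the pattern never looks past the end of a line.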

Avinash Raj
foobar1

5 Answers


Try this regex: (?:.*\/)(.+)\.(.+)

Use \1 to access the filename and \2 to access the extension.
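Here is a minimal sketch of how \1 and \2 come out in Python, assuming the sample string without its trailing spaces. If the real data had trailing spaces, they would end up in the extension group. This sketch swaps in re.findall for re.sub. Because findall returns only the captured groups, the leftover dir1/dir2/newlinedir fragment is not returned at all.

```python
import re

# Assumed sample input: the question's example without trailing spaces.
s = ("a/b/c/d/file1.c\n"
     "a/b/c/d/e/f/g/h/1/2/3/4/5/foo.c\n"
     "dir1/dir2/newlinedir\n"
     "/nextlinedir/bar.c")

# The regex from this answer. Without re.DOTALL, "." never matches "\n",
# so a match stays on one line; the fragment line "dir1/dir2/newlinedir"
# contains no "." and therefore produces no match.
pattern = re.compile(r'(?:.*\/)(.+)\.(.+)')

# findall returns only the groups: \1 (filename) and \2 (extension).
pairs = pattern.findall(s)

# Rebuild one "name.ext" per line from the separate groups.
output = "\n".join(f"{name}.{ext}" for name, ext in pairs)
print(output)
```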


Anshul Rai
  • Thank you for the answer, but I need to have dir1/dir2/newlinedir removed too, and move bar.c to the previous line. Only the filename and extension must remain, nothing else. – foobar1 Jun 30 '15 at 05:52

You may try this:

>>> import re
>>> s = '''a/b/c/d/file1.c  
a/b/c/d/e/f/g/h/1/2/3/4/5/foo.c  
dir1/dir2/newlinedir  
/nextlinedir/bar.c'''
>>> print(re.sub(r'(?s).*?([^/]+\.c)', r'\1\n', s))
file1.c
foo.c
bar.c

or

>>> print(re.sub(r'(?s).*?([^/]+)(\.[^.\n]+)(?=$|\n)', r'\1\2\n', s))
file1.c  
foo.c  
bar.c
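The question needs the filename and extension in separate groups. Here is a minimal sketch that runs the second pattern above through re.findall rather than re.sub; findall is swapped in here, and the answer itself uses re.sub. It again assumes the sample without trailing spaces:

```python
import re

# Assumed sample input: the question's example without trailing spaces.
s = ("a/b/c/d/file1.c\n"
     "a/b/c/d/e/f/g/h/1/2/3/4/5/foo.c\n"
     "dir1/dir2/newlinedir\n"
     "/nextlinedir/bar.c")

# (?s) lets ".*?" skip across the stray newline; [^/]+ cannot cross a "/",
# so group 1 is only the last path component before the extension.
pattern = re.compile(r'(?s).*?([^/]+)(\.[^.\n]+)(?=$|\n)')

# Each match yields a (name, extension) tuple with the groups kept separate.
pairs = pattern.findall(s)
print(pairs)
```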
Avinash Raj
^([\s\S]*?\/)(\w+\.c)

Try this; see the demo. It also works when a path spans lines. Use the m (multiline) flag.

https://regex101.com/r/rX1tE6/7
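Here is a minimal Python sketch of the same pattern, assuming the sample without trailing spaces. Using re.sub to keep only group 2 is an assumption about how the match is meant to be used:

```python
import re

# Assumed sample input: the question's example without trailing spaces.
s = ("a/b/c/d/file1.c\n"
     "a/b/c/d/e/f/g/h/1/2/3/4/5/foo.c\n"
     "dir1/dir2/newlinedir\n"
     "/nextlinedir/bar.c")

# "[\s\S]*?" matches any character, newlines included, lazily up to a "/";
# re.MULTILINE makes "^" match at the start of every line.
pattern = re.compile(r'^([\s\S]*?\/)(\w+\.c)', re.MULTILINE)

# Keep group 2 (the filename) and drop group 1 (the directories, including
# the fragment that was split across the newline).
result = pattern.sub(r'\2', s)
print(result)
```

The m flag matters here. Without re.MULTILINE, ^ anchors only at the start of the whole string, so only the first path would be rewritten. Also, \w+ covers only letters, digits and underscores, so a name such as my-file.c would not be captured as a filename.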

vks

This simple regex also works, and you can access the filename with its extension using \1:

([^/]*\.\w+)
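Here is a minimal sketch of collecting the \1 matches with re.findall, which is an assumption about how the regex is meant to be applied. It runs on the sample without trailing spaces:

```python
import re

# Assumed sample input: the question's example without trailing spaces.
s = ("a/b/c/d/file1.c\n"
     "a/b/c/d/e/f/g/h/1/2/3/4/5/foo.c\n"
     "dir1/dir2/newlinedir\n"
     "/nextlinedir/bar.c")

# [^/]* cannot cross a "/", so each match stays within one path component.
# The split fragment "newlinedir\n" has no "." and is skipped.
names = re.findall(r'([^/]*\.\w+)', s)
print(names)
```

A directory component containing a dot, such as v1.2, would also be picked up by this pattern.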
Martin Brandl

This is technically not what you are asking for, but maybe a regex is not the right tool here: as the saying goes, now you have two problems.

I think this is what you are searching for:

pydoc os.path.basename

So try this (in Python 3, map returns an iterator, so wrap it in list):

import os
list(map(os.path.basename, text.split('\n')))
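The one-liner treats every physical line as a path, so on the sample the split path yields an extra newlinedir entry before bar.c. Below is a minimal sketch of one way around that. It rests on a loud assumption: a continuation line always starts with /. That holds in the question's example but is not guaranteed in general, because an absolute path on its own line would be glued to the previous one. split_names is a hypothetical helper. os.path.splitext is added to keep name and extension separate, as the question requires:

```python
import os

# Assumed sample input: the question's example without trailing spaces.
s = ("a/b/c/d/file1.c\n"
     "a/b/c/d/e/f/g/h/1/2/3/4/5/foo.c\n"
     "dir1/dir2/newlinedir\n"
     "/nextlinedir/bar.c")


def split_names(text):
    """Hypothetical helper: rejoin broken paths, then take basename + splitext.

    Assumes a path broken by a stray newline continues on a line that
    starts with "/".
    """
    paths = []
    for line in text.splitlines():
        line = line.strip()          # drop surrounding spaces
        if not line:                 # skip blank lines
            continue
        if line.startswith("/") and paths:
            paths[-1] += line        # glue the continuation onto the previous path
        else:
            paths.append(line)
    # splitext keeps name and extension apart, e.g. ("file1", ".c")
    return [os.path.splitext(os.path.basename(p)) for p in paths]


print(split_names(s))
```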
Dacav