What is the most effective way to find a list of longest common parent path strings in a list of path strings using Python?
Additional note: where there are two or more matches I would like to descend as far as necessary to create as few redundant paths as possible.
Input list
input_paths = [
'/project/path/to/a/directory/of/files',
'/project/path/to/a/directory/full/of/files',
'/project/path/to/some/more/files',
'/project/path/to/some/more/directories/of/files',
'/project/path/to/another/file',
'/project/mount/another/path/of/files',
'/project/mount/another/path/of/test/stuff',
'/project/mount/another/path/of/files/etc',
'/project/mount/another/drive/of/things',
'/project/local/folder/of/documents'
]
filter_path = '/project'
Output list
common_prefix_list = [
'path/to/a/directory',
'path/to/some/more',
'path/to/another',
'mount/another/path/of',
'mount/another/drive/of',
'local/folder/of'
]
My rudimentary guess is to split each path into a list on os.sep and then use set intersection, but I believe there are more robust algorithms for what is essentially a longest common prefix problem. I'm sure this has been done a million times before, so please offer up your elegant solution.
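For the per-group step, the standard library already offers os.path.commonpath (Python 3.4+), which may be more robust than manual splitting and set intersection because it respects path-component boundaries (unlike the character-based os.path.commonprefix):

```python
import os.path

# os.path.commonpath returns the longest common parent of a set of
# paths, splitting on component boundaries rather than characters
group = ['/project/path/to/a/directory/of/files',
         '/project/path/to/a/directory/full/of/files']
print(os.path.commonpath(group))  # /project/path/to/a/directory
```
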
My end task is to collect a list of assets common to a project, scattered across disparate paths, into one common folder with a structure that neither creates conflicts between individual assets nor produces overly redundant paths (hence the filter_path).
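One way to make "descend as necessary" concrete is a trie of the parent directories: group each parent under its nearest branching ancestor (a node where paths diverge, or where one path ends while another continues deeper), then emit the common path of any group with two or more members, and the lone parent itself otherwise. Below is a sketch under that reading; the function name common_parents and the dict-based trie are my own choices, and it assumes POSIX-style separators:

```python
def common_parents(paths, filter_path, sep='/'):
    END = object()  # sentinel marking a node where a parent directory ends

    # build a trie of the parent-directory components below filter_path
    root = {}
    for p in paths:
        rel = p[len(filter_path):].strip(sep)
        parts = rel.split(sep)[:-1]  # drop the file name, keep the dirname
        node = root
        for part in parts:
            node = node.setdefault(part, {})
        node[END] = True

    groups = {}  # nearest branching ancestor -> dirnames grouped under it

    def walk(node, path, nearest_branch):
        children = [k for k in node if k is not END]
        is_end = END in node
        # a node "branches" if paths diverge here, or a dirname both
        # ends here and continues deeper
        branching = len(children) >= 2 or (is_end and len(children) >= 1)
        key = path if branching else nearest_branch
        if is_end:
            groups.setdefault(key, []).append(path)
        for child in children:
            walk(node[child], path + (child,), key)

    walk(root, (), ())
    # two or more members: collapse to the shared ancestor;
    # a lone member keeps its own parent directory
    return [sep.join(key if len(members) >= 2 else members[0])
            for key, members in groups.items()]


input_paths = [
    '/project/path/to/a/directory/of/files',
    '/project/path/to/a/directory/full/of/files',
    '/project/path/to/some/more/files',
    '/project/path/to/some/more/directories/of/files',
    '/project/path/to/another/file',
    '/project/mount/another/path/of/files',
    '/project/mount/another/path/of/test/stuff',
    '/project/mount/another/path/of/files/etc',
    '/project/mount/another/drive/of/things',
    '/project/local/folder/of/documents',
]
result = common_parents(input_paths, '/project')
```

On the sample input this reproduces the output list above, including 'path/to/another' for the lone file and 'mount/another/path/of' for the three paths that diverge below it.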