1

What is the most effective way to find a list of longest common parent path strings in a list of path strings using python?

Additional Note Where there are two or more matches I would like to descend as necessary to create as few as possible redundant

Input list

input_paths = [
  '/project/path/to/a/directory/of/files',
  '/project/path/to/a/directory/full/of/files',
  '/project/path/to/some/more/files',
  '/project/path/to/some/more/directories/of/files'
  '/project/path/to/another/file',
  '/project/mount/another/path/of/files',
  '/project/mount/another/path/of/test/stuff',
  '/project/mount/another/path/of/files/etc',
  '/project/mount/another/drive/of/things',
  '/project/local/folder/of/documents'
]
filter_path = '/project'

Output list

common_prefix_list = [
  'path/to/a/directory',
  'path/to/some/more',
  'path/to/another',
  'mount/another/path/of',
  'mount/another/drive/of',
  'local/folder/of'
]

My rudimentary guess is to split into lists on os.sep and then use set intersection but I believe there are more robust algorithms to find what is essentially a longest common substring problem. I'm sure this has been done a million times before so please offer up your elegant solution.

My end task is to collect a list of assets common to a project in disparate paths into one common folder with a structure that does not create conflicts with individual assets nor create paths that are overly redundant (hence the filter_path).

rjmoggach
  • 1,458
  • 15
  • 27
  • Why these three? Why not `/path/to` for all five inputs? – Tim Pietzcker Aug 27 '12 at 16:46
  • A list of all common parents is the same as the list of paths (or their parents). Your problem would make more sense if it were just one result - `/path/to`. – Ry- Aug 27 '12 at 16:47
  • Continuing @TimPietzcker and @minitech: Consider the trade-off between "*longest* paths" and *shortest* output list (which will yield a list with one element: `/path/to`). These are two different problems. Also, your end task description is not clear to me. – Lior Aug 27 '12 at 16:50
  • See [longest common substring from more than two strings](http://stackoverflow.com/questions/2892931/longest-common-substring-from-more-than-two-strings-python) – Burhan Khalid Aug 27 '12 at 16:53
  • I've seen that post and it's similar but I would still need to pre or post process the list and not sure if there isn't a better way – rjmoggach Aug 27 '12 at 17:04
  • @Lior: to be perfectly accurate I should have asked for cases where there are two or more common path prefixes – rjmoggach Aug 27 '12 at 17:08
  • `os.path` provides a `commonprefix` function which might be useful for a portion of this problem. As others have stated, This algorithm needs to be hashed out a little more before we can come up with an implemenatation however. – mgilson Aug 27 '12 at 17:10
  • @mgilson: I agree - upon further thought there's an element of user input required to determine if the destination path created as a result of culling the common prefix is actually user-friendly - I suppose my goal is to make that input as minimal as possible – rjmoggach Aug 27 '12 at 17:21
  • @mgilson: funny thing about commonprefix is this little documentation note: "Note that this may return invalid paths because it works a character at a time." so it's no different than using the string solution suggested [above](http://stackoverflow.com/questions/2892931/longest-common-substring-from-more-than-two-strings-python) – rjmoggach Aug 27 '12 at 17:23
  • @mogga -- Yep, but it's in the standard library, so you don't need to roll your own version :) – mgilson Aug 27 '12 at 17:28
  • 1
    Should `'/mount/another/drive/of/'` be one of the returned results, or is the inclusion of `'/path/to/another'` incorrect? – Matthew Trevor Aug 28 '12 at 06:00
  • @MatthewTrevor just said what I spotted as well: the output is wrong. That's because the problem is ill-defined. How many folder should be retained, in general? There is one answer only that is clear, namely to find *the* common ancestor (which, in your case, is "/"). Please update your question, otherwise it doesn't constitute a valid question. – Mayou36 May 16 '22 at 13:39
  • @MatthewTrevor fixed the output... 10 years later... – rjmoggach May 26 '22 at 17:12

0 Answers0