
Consider the following set of file listings:

/wr_vjxeacn/lzx/vjx/rkkelkwrvkjl.o
/wr_vjxeacn/lzx/vjx/wllnxncvr.o
/wr_hvlx/lzx/hvlx/wlxkjjlnr/Sbisln.xww
/wr_hvlx/lzx/wllqepse/lzx/xww/ANTLR/evi
/wr_hvlx/lzx/wllqepse/lzx/xww/ANTLR/zajrvhn/sjrez3x.cee
/wr_hvlx/lzx/wllqepse/lzx/xww/ivj/GNUhstnmven
/wr_hvlx/eklr+mkajc/sjrez3x64.evi.7ss153m930724031i252iic841n68i6i
/wr_hvlx/lzx/wllqepse/lzx/xww/ANTLR/evi/sjrez3x.evi
/wnkwenrkkel/lzx
/wnkwenrkkel/lzx/GNUhstnmven.xkhhkj
/wnkwenrkkel/lzx/GNUhstnmven.cnwl
/wnkwenrkkel/lzx/GNUhstnmven.evlr
/wnkwenrkkel/lzx/GNUhstnmven.gvjckgl-vs32

What I am trying to figure out is an optimal way of grouping the items by their common prefix dirname, defined as:

common prefix dirname = os.path.dirname(os.path.commonprefix(...))
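
To make the definition concrete, here is a minimal sketch applied to the first two listings. The helper name `common_prefix_dirname` is only for illustration. Since `os.path.commonprefix` compares character by character, the `os.path.dirname` call is what trims a partial file name back to a real directory.

```python
import os.path


def common_prefix_dirname(paths):
    """Directory part of the longest character-wise common prefix."""
    return os.path.dirname(os.path.commonprefix(paths))


paths = [
    "/wr_vjxeacn/lzx/vjx/rkkelkwrvkjl.o",
    "/wr_vjxeacn/lzx/vjx/wllnxncvr.o",
]
print(common_prefix_dirname(paths))

# Here commonprefix alone would stop in the middle of the file name
# ("/wnkwenrkkel/lzx/GNUhstnmven."), so dirname cuts it back to the directory.
siblings = [
    "/wnkwenrkkel/lzx/GNUhstnmven.xkhhkj",
    "/wnkwenrkkel/lzx/GNUhstnmven.cnwl",
]
print(common_prefix_dirname(siblings))
```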

Ideally, after the grouping, the above should look like this:

/wr_vjxeacn/lzx/vjx
/wr_vjxeacn/lzx/vjx/rkkelkwrvkjl.o
/wr_vjxeacn/lzx/vjx/wllnxncvr.o
*************************************************************
/wr_hvlx/lzx/hvlx
/wr_hvlx/lzx/hvlx/wlxkjjlnr/Sbisln.xww
*************************************************************
/wr_hvlx/lzx/wllqepse
/wr_hvlx/lzx/wllqepse/lzx/xww/ANTLR/evi
/wr_hvlx/lzx/wllqepse/lzx/xww/ANTLR/zajrvhn/sjrez3x.cee
/wr_hvlx/lzx/wllqepse/lzx/xww/ivj/GNUhstnmven
*************************************************************
/wr_hvlx/eklr+mkajc
/wr_hvlx/eklr+mkajc/sjrez3x64.evi.7ss153m930724031i252iic841n68i6i
*************************************************************
/wr_hvlx/lzx/wllqepse/lzx/xww/ANTLR/evi
/wr_hvlx/lzx/wllqepse/lzx/xww/ANTLR/evi/sjrez3x.evi
*************************************************************
/wnkwenrkkel
/wnkwenrkkel/lzx
*************************************************************
/wnkwenrkkel/lzx
/wnkwenrkkel/lzx/GNUhstnmven.xkhhkj
/wnkwenrkkel/lzx/GNUhstnmven.cnwl
/wnkwenrkkel/lzx/GNUhstnmven.evlr
/wnkwenrkkel/lzx/GNUhstnmven.gvjckgl-vs32

What I have tried

  1. itertools.groupby, but it has no lookahead or lookbehind.
  2. Iterating and breaking whenever the prefix changes, but I cannot find a working solution that handles all the edge cases.
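
The first attempt can be sketched roughly as follows. This is an illustration of the limitation, not the exact code I used, and the choice of `os.path.dirname` as the key is an assumption. `itertools.groupby` only merges consecutive items whose keys are equal, so two paths that share a longer prefix but sit in different immediate directories end up in separate groups.

```python
import os.path
from itertools import groupby

paths = [
    "/wr_vjxeacn/lzx/vjx/rkkelkwrvkjl.o",
    "/wr_vjxeacn/lzx/vjx/wllnxncvr.o",
    "/wr_hvlx/lzx/wllqepse/lzx/xww/ANTLR/evi",
    "/wr_hvlx/lzx/wllqepse/lzx/xww/ANTLR/zajrvhn/sjrez3x.cee",
]

# Group consecutive paths by their immediate parent directory.
groups = [(key, list(items)) for key, items in groupby(paths, key=os.path.dirname)]

for key, items in groups:
    print(key)
    for item in items:
        print("   ", item)
```

The first two paths are grouped as wanted, but the last two are split because their parent directories differ, even though both live under /wr_hvlx/lzx/wllqepse.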

Motivation

I have a list of checked-in files that I want to group by module so that I can identify the respective module owners.

Abhijit
  • Maybe you can treat this as a longest common substring problem. [Look here](http://stackoverflow.com/a/2894073/1125413) for example. – pemistahl Feb 14 '13 at 21:39
  • @PeterStahl: No, that will overcomplicate it. LCS is an NP-hard problem. The solution should not be O(N) complexity IMHO – Abhijit Feb 15 '13 at 03:53
  • Since you're interested in common prefixes, would a [Trie](http://stackoverflow.com/questions/960963/trie-prefix-tree-in-python) be a solution that deals tidily with all the edge cases? – Simon Feb 15 '13 at 04:52
