7

It's a pretty common algorithm in command line parsing. Given a set of predefined long option names -- compute the shortest prefix that uniquely identifies one of these options. So for instance, for the following options:

-help
-hostname
-portnumber
-name
-polymorphic

This would be the output:

-he
-ho
-por
-n
-pol

I'm thinking two possible ways of doing this -- either as a tree:

               *
             / | \
            /  |  \
           H   N   P
          / \      |
         E   O     O
                  / \
                 R   L

Or by searching for substrings:

for (String s : strings) {
   for (int i = 1; i < s.length(); s++) {
      if (search(strings,s.substring(0,i)) == 1) {
          result.add(s.substring(0,i);
          break;
      }
   }
}

So, question is:

  1. Which would you go for?
  2. Am I missing an obvious third way?
Asgeir S. Nilsen
  • 1,137
  • 9
  • 13
  • Context, context, context! I would go for the one that was better in my scenario. – Mark Peters Aug 31 '10 at 19:14
  • Option 1 seems like the best way to go. Fast, accurate, and straight forward... – Kendrick Aug 31 '10 at 19:14
  • Context is command-line parsing -- thus it would be built once and used once. Since this is garbage collected and most operating systems limit command lines well under 1 MB memory usage is not an issue. Performance must be balanced between building the structure and the subsequent lookup, since both will, in general, be performed only once. – Asgeir S. Nilsen Aug 31 '10 at 19:21
  • What are you optimizing for? Speed, memory usage, maintainability? – Alex Feinman Aug 31 '10 at 19:52
  • Speed and maintainability I'd say. Command line argument definitions are in general parsed only once for the duration of the application, and the actual arguments are very seldom parsed more than once. – Asgeir S. Nilsen Sep 02 '10 at 07:23

3 Answers3

5

The "tree" solution is a special case (well, actually it's pretty general) of a Patricia trie.

The first will generally be faster to lookup. The memory considerations probably aren't relevant to your context, since it's not permanently used and you're only performing the "lookup" once.

Mark Peters
  • 80,126
  • 17
  • 159
  • 190
0

I'd do the tree, looks fine.

You could build up a hash of every possible distinct substring.

Hashmap<String, String> validSubs = new Hashmap<String, String>();
HashSet<String> usedSubs = new HashSet<String>();

for (String option : options) {
  for(int i = 0; i <= option.length; i++) {
    String sub = option.substring(0, i);
    if(usedSubs.contains(sub)) {
      validSubs.remove(sub);
    } else {
      validSubs.add(sub, option);
      usedSubs.add(sub);
    }
  }
}
ykaganovich
  • 14,736
  • 8
  • 59
  • 96
0

Oh, yeah, the most obvious missing answer is to use a library that already does this. How to parse command line arguments in Java?

Community
  • 1
  • 1
ykaganovich
  • 14,736
  • 8
  • 59
  • 96