Given a list of strings L (sorted), and a positive integer N (N <= len(L)), how to efficiently partition L into no more than N groups by common prefix of length N?
Example: define data structure and function as below:
type PrefixGroup struct {
Prefix string
Count int
}
func partition(L []string, N int, prefix string) []PrefixGroup
The list L may contains thousands of strings, when called with
partition(L, 8, "")
the output may be:
[
{"Prefix":"13", "Count":1000},
{"Prefix":"180": "Count": 10},
{"Prefix":"X": "Count": 2},
... ...
]
which means in L, there are 1000 strings start with "13", 10 start with "180" and 2 start with "X". Note that the length of prefix is not fixed. The key requirement of this algorithm is to partition strings with common prefix so that the number of groups is as close as but not exceeding N.
With the result above, I can then call partition(L, 8, "13")
to further drill down subset of L that start with "13":
[
{"Prefix":"131", "Count": 50},
{"Prefix":"135": "Count": 100},
{"Prefix":"136": "Count": 500},
... ...
]
This is not a homework question. I need to write such an algorithm for a project at hand. I can write it "brute-force"-ly, just wonder if there are any classic/well-known data structure and/or algorithm to achieve proven time/space efficiency.
I have considered trie
, but wonder if it may consume too much memory...