3

I have a string :

www.domain.com/I-Need-This-Part

I need to detect what the most frequently used delimiter is after the / which in this case is - . The delimiter may change depending on the url.

Once I working this out, I will use .split and pass in the most frequently used delimiter to count the parts.

Any help on this would be much appreciated.

Thank you in advance.

  • 5
    Do you have a specific set of delimiters? Or can any character, alphabet/numerics included, be a delimiter? Is a delimiter necessarily a single character? – nicholas.hauschild Jul 09 '12 at 20:19
  • Erm, what beyond "count all occurrences of all delimiters after /" do you expect us to say? – Jochen Jul 09 '12 at 20:21

4 Answers4

2

Define the delimiters, then count them and order them.
Defining them is up to you.
For counting here is a link: Java: How do I count the number of occurrences of a char in a String?
And you can do an on the fly ordering by using a TreeMap for instance with an appropriate comparator and content type, e.g. A Delimiter - nrOfOccurrences pair. (A PriorityQueue would also do the trick)

Community
  • 1
  • 1
zeller
  • 4,904
  • 2
  • 22
  • 40
1

First thing which comes to mind:

  • Iterate over the part of the string in which you want to count delimiters
  • Check if the current char is a delimiter (preferably in O(1))
  • Have a hashmap from char to int, insert the current delimiter if not inserted, else ++ that entry
  • Iterate over the hashmap to find the delimiter which was used the most

If you have only few delimiters, you should use the answer of zeller. If you have a lot of delimiters, this algorithm should be more efficient.

Misch
  • 10,350
  • 4
  • 35
  • 49
  • Why not just track the most frequent char so far? Buliding a whole hashmap and then iterating through it is double overkill. – Marko Topolnik Jul 09 '12 at 20:59
  • counterexample: `++--+`. You see the first `+`, so you save `(+, 1)`. Next `+`, you add one and get `(+, 2)`. Now you see a `-`. What do you do? Delete the `+`? So you save `(-, 1)`, then `(-, 2)`. Now comes a `+` again, which would make the `+` the most frequent char but you don't know that as you have already deleted that information – Misch Jul 09 '12 at 21:03
  • I see your point -- you'd iterate through the string only once. I have in mind a solution where you iterate through the string N times (N = number of delimiter candidates). Unless N get quite big, that's more efficient since there are no data structures involved and running over the same short string several times reuses the processor cache well. – Marko Topolnik Jul 09 '12 at 21:09
  • 1
    Since the set of characters is very limited (128, since URIs are ASCII), no hashmap is needed. You can just count in a constant length array of 128 ints. And you should ignore alphanumerical chars [A-Za-z0-9], since they will be used as delimiters very unlikely but appear in the words to be split. – leemes Aug 05 '12 at 13:33
0

It should be a very simple task to just count the chars that you consider as candidates for the delimiter. You can employ indexOf for that, for example.

final String url  = "www.domain.com/I-Need-This-Part";
final int neededStart = url.indexOf('/')+1;
char mostFrequent = '\u0000';
int highestFreq = 0;
for (char delim : new char[] {'%', '-', '$', '+'}) {
  int cnt = 0;
  for (int i = url.indexOf(delim, neededStart); i != -1;
        i = url.indexOf(delim, i+1), cnt++);
  if (cnt > highestFreq) { highestFreq = cnt; mostFrequent = delim; }
}
System.out.println(mostFrequent);
Marko Topolnik
  • 195,646
  • 29
  • 319
  • 436
0

Assuming you have a specific set of possible delimiters, you can use the Apache Commons library and use their

StringUtils.countMatches

method to calculate the number of occurances.

Another way (again if you have a list of delimiters already), you can just iterate through the String once by creating a Map (key = delimiter, value= count) and as you encounter delimiter, put it in the map, if it already exist, increment the count. From there, you can figure out which entry has the highest count.

TS-
  • 4,311
  • 8
  • 40
  • 52