
Here is my requirement:

  • Input: a sufficiently long random string, e.g. fdjhkajajkfdj
  • Output: fdj has 2 occurrences, separated by x chars

I want to put all three-letter words in an array and check whether any of them are the same, e.g.:

a[0] = fdj
a[1] = djh
a[2] = jhk
a[3] = hka
a[4] = kaj
.
.
.
a[n] =fdj

My answer is that a[0] and a[n] match; there may be more than 2 occurrences.

Question: What kind of array is optimal in this situation? I am using Java (and also Python). I was thinking of a dict.

tshepang
rda3mon
  • I think your approach might be less efficient than simply traversing the string and doing a find. Is there a reason you need to store the three letter words in this way? – JoshD Sep 28 '10 at 02:42
  • Would aaaa return a match at [0] and [1]? The solution changes if there are no overlaps. What's the exact output for 'aaa' found at [3], [20], and [33]? – Tony Ennis Sep 28 '10 at 03:01
  • @JoshD: I don't have to store any letters; only the duplicates need to be found. @Tony: I want the distance between 2 sequences, so the output "aaa found at 3, 20 and 33" is fine. – rda3mon Sep 28 '10 at 03:55
  • Possible duplicate of [Java Equivalent to Python Dictionaries](http://stackoverflow.com/questions/1540673/java-equivalent-to-python-dictionaries) – Raedwald Mar 17 '16 at 19:48

5 Answers

In Java you could use the Map interface ( http://download.oracle.com/javase/1.4.2/docs/api/java/util/Map.html )

I would use a HashMap in which the key is the 3-letter word and the value is the count of occurrences. Here's some sample pseudo code:

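import java.util.HashMap;

String input = "fdjhkajajkfdj"; // example input from the question
// Generic type arguments must be object types, so Integer rather than int
HashMap<String, Integer> wordCountMap = new HashMap<String, Integer>();
for (int i = 0; i + 3 <= input.length(); i++) // for each 3 letter word in the input
{
    String word = input.substring(i, i + 3); // current three letter word
    if (wordCountMap.containsKey(word))
        wordCountMap.put(word, wordCountMap.get(word) + 1); // ++ cannot be applied to a method result
    else
        wordCountMap.put(word, 1);
}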

Then you can loop through the key/value pairs and return their occurrence counts.
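
Since the question also mentions Python, the same counting idea can be sketched there with `collections.Counter`. This is only an illustration, not part of the answer above; the function name `count_trigrams` and its `size` parameter are made up for the example.

```python
from collections import Counter

def count_trigrams(text, size=3):
    """Count every overlapping substring of length `size` in `text`."""
    return Counter(text[i:i + size] for i in range(len(text) - size + 1))

counts = count_trigrams("fdjhkajajkfdj")
# Keep only the words that occur more than once
repeated = {word: n for word, n in counts.items() if n > 1}
print(repeated)  # {'fdj': 2}
```

A string shorter than `size` simply yields an empty counter, because the range is then empty.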

To return the number of characters between the words, you can do this separately after counting the occurrences by using String manipulation (see String.indexOf). Pseudo code for this is:

String originalInput = "fdjhkajajkfdj";
String word = "fdj";
int firstOccurance = originalInput.indexOf(word); // index of the first match
int secondOccurance = originalInput.indexOf(word, firstOccurance + 1); // search again, starting just past the first match
int charsInBetween = secondOccurance - firstOccurance - word.length(); // difference in indices minus length of word
Jacob

In Python a dict is fine.

In Java, you could use a HashSet if you only need to detect the first match, but if you want to count the number of matches, you could use a Map.

Edit: you changed the parameters of the question, so here's what I suggest now. Use a Map<String, List<Integer>>: the key is the 3-letter word, and the value is a list of the index values at which the string occurs. You can use an equivalent in Python.
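
A minimal Python sketch of that word-to-indices mapping, assuming overlapping three-letter windows as in the question. The names `index_substrings` and `gaps` are hypothetical helpers chosen for this example.

```python
from collections import defaultdict

def index_substrings(text, size=3):
    """Map each substring of length `size` to the list of indices where it starts."""
    positions = defaultdict(list)
    for i in range(len(text) - size + 1):
        positions[text[i:i + size]].append(i)
    return positions

def gaps(indices, size=3):
    """Characters between consecutive occurrences (negative if they overlap)."""
    return [b - a - size for a, b in zip(indices, indices[1:])]

positions = index_substrings("fdjhkajajkfdj")
for word, indices in positions.items():
    if len(indices) > 1:
        print(word, indices, gaps(indices))  # fdj [0, 10] [7]
```

Storing the indices rather than a count means the distances between any pair of matches can be computed afterwards without scanning the string again.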

Anon

You could sort them and look for duplicates, or put them into a linked hash set and check for a duplicate before you insert something.
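
Both ideas can be sketched in Python roughly as follows. A plain `set` stands in for Java's linked hash set here, since insertion order is not needed for the duplicate check; `first_repeat` and `repeats_by_sorting` are illustrative names only.

```python
def first_repeat(text, size=3):
    """Return (word, index) for the first substring seen a second time, or None."""
    seen = set()
    for i in range(len(text) - size + 1):
        word = text[i:i + size]
        if word in seen:  # check for a duplicate before inserting
            return word, i
        seen.add(word)
    return None

def repeats_by_sorting(text, size=3):
    """Sort all substrings so that equal ones become adjacent, then compare neighbours."""
    words = sorted(text[i:i + size] for i in range(len(text) - size + 1))
    return sorted({a for a, b in zip(words, words[1:]) if a == b})

print(first_repeat("fdjhkajajkfdj"))        # ('fdj', 10)
print(repeats_by_sorting("fdjhkajajkfdj"))  # ['fdj']
```

Note that neither variant records where the earlier occurrence was, so the distance between matches would need a separate lookup.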

Ray Tayek

Well, will fdj be matched because it is the first 3 characters of the string, or does it come from somewhere else? If you have more than 2 occurrences of your needle, do you need the distance between the first 2 matches, between the first and the last, or all the distances for each pair of matches?

Well, I can give you a function that gives you all the matches.

>>> def find_matches(needle, hackstay):
...   '''returns a list of positions of needle in hackstay'''
...   ptr = 0
...   found = []
...   while True:
...     idx = hackstay[ptr:].find(needle)
...     if idx < 0: return found
...     found.append(ptr+idx)
...     ptr += idx+len(needle)
... 
>>> 
>>> 
>>> find_matches('fdj','fdjhkajajkfdj')
[0, 10]

Distance between 2 elements of the array is just the bigger element minus the smaller element minus the length of needle.

Example:

>>> res = find_matches('fdj','fdjhkajajkfdj')
>>> distance = abs(res[0]-res[1])-len('fdj')
>>> print(distance)
7

With this you can decide for yourself where needle comes from and which distances you need. Hope it helps!

PS: If anybody can suggest how to improve that code, please do! My feeling is that this can be written shorter (like using found = [i for ??? if ???]), but I don't know how.
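
One possible shorter form, offered as a sketch rather than a definitive answer, uses `re.finditer` from the standard library. The name `find_matches_re` is invented here to avoid clashing with the function above; the overlapping variant is an extra assumption, not something the original function does.

```python
import re

def find_matches_re(needle, hackstay):
    """Start positions of non-overlapping occurrences, like find_matches above."""
    return [m.start() for m in re.finditer(re.escape(needle), hackstay)]

def find_overlapping(needle, hackstay):
    """Start positions including overlapping occurrences, via a zero-width lookahead."""
    return [m.start() for m in re.finditer('(?=%s)' % re.escape(needle), hackstay)]

print(find_matches_re('fdj', 'fdjhkajajkfdj'))  # [0, 10]
print(find_matches_re('aa', 'aaaa'))            # [0, 2]
print(find_overlapping('aa', 'aaaa'))           # [0, 1, 2]
```

`re.escape` keeps characters such as `.` or `*` in the needle from being treated as regular expression syntax.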

erikbstack
  • In that case, I need to call the function find_matches() for every a[i], i = 0 to n. What if I have 1000 characters? I feel that is not very efficient. – rda3mon Sep 28 '10 at 04:23
  • No, you don't need `a` at all. In the solution I posted, string.find() does the job of finding the next match for you. And believe me, it works as efficiently as, or better than, any solution you or I could come up with. – erikbstack Sep 28 '10 at 04:49
  • Sure, I understand now; let me try it out. I also implemented it using a Map in Java. Thanks. – rda3mon Sep 28 '10 at 06:48

Your way of storing three-letter words in an array is not efficient. Please consider storing the string in a suffix tree, or simply in an array, and using the KMP algorithm to find the occurrences of the string you have to search for. The counts can then be stored however you choose.
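
For reference, a compact Python sketch of a Knuth-Morris-Pratt search that reports every occurrence of a known pattern. It assumes the pattern is already chosen (for example the first three characters); it does not by itself discover which substrings repeat. The names `build_failure` and `kmp_find_all` are illustrative.

```python
def build_failure(pattern):
    """failure[i] = length of the longest proper prefix of pattern[:i+1] that is also its suffix."""
    failure = [0] * len(pattern)
    k = 0
    for i in range(1, len(pattern)):
        while k > 0 and pattern[i] != pattern[k]:
            k = failure[k - 1]
        if pattern[i] == pattern[k]:
            k += 1
        failure[i] = k
    return failure

def kmp_find_all(pattern, text):
    """Start indices of all occurrences of pattern in text, overlapping ones included."""
    if not pattern:
        return []
    failure = build_failure(pattern)
    matches = []
    k = 0  # number of pattern characters matched so far
    for i, ch in enumerate(text):
        while k > 0 and ch != pattern[k]:
            k = failure[k - 1]  # fall back without re-reading text
        if ch == pattern[k]:
            k += 1
        if k == len(pattern):
            matches.append(i - k + 1)
            k = failure[k - 1]  # keep going to find further matches
    return matches

print(kmp_find_all('fdj', 'fdjhkajajkfdj'))  # [0, 10]
```

Each character of the text is examined once, so the search runs in time linear in the combined length of text and pattern.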

Geek