5

We have a string of length N and the number X.

How to find the most frequent substring of length X in the string of length N in average O(N) time?

I think, here is a similar question: https://stackoverflow.com/questions/1597025?tab=votes#tab-top

I would like to ask you how to prove that the number of used hashing functions is only a constant.

Community
  • 1
  • 1
Thomas
  • 53
  • 1
  • 3
  • 1
    The linked question is marked `homework` - I can only assume this is too. – Oded Dec 19 '10 at 18:47
  • Does this answer your question? [Most common substring of length X](https://stackoverflow.com/questions/1597025/most-common-substring-of-length-x) – qwr Feb 18 '20 at 23:02

2 Answers2

3

A suffix tree should give this in O(n) time worst case, with O(n) space usage.

In particular check the Functionality section of the above wiki page under the Properties of Strings sub section, which mentions

Find the most frequently occurring substrings of a minimum length in Θ(n) time.

0

I suggest this kind of hash function. Lets condider that each string is number in 256 base notation (instead of our 10 based). So for each X length substring we can get its value in 10 base notation this way:

#include <iostream>
#include <string>
#include <map>
#include <algorithm>


int main()
{
    std::string s;
    int x;
    std::cin >> s >> x;

    unsigned const int base = 256;
    unsigned long long xPowOfBase = 1;
    int i = 0;
    for(i = 1; i <= x; ++i)
        xPowOfBase *= base;

    unsigned long long firstXLengthSubString = 0;
    for(i = 0; i < x; ++i)
    {
        firstXLengthSubString *= base;
        firstXLengthSubString += s[i];
    }

    unsigned long long nextXLengthSubstring = firstXLengthSubString;

    std::map<unsigned long long, std::pair<int, int> > hashTable;
    for(;i <= s.size(); ++i)
    {
        if(hashTable.find(nextXLengthSubstring) != hashTable.end())
            ++hashTable[nextXLengthSubstring].first;
        else
            hashTable.insert(std::make_pair(nextXLengthSubstring, std::make_pair(1, i - x)));

        if(i != s.size())
        {
            nextXLengthSubstring *= base;
            nextXLengthSubstring += s[i];
            nextXLengthSubstring -= s[i - x] * xPowOfBase;
        }
    }

    std::map<unsigned long long, std::pair<int, int> >::iterator it = hashTable.begin();
    std::map<unsigned long long, std::pair<int, int> >::iterator end_it = hashTable.end();
    std::pair<int, int> maxCountAndFirstPosition = std::make_pair(0, -1);

    for(;it != end_it; ++it)
    {
        if(maxCountAndFirstPosition.first < it->second.first)
            maxCountAndFirstPosition = it->second;
    }

    std::cout << maxCountAndFirstPosition.first << std::endl;
    std::cout << s.substr(maxCountAndFirstPosition.second, x) << std::endl;
    return 0;
}

This will work on O(n * log(n)) , to make it O(n) just change std::map wiht any hash table.

Mihran Hovsepyan
  • 10,810
  • 14
  • 61
  • 111