Most frequent Alphabet Substring

Question

My Logic is working for smaller input How can I make it better to accepts for larger inputs.

Q)
The program must accept a string S containing only lower case alphabets and Q queries as the input. Each query contains two integers representing the starting and the ending indices of a substring in S. For each query, the program must print the most frequently occurring alphabet in the specified substring. If two or more alphabets have the same frequency, then the program must print the alphabet that is least in the alphabetical order.

Boundary Condition(s):
2 <= Length of S <= 1000
1 <= Q <= 10^5

Example :
Input: badbadbed
4
0 8
1 4
0 5
2 7
Output:
b
a
a
b

#include <bits/stdc++.h>

using namespace std;

int main(int argc, char** argv)
{
  string s;
  cin >> s;
  int k;
  cin >> k;
  int x, y;
  while (k--)
  {
    cin >> x >> y;
    map<char, int>counter;
    string ss = s.substr(x, y - x + 1);
    set <char>sample;
    for (auto i : ss)
      sample.insert(i);
    int maxx = -1, st; char anss;
    for (auto i : sample)
      if ((st = count(ss.begin(), ss.end(), i)) > maxx)
      {
        maxx = st;
        anss = i;
      }
    cout << anss << endl;
  }
}

[Why should I not `#include `?](https://stackoverflow.com/questions/31816095/why-should-i-not-include-bits-stdc-h) [Why is `using namespace std;` considered bad practice?](https://stackoverflow.com/questions/1452721/why-is-using-namespace-std-considered-bad-practice) — Evg, Nov 20 '21 at 05:14
if you can waste memory, you can use a 26*len(S) matrix. then pass through string once and update each line of matrix. then you can answer queries by O(1) by subtracting matrix values from start to end. Have not tried it yet, but I think it will work. I try to write code myself. — Afshin, Nov 20 '21 at 06:09

Afshin · Accepted Answer · 2021-11-20T06:40:52.310

Following code has a lot of memory waste, but each query is looked up in O(1):

int main() {
    std::string s;
    int q = 0, start = 0, end = 0;
    std::cin >> s;
    std::cin >> q;

    auto mem = std::make_unique<int[]>(26 * s.length());
    for (int i = 0; i < s.length(); i++) {
        mem[i * 26 + s[i] - 'a']++;
        if (i < (s.length()-1)) {
            for (int l = 0; l < 26; l++) {
                mem[(i+1) * 26 + l] = mem[i * 26 + l];
            }
        }
    }

    for (int i = 0; i < q; i++) {
        std::cin >> start >> end;
        int max_val = 0;
        int max_idx = -1;
        for (int l = 0; l < 26; l++) {
            auto v = (mem[end * 26 + l] - mem[start * 26 + l]);
            if (v > max_val) {
                max_val = v;
                max_idx = l;
            }
        }

        std::cout << (char)('a' + max_idx) << std::endl;
    }

    return 0;
}

No comments. Bad variable names. No explanations. Just dumped the code . . . — A M, Nov 29 '21 at 10:39

Jasmine Tang · Answer 2 · 2021-11-21T08:34:20.740

Have you ever heard of segment trees?

Usually, when encountered with problems regarding some kind of value on an interval, we usually deploy segment trees to deal with these types of problems. The tree gives n*log(n) complexity, very niceeee, I love them so muchy much :)

Please stop reading if you haven't heard of recursion/segment tree/map/ yet.

A segment tree for a set I of n intervals uses O(n log n) storage and can be built in O(n log n) time

First of all, instead of considering the most frequently occurring alphabet, let us consider the occurrence of all characters of the full string. If we have the occurrence of all characters, we can easily pick out the one that has the most occurrences.

Great, let's look at the operation you'll need:

Build a segment tree from the string
Query the interval from the segment tree

If you look at the usual segment tree implementations on the internet, they're usually about the maximum or minimum value over an interval, so they'll build their segment using std::max(...).

In our case, we care about the occurrence of all characters, so we'll build our segment using the whole alphabet (only 26 lol)

void build(int id, int l, int r, std::string str, std::vector<std::map<char, int>>& seg_tree) 
{
    if(l == r) {
        seg_tree[id][str[l]]++;
        return;
    }
    int mid = (l+r)/2;

    build(id*2, l, mid, str, seg_tree);
    build(id*2+1, mid+1, r, str, seg_tree);

    for(int i=97;i<=122;i++) {
        seg_tree[id][(char)i] = seg_tree[id*2][(char)i] + seg_tree[id*2+1][(char)i];
    }
    return;
}

The query follows the same format, we'll recursively descend into the tree until we have an interval of 1, then we'll go back up to to combine smaller queries into bigger queries.

std::map<char, int> query(int id, int l, int r, int q, int p, std::vector<std::map<char, int>>& seg_tree) {
    std::map<char,int> mp;
    if(p < l || q > r) {
        return mp;
    }

    if(l <= q && p <= r) {
        return seg_tree[id];
    }

    int mid = (l+r)/2;

    std::map<char,int> mp1 = query(id*2, l, mid, q, p, seg_tree);
    std::map<char,int> mp2 = query(id*2+1,mid+1, r, q, p, seg_tree);

    for(int i=97;i<=122;i++) {
        mp[(char)i] = mp1[(char)i] + mp2[(char)i];
    }

    return mp;
}

And then, here's the general idea on how to use the segment tree

int main() {
    std::string str;
    std::cin >> str;
    
    
    std::vector<std::map<char, int>> seg_tree(4*str.size()+1);
    
    build(1,0,str.size()-1,str,seg_tree);

    std::map<char, int> mp = query(1, 0, str.size()-1, 0,str.size()-1, seg_tree);

    std::cout << mp.size() << std::endl;


    for(int i=97;i<=122;i++) {
        std::cout << (char)i << " " << mp[(char)i] << std::endl;
    }
    return 0;
}

The segment tree is, albeit, an overkill for smaller-scale problems. The powerful thing about it is you can generalize it to solve many problems of the same format.

However, knowing the segment tree data structure helps you a lot in competitive programming or even in coding interviews.

Please give me more feedback on how to answer your question or explain my answer better :)

zdf · Answer 3 · 2021-11-20T13:57:40.887

Using a map is overkill. You can use a lookup table. It is fast and small. Demo.

#include <iostream>
#include <string>
#include <array>

using namespace std;

char find(const string& s, int left, int right)
{
  // declare an array for all 26 letters and initialize it with 0;
  // indexes are alphabet letters
  // values are letters appearances
  array<int, 26> t{};

  // fill in the array with letter count between left & right indexes
  for (int i = left; i <= right; ++i)
    t[s[i] - 'a']++; // 'a' must be index 0, so we subtract it form current letter

  // look for the maximum value; the index plus 'a' is the letter - see the table comment
  int m = -1, idx = -1;
  for (size_t i = 0; i < t.size(); ++i)
  {
    if (t[i] > m)
    {
      m = t[i]; // we have a new maximum
      idx = i; // remember where the new maximum is
    }
  }

  // the index starts from zero, so we add 'a'
  return idx + 'a';
}


int main()
{
  //
  string s;
  if (!(cin >> s))
    return -1; // bad input

  //
  int query_count;
  if (!(cin >> query_count))
    return -2;

  //
  for (int i = 0; i < query_count; ++i)
  {
    int left, right;
    if (!(cin >> left >> right))
      return -3;
    cout << find(s, left, right) << endl;
  }

  return 0;
}

what is the time complexity of my program and your program? And why I should not use map? — Zeros, Nov 21 '21 at 11:46
@Zeros I am first concerned by the correctness & clearness of the solution, not by speed. (1) The complexity of `find` is O(n). The complexity of the lookup is O(1) in all cases. The definitive answer regarding the speed is given by a speed test. (2) I prefer to use the simplest possible tools for any problem. A lookup table fits all your needs – you don’t need a map/set library; not even ``; a plain array will do. — zdf, Nov 21 '21 at 12:24

score 1 · Answer 4 · answered Nov 28 '21 at 21:39

I do not think that a std::map is a wrong approach. Better would be a std::unordered_map, becaùse it uses fast hash accesses.

The solution can be implemented straight forward. But, to be honest, I doubt that this will be the fastest solution. The usescontainers in the below solution are fast, but not that fast. I need to think more about that.

Anyway. Let's sketch the design:

Read the string from std::cin
Read number of test cases
Run all test cases in a simple while loop
For each test case, get the start and end position of the substring to evaluate
Count each character in the substring
Use a maxheap for sorting and output the result

For all this, we need 8 statements in main.

Please check:

#include <iostream>
#include <utility>
#include <unordered_map>
#include <queue>
#include <vector>
#include <type_traits>
#include <string>

// Some Alias names for later easier reading --------------------------------------------------------------------------
using Pair = std::pair<char, size_t>;
using Counter = std::unordered_map<Pair::first_type, Pair::second_type>;
using UnderlyingContainer = std::vector<Pair>;
struct LessForSecondOfPair { bool operator ()(const Pair& p1, const Pair& p2) { 
        return (p1.second == p2.second)?(p1.first > p2.first):(p1.second < p2.second); }};
using MaxHeap = std::priority_queue<Pair, UnderlyingContainer, LessForSecondOfPair>;

// Main test code ------------------------------------------------------------------------------------
int main() {

    // Get string to be evaluated 
    if (std::string toBeEvaluated; std::cin >> toBeEvaluated)

        // And the number of test cases
        if (int numberOfTestCases{}; (std::cin >> numberOfTestCases) and numberOfTestCases > 0)

            // Now, for all testcases
            while (numberOfTestCases--)

                // Get start end end position of sub string
                if (size_t start, end; std::cin >> start >> end) {

                    // Count occurences of characters
                    Counter counter{};
                    for (char c : toBeEvaluated.substr(start, end-start-1)) counter[c]++;

                    // Output result
                    std::cout << MaxHeap(counter.begin(), counter.end()).top().first << '\n';
                }
}

Most frequent Alphabet Substring

4 Answers4