
My task is to write a C++ program that processes a text file sequentially. The data must be read from the file one line at a time, and the entire contents of the file must not be loaded into RAM. The text file contains syntactically correct C++ code, and I have to count how many assignment operators it contains.

The only thing I could think of was writing a function that searches for a pattern and counts how many times it appears. I pass in every assignment operator as a pattern and then sum all the counts. But this does not work: when I search for the pattern "=", the "=" inside operators such as "%=" or "+=" gets counted as well. Even operators like "!=" or "==" get counted, but they shouldn't because they are comparison operators.

My code gives the answer 7 but the real answer should be 5.

#include <iostream>
#include <fstream>

using namespace std;

int patternCounting(string pattern, string text){
    int x = pattern.size();
    int y = text.size();
    int rez = 0;

    for(int i=0; i<=y-x; i++){
        int j;
        for(j=0; j<x; j++)
            if(text[i+j] !=pattern[j]) break;

        if(j==x) rez++;
    }
    return rez;
}

int main()
{
    fstream file ("test.txt", ios::in);
    string rinda;
    int skaits=0;
    if(!file){cout<<"No file!"<<endl; return 47;}

    while(file.good()){
        getline(file, rinda);
        skaits+=patternCounting("=",rinda);
        skaits+=patternCounting("+=",rinda);
        skaits+=patternCounting("*=",rinda);
        skaits+=patternCounting("-=",rinda);
        skaits+=patternCounting("/=",rinda);
        skaits+=patternCounting("%=",rinda);
    }

    cout<<skaits<<endl;

    return 0;
}

Contents of the text file:

#include <iostream>

using namespace std;

int main()
{
    int z=3;
    int x=4;

    for(int i=3; i<3; i++){
        int f+=x;
        float g%=3;

    }

}

Note that, as a torture test, the following code contains 0 assignments under older C++ standards and 1 under C++17 and later, because trigraphs were removed.

// = Torture test
int a = 0; int b = 1;

int main()
{
    // The next line is part of this comment until C++17 ??/
    a = b;
    struct S
    {
        virtual void foo() = 0;
        void foo(int, int x = 1);
        S& operator=(const S&) = delete;
        int m = '==';
        char c = '=';
    };
    const char* s = [=]{return "=";}();
    sizeof(a = b);
    decltype(a = b) c(a);
}
Bathsheba
JPicmanis
  • 4
    `int z=3;` is actually not really an assignment, but an initialization. Do you want to count those as well? If yes, do you also want to count the other forms of initialization without the `=` character? – Jakob Stark Jun 08 '22 at 06:55
  • I would argue that all of them are assignments. Anyway, this seems easily fixable with a good regex; something like `[^%]=` will stop matching `%=`. – Quimby Jun 08 '22 at 06:55
  • 3
    Also the C++ code is **not** syntactically correct. `int f+=x;` is not valid C++ code. – Jakob Stark Jun 08 '22 at 07:12
  • 3
    This is hard. So many edge cases. Off the bat, you've forgotten the trigraph `??=` (although removed in later standards). There's also `==` and multicharacter literals containing `=` as well as the plain old `char`, and `=` appearing in strings. Then there are comments to worry about too. Does the pure virtual declaration `virtual pure() = 0;` count as an assignment too!? – Bathsheba Jun 08 '22 at 07:12
  • 1
    You could use clang to print the C++ syntax tree, like described in the answers to [this](https://stackoverflow.com/q/18560019/17862371) question. – Jakob Stark Jun 08 '22 at 07:21
  • 1
    I've been a bit naughty and added another test case to the question, if anything to emphasise how difficult this problem is – Bathsheba Jun 08 '22 at 07:23
  • OK, I failed to specify that I need to focus on counting these 6 (=, +=, *=, -=, /=, %=) assignment operators. Including initializations but excluding cases where the operators might show up in comments or as a part of a string. – JPicmanis Jun 08 '22 at 07:36
  • @JPicmanis: Even then, there are default arguments in functions, lambda capture, operator overload declarations, default types in template declarations. – Bathsheba Jun 08 '22 at 07:41
  • Your reading loop is wrong. See [here](https://stackoverflow.com/questions/5605125/why-is-iostreameof-inside-a-loop-condition-i-e-while-stream-eof-cons) for details. – molbdnilo Jun 08 '22 at 07:46
  • You're counting the `'='` in compound operators twice. – molbdnilo Jun 08 '22 at 07:47
  • There are some things not considered in your code: commented-out assignments `// foo = 2;`, "assignments" in string literals `f("foo=3;")`, type aliases with `using` (`using T = std::string;`), namespace aliases (`namespace s = std;`), default function parameters, pure virtual functions, defaulted/deleted functions, etc. Furthermore, assignments may not actually get evaluated; it's unclear if `using T = decltype(foo = 1);` should count as an assignment. Finally, you're ignoring the preprocessor, which could deactivate sections of code or let you write an assignment as `foo ASSIGN 2;` – fabian Jun 08 '22 at 07:54

3 Answers

2

There are multiple issues with the code.

The first, rather mundane issue, is your handling of file reading. A loop such as while (file.good()) … is virtually always an error: you need to test the return value of getline instead!

std::string line;
while (getline(file, line)) {
    // Process `line` here.
}

Next, your patternCounting function fundamentally won’t work since it doesn’t account for comments and strings (nor any of C++’s other peculiarities, but these seem to be out of scope for your assignment). It also doesn’t really make sense to count different assignment operators separately.

The third issue is that your test case misses lots of edge cases (and is invalid C++). Here’s a better test case that (I think) exercises all interesting edge cases from your assignment:

int main()
{
    int z=3; // 1
    int x=4; // 2

    // comment with = in it
    "string with = in it";
    float f = 3; // 3
    f = f /= 4; // 4, 5

    for (int i=3; i != 3; i++) { // 6
        int f=x += z; // 7, 8
        bool g=3 == 4; // 9
    }
}

I’ve annotated each line with a comment indicating up to how many occurrences we should have counted by now.

Now that we have a test case, we can start implementing the actual counting logic. Note that, for readability, function names generally follow the pattern “verb subject”. So instead of patternCounting a better function name would be countPattern. But we won’t count arbitrary patterns, we will count assignments. So I’ll use countAssignments (or, using my preferred C++ naming convention: count_assignments).

Now, what does this function need to do?

  1. It needs to count assignments (incl. initialisations), duh.
  2. It needs to discount occurrences of = that are not assignments:
    1. inside strings
    2. inside comments
    3. inside comparison operators

Without a dedicated C++ parser, that’s a rather tall order! You will need to implement a rudimentary lexical analyser (short: lexer) for C++.

First off, you will need to represent each of the situations we care about with its own state:

enum class state {
    start,
    comment,
    string,
    comparison
};

With this in hand, we can start writing the outline of the count_assignments function:

int count_assignments(std::string const& str) {
    auto count = 0;
    auto state = state::start;
    auto prev_char = '\0';

    for (auto c : str) {
        switch (state) {
            case state::start:
                break;
            case state::comment:
                break;
            case state::string:
                break;
            case state::comparison:
                break;
        }
        prev_char = c;
    }

    // Useful for debugging:
    // std::cerr << count << "\t" << str << "\n";

    return count;
}

As you can see, we iterate over the characters of the string (for (auto c : str)). Next, we handle each state we could currently be in.

The prev_char is necessary because some of our lexical tokens are more than one character in length (e.g. comments start by //, but /= is an assignment that we want to count!). This is a bit of a hack — for a real lexer I would split such cases into distinct states.

So much for the function skeleton. Now we need to implement the actual logic — i.e. we need to decide what to do depending on the current (and previous) character and the current state.

To get you started, here’s the case state::start:

switch (c) {
    case '=':
        ++count;
        state = state::comparison;
        break;
    case '<': case '>': case '!':
        state = state::comparison;
        break;
    case '"' :
        state = state::string;
        break;
    case '/' :
        if (prev_char == '/') {
            state = state::comment;
        }
        break;
}

Be very careful: the above will over-count the comparison ==, so we will need to adjust that count once we’re inside case state::comparison and see that the current and previous character are both =.

I’ll let you take a stab at the rest of the implementation.

Note that, unlike your initial attempt, this implementation doesn’t distinguish the separate assignment operators (=, +=, etc.) because there’s no need to do so: they’re all counted automatically.
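
For reference, here is one possible way to complete the skeleton, as a sketch rather than a definitive implementation. It keeps the four states from above; the `quote` and `escaped` variables are additions of this sketch (they are not part of the outline) so that character literals and escaped quotes are skipped too. It still works on one line at a time, ignores multi-line comments and raw strings, does not count `<<=` or `>>=`, and would be confused by digit separators such as `1'000`. It assumes C++17 for `[[fallthrough]]`.

```cpp
#include <string>

enum class state { start, comment, string, comparison };

// Counts '='-based assignments and initialisations in a single line of code.
int count_assignments(std::string const& str) {
    int count = 0;
    state st = state::start;
    char prev_char = '\0';
    char quote = '\0';     // assumption: which of " or ' opened the current literal
    bool escaped = false;  // assumption: true right after a backslash inside a literal

    for (char c : str) {
        switch (st) {
            case state::comparison:
                st = state::start;
                if (c == '=') {
                    // "==": undo the count made for the first '='.
                    // "<=", ">=", "!=": nothing was counted, nothing to undo.
                    if (prev_char == '=') --count;
                    break;
                }
                [[fallthrough]];  // any other character is handled as in `start`
            case state::start:
                if (c == '=') { ++count; st = state::comparison; }
                else if (c == '<' || c == '>' || c == '!') st = state::comparison;
                else if (c == '"' || c == '\'') { quote = c; st = state::string; }
                else if (c == '/' && prev_char == '/') st = state::comment;
                break;
            case state::string:
                if (escaped) escaped = false;
                else if (c == '\\') escaped = true;
                else if (c == quote) st = state::start;
                break;
            case state::comment:
                break;  // a line comment lasts until the end of the line
        }
        prev_char = c;
    }
    return count;
}
```

Calling this function on each line read by the `getline` loop and summing the results should give 9 for the annotated test case above.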

Konrad Rudolph
  • One important caveat of the above approach is that it ignores multi-line comments, since your problem description requires you to handle each line separately. We *could* add logic for multi-line comments, but it would require us to add another parameter and another return value to `count_assignments`. At this point, I would probably choose a fundamentally different approach, rather than line-based counting. – Konrad Rudolph Jun 08 '22 at 08:31
  • Also, as noted in the comments under the question, this doesn’t actually use C++ code as its input, not even close to! It is fundamentally impossible to count assignment operations in C++ code without a full-blown parser, as the torture test posted by Bathsheba shows. In a real-world scenario, you’d use libclang to parse the C++ code. – Konrad Rudolph Jun 08 '22 at 08:40
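
To illustrate the "another parameter" idea from the first comment above, here is a minimal sketch. The helper name `strip_block_comments` and its `in_block_comment` flag are hypothetical; they are not part of the answer. The function blanks out `/* ... */` content in one line and reports through the flag whether a block comment is still open, so the caller can pass the same flag in again for the next line. Raw string literals and digit separators are not handled.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: returns `line` with /* ... */ content removed.
// `in_block_comment` carries the comment state from one line to the next.
std::string strip_block_comments(std::string const& line, bool& in_block_comment) {
    std::string out;
    char quote = '\0';  // '\0' outside a literal, otherwise the opening quote
    for (std::size_t i = 0; i < line.size(); ++i) {
        char const c = line[i];
        char const next = i + 1 < line.size() ? line[i + 1] : '\0';
        if (in_block_comment) {
            if (c == '*' && next == '/') {
                in_block_comment = false;
                ++i;
                out += ' ';  // keep the tokens around the comment separated
            }
        } else if (quote != '\0') {
            out += c;
            if (c == '\\' && next != '\0') { out += next; ++i; }  // keep escaped char
            else if (c == quote) quote = '\0';
        } else if (c == '/' && next == '/') {
            out.append(line, i, std::string::npos);  // line comment: copy the rest as-is
            break;
        } else if (c == '/' && next == '*') {
            in_block_comment = true;
            ++i;
        } else {
            if (c == '"' || c == '\'') quote = c;
            out += c;
        }
    }
    return out;
}
```

The reading loop would keep a single `bool` outside the loop, pass each line through this helper, and hand the result to the per-line counter. Only one flag is kept between lines, so the file is still processed sequentially.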
0

The clang compiler has a feature to dump the abstract syntax tree (AST). If you have syntactically correct C++ code (which you don't have), you can count the number of assignment operators, for example, with the following command line (on a Unix-like OS):

clang++ -Xclang -ast-dump -c my_cpp_file.cpp | egrep "BinaryOperator.*'='" | wc -l

Note, however, that this will only match real assignments, not copy initializations, which can also use the = character but are syntactically different (for example, an overloaded = operator is not called in that case).

If you want to count the compound assignments and/or the copy initializations as well, you can try to look for the corresponding lines in the output AST and add them to the egrep search pattern.

Jakob Stark
0

In practice, your task is incredibly difficult.

Think, for example, of C++ raw string literals (you could have one spanning dozens of source lines, with arbitrary = inside them). Or of asm statements doing some addition...
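
To make the raw string case concrete, here is a small illustrative example (the function name `raw_text` is made up for this sketch). Every `=` below sits inside a single string literal that spans several source lines, so a counter that looks at one line at a time without remembering that it is inside a literal would report assignments that do not exist.

```cpp
#include <string>

// Returns a raw string literal that spans several source lines.
// None of the '=' characters inside it is an assignment.
std::string raw_text() {
    return R"(x = 1;
y += 2;
// z == 3
)";
}
```

The literal contains four `=` characters and zero assignments. With a custom delimiter, as in `R"xy(...)xy"`, the text may even contain `)"`, which makes detecting the end of the literal harder still.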

Think also of increment operators such as x++ (for some declared int x;), which is equivalent to x = x+1; for a simple variable and is semantically an assignment, but not syntactically an assignment operator.

My suggestion: choose one open source C++ compiler. I happen to know GCC internals.

With GCC, you can write your own GCC plugin which would count the number of Gimple assignments.

Think also of Quine programs coded in C++...

NB: budget months of work.

Basile Starynkevitch