13

I'm in the middle of a source code migration, and the converter program did not convert concatenations of string literals with integers. Now I have lots of code with expressions like this:

f("some text" + i);

Since C/C++ treats this as pointer arithmetic on the string literal (equivalent to taking the address of an array subscript), f will receive "some text", or "ome text", or "me text", depending on the value of i.

My source language treats the concatenation of a string with an int as string concatenation. Now I need to go line by line through the source code and change, by hand, the previous expression to:

f("some text" + std::to_string(i));

The conversion program managed to convert local "String" variables to "std::string", resulting in expressions:

std::string some_str = ...;
int i = ...;

f(some_str + i);

Those were easy to fix because with such expressions the C++ compiler outputs an error.

Is there any tool that can automatically find such expressions in source code?

vz0
  • The way you have posed the question, you need a tool that can check the *types* of expression fed to "f". Others have suggested regexps, which can at best recognize tokens that hint at the types. If the regexp solutions are not good enough, then I have a possible solution. – Ira Baxter Jul 14 '14 at 17:10
  • 1
    @IraBaxter That sounds good! Maybe parsing the AST output of clang to find the operator + node with some const char [] children would be nice. – vz0 Jul 14 '14 at 18:08
  • I wonder if something along the lines of overloading global operator+ (const char*, int) and inducing a compiler error or warning inside the body of the overload would produce the desired result ? – AlexK Jul 16 '14 at 04:04
  • Define the overload in a header file that is included everywhere, have it generate a compile time warning each time the overload is expanded and then just filter the output of the compiler ? – AlexK Jul 16 '14 at 04:14
  • @AlexK the compiler won't allow you to overload a global operator with two native types. I've just tried it. Anyway I've found a solution and posted an answer. Thanks! – vz0 Jul 16 '14 at 13:41

9 Answers

8

Easy! Just replace all the + with -&:

find . -name '*.cpp' -print0 | xargs -0 sed -i '' 's/+/-\&/g'


When you try to compile your project you will see, among other errors, something like this:

foo.cpp:9:16: error: 'const char *' and 'int *' are not pointers to compatible types
    return f(s -& i);
             ~ ^~~~

(I'm using clang, but other compilers should issue similar errors)


So you just have to filter the compiler output to keep only those errors:

clang++ foo.cpp 2>&1 | grep -F "error: 'const char *' and 'int *' are not pointers to compatible types"

And you get:

foo.cpp:9:16: error: 'const char *' and 'int *' are not pointers to compatible types
foo.cpp:18:10: error: 'const char *' and 'int *' are not pointers to compatible types
esneider
  • 1
    Good idea, but you will not be able to recover your code back, so better do it on a backup copy of the code base (if the size of the code base permits) – AlexK Jul 16 '14 at 04:17
  • True! But I think 75k loc is quite manageable. In any case, if you wanted to do it in place, you could replace instead by `-&*&`, and then replace back the `+` signs. – esneider Jul 16 '14 at 09:19
7

You can try flint, an open-source lint program for C++ developed and used at Facebook. It has a blacklisted-token-sequences feature (checkBlacklistedSequences). You can add your token sequence to the checkBlacklistedSequences function and flint will report every match.

In the checkBlacklistedSequences function, I added the sequence string_literal + number:

BlacklistEntry([tk!"string_literal", tk!"+", tk!"number"],
               "string_literal + number problem!\n",
                true),

Then compile and test:

$ cat -n test.cpp
 1  #include <iostream>
 2  #include <string>
 3  
 4  using namespace std;
 5  
 6  void f(string str)
 7  {
 8      cout << str << endl;
 9  }
10  
11  int main(int argc, char *argv[])
12  {
13      f("Hello World" + 2);
14  
15      f("Hello World" + std::to_string(2));
16  
17      f("Hello World" + 2);
18  
19      return 0;
20  }

$ ./flint test.cpp 
test.cpp(13): Warning: string_literal + number problem!
test.cpp(17): Warning: string_literal + number problem!

flint has two versions (an older one written in C++ and a newer one written in D); I made my changes in the D version.

Alper
  • What about a loop variable? Like for (x...) f("Hello world" + x); – vz0 Jul 14 '14 at 14:10
  • Add one more sequence with tk!"identifier" instead of tk!"number" and it works. However, type of the identifier can be any type. There is no specific check for int. – Alper Jul 14 '14 at 14:53
  • This looks very promising. However, I was going for the more general case of detecting every use of operator + with a const char* and an int, so what I want is type inference. I am not using a lint in my project, just cppcheck. – vz0 Jul 21 '14 at 08:57
3

I'm not familiar with many tools that can do that, but I think grep can be helpful to some extent.

In the root directory of your source code, try:

grep -rn '".\+"\s*+\s*' .

This finds all the files that contain a line like "xxxxx" +. I hope it helps you find all the lines you need.

If all the integers are constants, you can change the grep expression to require at least one digit after the +:

grep -rn '".\+"\s*+\s*[0-9]\+' .

And you can also include the ( before the string constant:

grep -rn '(".\+"\s*+\s*[0-9]\+' .

This may not be the "correct" answer, but I hope it helps.

nicky_zs
  • Running a grep on this project raises a lot of false positives. Thanks! – vz0 Jul 21 '14 at 08:58
  • @vz0, of course, the best way to do this is to perform static syntax analysis using something like lint. However, using grep is just a way to find the point of problems most quickly. – nicky_zs Jul 21 '14 at 09:14
  • Check my accepted answer. A lint can't do type inference. – vz0 Jul 21 '14 at 10:24
  • @vz0, I see your answer but I don't understand that if your answer works, why would grep raise a lot of false positives, since grep works just like your find-and-replace and strings quoted by double-quotes are very easy to grep out? – nicky_zs Jul 21 '14 at 16:03
  • Your solution does not work in the general case of using operator + with a const char* and an int, or vice versa, an int and a const char*. The case of «"text" + i» is just one example of those general expressions. By "false positive" I mean all the valid cases where a string is concatenated with another string variable, for example: "Hello " + n, where n is a std::string. – vz0 Jul 22 '14 at 08:53
  • @vz0, well, the last grep statement in my answer can avoid the false positive "Hello" + "World". Also, I believe that your answer is more or less equivalent to mine in theory, because replacing "Hello" with std::string("hello") is equivalent to finding all "Hello"s using grep. – nicky_zs Jul 22 '14 at 13:21
  • No, they are not, because with your solution I have to check manually, one by one, whether the expression `"str" + something` is a valid expression, while with my solution I am letting the compiler decide and complain if I'm adding an integer to a string. You don't have to trust me, go and write a simple program with both approaches. – vz0 Jul 22 '14 at 14:46
  • @vz0, of course your solution makes the compiler to check whether the expression is valid and of course the compiler won't be wrong. I just mean that, if all the integers in "hello" + xx expression are constants, then using grep is equivalent to compiler checking and replace string literal with std::string is equivalent to greping A regular expression. Using grep is just a simplest way to obtain as much information as possible. – nicky_zs Jul 22 '14 at 14:58
  • @vz0, also, grep works without type inference, if the integers are not all constants, eg. "hello" + i, where i is an integer variable, then grep will be of no use. – nicky_zs Jul 22 '14 at 15:15
2

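You may not need an external tool. Instead, you can take advantage of the C++ rule that an implicit conversion sequence may contain at most one user-defined conversion. Basically, you need to change the parameter of your f function from const char*/std::string to a type that is implicitly convertible only from a string literal (const char[size]) or from a std::string instance (what you get when you add std::to_string to the expression). A const char*, which is what "literal" + 10 yields, would need two user-defined conversions (first to std::string, then to the proxy type), so it is rejected.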

#include <cstddef>   // size_t
#include <iostream>
#include <string>
#include <utility>   // std::move

struct string_proxy
{
    std::string value;

    string_proxy(const std::string& value) : value(value) {}

    string_proxy(std::string&& value) : value(std::move(value)) {}

    template <size_t size>
    string_proxy(const char (&str)[size]) : value(str) {}
};

void f(string_proxy proxy)
{
    std::cout << proxy.value << std::endl;
}

int main()
{
    f("this works"); // const char[size]
    f("this works too: " + std::to_string(10)); //  std::string
    f("compile error!" + 10); // const char*
    return 0;
}

Note that this does not work on MSVC, at least not in the 2012 version; it's likely a bug, since no warning is emitted either. It works perfectly fine in g++ and clang (you can quickly check it here).

gwiazdorrr
  • This sounds promising. However, I have 75k lines of code with thousands of functions... – vz0 Jul 14 '14 at 18:00
  • In all honesty, 75k lines is not that many. An alternative solution is to derive from `std::string`, make construction from `const char*` `explicit` and add an array reference constructor. Both solutions have the obvious benefit of being enforced at compile time. – gwiazdorrr Jul 14 '14 at 23:49
  • It's for my pet project, don't have much free time. – vz0 Jul 15 '14 at 09:55
  • 1
    You *will have to* dedicate some time to either set up and maintain a lint tool (an ongoing process) or to fail-proof the code, as above. – gwiazdorrr Jul 15 '14 at 10:29
  • You can at least partially automate the conversion. Just compile and pipe the errors to a file. Use the line numbers in the file to auto-convert your code using the scripting language of your choice (sed, awk, perl, php, python, etc.). – Dwayne Towell Jul 19 '14 at 18:28
2

I've found a very simple way to detect this issue. Neither a regular expression nor a lint will match more complex expressions like the following:

f("Hello " + g(i));

What I need is some form of type inference, so I'm letting the compiler do it. Using a std::string instead of a string literal raises an error, so I wrote a simple source code converter that wraps every string literal in std::string, like this:

f(std::string("Hello ") + g(i));

Then, after recompiling the project, I could see all the errors. The converter's source code is on GitHub, in 48 lines of Python:

https://gist.github.com/alejolp/3a700e1730e0328c68de

vz0
  • Pragmatic and original! And it works, provided nobody overloaded operator+ to give meaning to the addition of a string with an integer. – Iwillnotexist Idonotexist Jul 17 '14 at 02:27
  • 1
    @IwillnotexistIdonotexist Yes, since this is a migration from non-OOP source code, the structure of the code is very basic. We don't have classes with virtual methods, nor any operator overloads. – vz0 Jul 17 '14 at 10:11
  • Good idea, but you have the same problem as nicky_zs: parsing strings right is difficult. Your code breaks with `"\\"`, or with multiline strings, or with C++11 raw strings. I'd let the compiler worry about it, and go either for Alper's or my solution. – esneider Jul 18 '14 at 20:10
  • @esneider It also breaks with implicit concatenation of string literals: "aa" "bb", but I don't have any of those. I'm just using this script to detect the errors, then fixing them in the original source code. Also, a lint won't detect complex expressions like the first example, since a lint can't do type inference. – vz0 Jul 18 '14 at 22:26
0

If your case is exactly of the form

"some text in quotations" + a_numeric_variable_or_constant

then Powergrep or similar programs will let you scan all files for

("[^"]+")\s*\+\s*(\w+)

and replace with

\1 + std::to_string(\2)

This will bring up the possible matches, but I strongly recommend previewing what you are replacing first, because the pattern also matches string variables on the right-hand side.

Regular expressions cannot understand the semantics of your code, so they cannot be sure that the operands are integers. For that you need a program with a parser, such as CDT, or a static code analyzer. Unfortunately, I do not know of any that can do that. So, to sum up, I hope the regex helps :)

PS: In the worst case, if a variable is not numeric, the compiler will give you an error, because the to_string function doesn't accept anything other than numeric values. You can then fix those few cases by hand; hopefully there won't be many.

PS 2: Some may think that Powergrep is expensive. You can use the trial for 15 days with full functionality.

ifyalciner
0

You can try the Map-Reduce Clang plugin. The tool was developed at Google to do just this kind of refactoring, mixing strong type-checking and regexps.

(see video presentation here).

Nielk
0

You can use a C++ conversion operator and create a new class that overloads operator + to suit your needs. Replace int with the new class "Integer" and perform the required overloading. This requires no changes or word replacements at the call sites.

#include <string>

class Integer {
    long i;
    std::string formatted;
public:
    // The original body `i = i` assigned the parameter to itself
    // and left the member uninitialized.
    Integer(int value) : i(value) {}
    operator const char*() const { return formatted.c_str(); }
    friend Integer operator+(const char* input, Integer t);
};
// Takes const char*: a string literal does not convert to char*.
Integer operator+(const char* input, Integer integer) {
    integer.formatted = input + std::to_string(integer.i);
    return integer;
}
Integer i = ....
f("test" + i); // executes the overloaded operator
dvasanth
0

I'm assuming that, for a call such as f(some_str + i);, your definition looks like this:

 void f(std::string value)
 {
    // do something.
 }

If you declare another class, such as AdvString, that implements operator + for integers, and declare your function as in the code below, then the call f(some_str + i); will work:

 void f(AdvString value)
 {
   // do something.
 }

A sample implementation is here: https://github.com/prasaathviki/advstring

Prasaathviki
  • Thank you. As I stated earlier, the f function is just an example of one use case. In general I have many functions and many expressions in many contexts, so this solution would only work for one single case. If you use an auxiliary variable std::string s = "hello" + i; your modification won't detect the issue. – vz0 Jul 21 '14 at 08:55