
I am new to regex. I am trying to write a C# application that replaces all old-style casts in a C++ file with static_cast.

For example:

(void)memcpy(&a[0],(void * )hihi, (UBYTE) V_SIZE);
(void) memcpy((VOID*)abc, (const VOID*) dafa, (uint8)NUMBER_SIZE);
(void )memcpy(
        (void *)p,
        &abc, (uint8)DATE_SIZE);

I expect all of them to become:

static_cast<void>(memcpy(&a[0], static_cast<void * >(hihi), static_cast<UBYTE>( V_SIZE)));
static_cast<void>(memcpy(static_cast<VOID*>(abc), static_cast<const VOID*>(dafa), static_cast<uint8>(NUMBER_SIZE)));
static_cast<void>(memcpy(
        static_cast<void *>(p),
        &abc, static_cast<uint8>(DATE_SIZE)));

I investigated and tried this:

List<string> castTypeList = new List<string>
{"void", "void *", "UBYTE", "uint8", "VOID*", "uint8" };

// Fix qacpp-4.7.0-3080 This expression uses old style casts
// Only apply for cpp source file (.cpp)
if ((Path.GetExtension(sourceFile) == ".cpp"))
{
    foreach (string e in castTypeList)
    {
        File.WriteAllText(sourceFile, Regex.Replace(
            File.ReadAllText(sourceFile),
            @"\(" + e + @"\)([^):;\r\n},]*)",
            "static_cast<" + e + @">($1)"));
    }
}

The result looks good but is not perfect: some casts are not matched and replaced. Is there a better solution or a better idea for handling this?

Bruce

3 Answers


Reliably matching old-style casts is impossible with a regex, because you can't tell for sure what's a type and what's not. As proof, consider this C++ file:

#include "somefile.h"

void f(sometype_t x) {
    g((something)*x);
    h((somethingelse)(x));
}

If something is a type, then the line with it is a cast, but if it's a variable, then it isn't. Similarly, if somethingelse is a type, then the line with it is a cast, but if it's a function, then it isn't. Further reading: https://en.wikipedia.org/wiki/Lexer_hack
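As a small compilable illustration of the first case, the sketch below puts the identical expression `(something)*x` in two namespaces. The namespace names and the helper `check_both_readings` are hypothetical and only serve to keep both declarations in one file.

```cpp
#include <cassert>

namespace as_type {
typedef short something;              // here `something` names a type
inline long eval(int *x) {
    return (something)*x;             // cast: dereference x, convert to short
}
}

namespace as_variable {
const int something = 3;              // here `something` names a variable
inline long eval(int x) {
    return (something)*x;             // multiplication: 3 * x
}
}

// Hypothetical helper that exercises both readings of the same text.
inline void check_both_readings() {
    int v = 7;
    assert(as_type::eval(&v) == 7);       // value of *x converted to short
    assert(as_variable::eval(7) == 21);   // 3 * 7
}
```

A regex sees exactly the same characters in both functions, so it has no way to decide which one should be rewritten.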

To really hammer the point home, consider this other C++ file:

void g(long);
void h(char *);

template<int N>
struct SomeStruct {
    typedef int *sometype_t;
    typedef short something;
    typedef char *somethingelse;
};

template<>
struct SomeStruct<42> {
    typedef int sometype_t;
    short something;
    char *somethingelse(sometype_t);
};

constexpr int a = 12300 + 45;
constexpr int b = 40 + 2;

struct Child1 : SomeStruct<a> {
    void f(sometype_t x) {
        g((something)*x);
        h((somethingelse)(x));
    }
};

struct Child2 : SomeStruct<b> {
    void f(sometype_t x) {
        g((something)*x);
        h((somethingelse)(x));
    }
};

Now it should be clear that the only way to know whether or not those things are casts is by first evaluating arbitrary constexpr expressions, which are Turing-complete.

  • Yes, I think so, but is there a better way to do it? I don't expect it to be 100% perfect; 90-95% is fine, since the user has to review all changes and fix the rest manually. – Bruce Jun 02 '22 at 05:07

You can match and replace with the following pattern until no match is found:

(?i)\(\s*((?:const\s+)?(?:u?(?:byte|int)\d*|void)(?:\s*\*)?)\s*\)\s*(\w+(?:\((?>[^()]+|(?<c>)\(|(?<-c>)\))*\))?)

See the regex demo.

In C#, you can use

// Regular (non-verbatim) string, so that \n is a real newline
var text = "(void)memcpy(&a[0],(void * )hihi, (UBYTE) V_SIZE);\n(void) memcpy((VOID*)abc, (const VOID*) dafa, (uint8)NUMBER_SIZE);\n(void )memcpy(\n        (void *)p,\n        &abc, (uint8)DATE_SIZE);";
var pattern = @"(?i)\(\s*((?:const\s+)?(?:u?(?:byte|int)\d*|void)(?:\s*\*)?)\s*\)\s*(\w+(?:\((?>[^()]+|(?<c>)\(|(?<-c>)\))*\))?)";
var tmp = string.Empty;
do
{
    tmp = text;
    text = Regex.Replace(text, pattern, "static_cast<$1>($2)");
}
while (text != tmp);
Console.WriteLine(text);

See the C# demo. Output:

static_cast<void>(memcpy(&a[0],static_cast<void *>(hihi), static_cast<UBYTE>(V_SIZE)));
static_cast<void>(memcpy(static_cast<VOID*>(abc), static_cast<const VOID*>(dafa), static_cast<uint8>(NUMBER_SIZE)));
static_cast<void>(memcpy(
        static_cast<void *>(p),
        &abc, static_cast<uint8>(DATE_SIZE)));
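For readers who want to see what the balancing group in the pattern is doing, here is a rough C++ sketch of the same idea. It is not a port of the C# code: std::regex has no balancing groups, so the operand (an identifier, optionally followed by a balanced parenthesized argument list) is scanned by hand. The function names `operand_end` and `modernize_casts` are made up for this example, and the type list is the same limited one as in the pattern above.

```cpp
#include <cctype>
#include <cstddef>
#include <regex>
#include <string>

// Index one past the operand starting at `i`: an identifier, optionally
// followed by a balanced "(...)" group. Returns `i` if no identifier starts there.
static std::size_t operand_end(const std::string &s, std::size_t i) {
    std::size_t j = i;
    while (j < s.size() &&
           (std::isalnum(static_cast<unsigned char>(s[j])) || s[j] == '_')) ++j;
    if (j == i || j >= s.size() || s[j] != '(') return j;
    int depth = 0;
    for (std::size_t k = j; k < s.size(); ++k) {
        if (s[k] == '(') ++depth;
        else if (s[k] == ')' && --depth == 0) return k + 1;
    }
    return j;  // unbalanced parentheses: keep only the identifier
}

// Rewrites "(type) operand" as "static_cast<type>(operand)" for the listed types.
std::string modernize_casts(std::string text) {
    static const std::regex cast_re(
        R"(\(\s*((?:const\s+)?(?:u?(?:byte|int)\d*|void)(?:\s*\*)?)\s*\)\s*)",
        std::regex::icase);
    std::size_t pos = 0;
    std::smatch m;
    while (pos < text.size() &&
           std::regex_search(text.cbegin() + pos, text.cend(), m, cast_re)) {
        std::size_t start = pos + static_cast<std::size_t>(m.position(0));
        std::size_t opnd = start + static_cast<std::size_t>(m.length(0));
        std::size_t end = operand_end(text, opnd);
        if (end == opnd) {            // "(void)" with no operand, e.g. "f(void);"
            pos = start + 1;
            continue;
        }
        std::string head = "static_cast<" + m.str(1) + ">(";
        std::string repl = head + text.substr(opnd, end - opnd) + ")";
        text.replace(start, end - start, repl);
        pos = start + head.size();    // resume inside the operand to catch nested casts
    }
    return text;
}
```

Because the search resumes just after the inserted `static_cast<...>(`, casts nested inside a call's arguments are handled in the same pass, which is the job the do/while loop does in the C# version. It shares the regex's limits: it only knows the listed type names and cannot tell a cast from a non-cast in general.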
Wiktor Stribiżew

I have finally found the solution for myself.

List<string> part_one_CastTypeList = new List<string>
{"void", "void "};

List<string> part_two_CastTypeList = new List<string>
{"void * ", "UBYTE", "VOID*", "const VOID*", "uint8", "void *"};


if ((Path.GetExtension(sourceFile) == ".cpp"))
{
    foreach (string e in part_one_CastTypeList)
    {
        string castType = null;
        foreach (char c in e)
        {
            castType = castType + '[' + c + ']';
        }
        // Match a cast applied to a whole function call (applies to the void type)
        // e.g. (void)memcpy((FLOAT)a, (void *) b, (uint8)c);
        // ==> static_cast<void>(memcpy((FLOAT)a, (void *) b, (uint8)c));
        File.WriteAllText(sourceFile, Regex.Replace(
            File.ReadAllText(sourceFile),
            @"\(" + castType + @"\)([^;\r\n}]*)(\);)",
            "static_cast<" + e + @">($1)$2"));
    }
    foreach (string e in part_two_CastTypeList)
    {
        string castType = null;
        foreach(char c in e)
        {
            castType = castType + '[' + c + ']';
        }
        // Match a cast applied to a single value, not a function call (not for the void type)
        // e.g. (uint8)0x00,(uint16)0x01, (SW) 0x02
        // ==> static_cast<uint8>(0x00),static_cast<uint16>(0x01), static_cast<SW>( 0x02)
        File.WriteAllText(sourceFile, Regex.Replace(
            File.ReadAllText(sourceFile),
            @"\(" + castType + @"\)([^:\/;)\r\n},]*)",
            "static_cast<" + e + @">($1)"));
    }
}

I split the regex into two parts. Part 1 (the first loop) matches a cast applied to a function call. After the first loop completes, every cast on a function call should have been replaced, like this:

static_cast<void>(memcpy(&a[0],(void * )hihi, (UBYTE) V_SIZE));
static_cast<void>( memcpy((VOID*)abc, (const VOID*) dafa, (uint8)NUMBER_SIZE));
static_cast<void >(memcpy(
        (void *)p,
        &abc, (uint8)DATE_SIZE));

Part 2 (the second loop) matches casts applied to variables and replaces all the remaining ones, like this:

static_cast<void>(memcpy(&a[0],static_cast<void * >(hihi), static_cast<UBYTE>( V_SIZE)));
static_cast<void>( memcpy(static_cast<VOID*>(abc), static_cast<const VOID*>( dafa), static_cast<uint8>(NUMBER_SIZE)));
static_cast<void >(memcpy(
        static_cast<void *>(p),
        &abc, static_cast<uint8>(DATE_SIZE)));

It's not 100% perfect, but I tried it on many source files and the results look good; it matches up to 99% of the casts :))

Bruce