
Let's say I have a main function with some arbitrary code:

void main(){
    //Some random code    
    int a = 5;
    int b = a + 7;
}

and the text of this function is stored inside a std::string:

std::string mystring("void main(){ //Some random code  int a = 5; int b = a + 7;}");

I want to use std::regex to extract the body of the function, so the result I would get back is:

"//Some random code int a = 5; int b = a + 7;"

My issue is that I do not know how to write the regular expression to get what I want. Here is the code I have right now:

std::string text("void main(){ //Some random code  int a = 5; int b = a + 7;}");
std::regex expr ("void main()\\{(.*?)\\}");

std::smatch matches;

if (std::regex_match(text, matches, expr)) {
    for (int i = 1; i < matches.size(); i++) {
        std::string match (matches[i].first, matches[i].second);
        std::cout << "matches[" << i << "] = " << match << std::endl;
    }
}

My regex is completely off and returns no matches. What should my regex be for this to work?

Sunny724
  • Unrelated to your (current) problem, but using regular expressions to try to parse C- or C++-like code is going to be hard, since regular expressions can't really handle nested structures. For something simple like your example it will work, but once you mix in nested braces you can no longer use regular expressions reliably. – Some programmer dude Feb 26 '16 at 07:35
  • I see - so the "main" function can have any number of nested if/else statements, while loops, etc... In that case regex will not work then? If not, what would be a better way to extract the data? I'm not tied to using regex, I just figured it would be the best way. – Sunny724 Feb 26 '16 at 07:38
  • Related: http://stackoverflow.com/a/1732454/3410396 – Revolver_Ocelot Feb 26 '16 at 07:41
  • What do you plan to do once you have found your function? Do you just want a list of names of functions [for example], or do you have plans to "parse" the C [or C++] that is in the function? For the latter, libclang is probably a good choice. – Mats Petersson Feb 26 '16 at 07:54
  • @cardinal724 Do you just want the text between `void main(){` and `}` without caring about any scopes inside the function body? Please describe in more detail what you are trying to do: parsing `C`/`C++`, or just extracting some part of a string that "looks" like a function? – Simon Kraemer Feb 26 '16 at 08:07
  • @cardinal724 And by the way: You might want to escape the parentheses in your regex, `std::regex expr("void main\\(\\)\\{(.*?)\\}");`, and change the type of `i` from `int` to `size_t`. The regex matches for me when escaping the `()` part – Simon Kraemer Feb 26 '16 at 08:09
  • @SimonKraemer Yes I would just like to extract the text inside the function body, regardless of what that text is. It just so happens that it will always be in the form of void main(){ 'stuff' }. – Sunny724 Feb 26 '16 at 08:15
  • @cardinal724 If it's always in that form you don't need regexes, you can just grab the stuff between the first `'{'` and the last `'}'`. That is, two `std::find`s, one in reverse. – molbdnilo Feb 26 '16 at 08:46
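
A minimal sketch of the idea in the last comment, assuming the input really is a single `void main(){ ... }` with nothing but whitespace after the final brace. The helper name bodyBetweenOuterBraces is made up for illustration, and the sketch uses the std::string members find and rfind rather than the std::find algorithm:

```cpp
#include <string>

// Hypothetical helper: returns the text between the first '{' and the
// last '}' of the input, or an empty string when no such pair exists.
std::string bodyBetweenOuterBraces(const std::string& text)
{
    const std::string::size_type open = text.find('{');   // first opening brace
    const std::string::size_type close = text.rfind('}'); // last closing brace
    if (open == std::string::npos || close == std::string::npos || close < open)
        return std::string();
    return text.substr(open + 1, close - open - 1);
}
```

Nested braces inside the body do not matter here, because only the outermost pair is located. If a second function follows main in the same string, this will return too much.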

2 Answers


As discussed in the comments, OP only wants to "extract the text inside the function body, regardless of what that text is".

@OP: Your regex is wrong because you don't escape the parentheses of main(). Unescaped, () is an empty capture group, so the pattern looks for "void main{" instead of "void main(){". Changing the regex to "void main\\(\\)\\{(.*?)\\}" will work.

I also recommend using size_t for i in your for-loop so you don't compare signed with unsigned (std::smatch::size() returns an unsigned size_type, typically size_t).

#include <iostream>
#include <regex>
#include <string>

int main()
{
    std::string text("void main(){ //Some random code  int a = 5; int b = a + 7;}");
    std::regex expr("void main\\(\\)\\{(.*?)\\}");

    std::smatch matches;

    if (std::regex_match(text, matches, expr)) {
        for (size_t i = 1; i < matches.size(); i++) {
            std::string match(matches[i].first, matches[i].second);
            std::cout << "matches[" << i << "] = " << match << std::endl;
        }
    }
}

Output:

matches[1] =  //Some random code  int a = 5; int b = a + 7;

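This pattern is fragile once the body contains braces, as in "void main(){ while(true){ //Some random code int a = 5; int b = a + 7; } }". With std::regex_search the lazy (.*?) stops at the first "}". std::regex_match only succeeds when the whole input matches, which pushes (.*?) on to the final "}", but it then fails as soon as anything follows that brace (even trailing whitespace) or the body contains a newline, since "." does not match line terminators in the default ECMAScript grammar.

The easiest fix is to anchor the pattern explicitly, "^void main\\(\\)\\{(.*?)\\}$", but that requires the input to start with "void main(){" and end with "}".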

As proposed by Revolver_Ocelot, you can also add some whitespace matching to the regex to make it a little more flexible.
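
A rough sketch of what such a whitespace-tolerant pattern could look like. The helper name extractMainBody and the exact pattern are my own assumptions, not something taken from the comments. It uses [\s\S]* instead of .* so that the body may span several lines, and relies on std::regex_match requiring the whole input to match, which ties the last "}" to the end of the input:

```cpp
#include <regex>
#include <string>

// Hypothetical helper: extracts the body of "void main() { ... }", allowing
// optional whitespace around the tokens and after the final '}'.
// Returns false if the input does not have that overall shape.
bool extractMainBody(const std::string& text, std::string& body)
{
    // Greedy [\s\S]* runs up to the last '}' in the input, newlines included.
    static const std::regex expr(R"(\s*void\s+main\s*\(\s*\)\s*\{([\s\S]*)\}\s*)");
    std::smatch matches;
    if (!std::regex_match(text, matches, expr))
        return false;
    body = matches[1].str();
    return true;
}
```

This still does not understand nesting. It simply takes everything up to the last "}", so if another function definition follows main in the same string, the captured body will include it.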

Simon Kraemer
  • Thank you for your response! Unfortunately this approach doesn't work if there is a "}" in the body text before the final "}" delimiter. – Sunny724 Feb 26 '16 at 08:23
  • @cardinal724 Does the string you provide always end with `}`? Can you give me some more examples? – Simon Kraemer Feb 26 '16 at 08:28
  • sure - this string fails: "void main(){ while(true){ //Some random code int a = 5; int b = a + 7; } }" – Sunny724 Feb 26 '16 at 08:31
  • @cardinal724 Anchor last `}` to the end of input. For robustness you can add optional whitespace characters between `}` and the end – Revolver_Ocelot Feb 26 '16 at 08:43

As suggested in the comments, for your use case it would probably be best to rely on plain string search and brace matching.

#include <iostream>
#include <string>


std::string getBody(const std::string& functionDef, const std::string& text)
{
    size_t pos = 0;
    do
    {
        if ((pos = text.find(functionDef, pos)) == std::string::npos)
            continue;

        pos += functionDef.length();

        size_t firstSemicolon = text.find(";", pos);
        size_t firstOpen = text.find("{", pos);
        size_t firstClose = text.find("}", pos);

        if (firstSemicolon != std::string::npos && firstSemicolon < firstOpen) //Only function declaration
            continue;

        if (firstOpen == std::string::npos || firstClose == std::string::npos || firstClose < firstOpen) //Mismatch
            continue;

        size_t bodyStart = pos = firstOpen + 1;
        size_t bracesCount = 1;
        do
        {
            firstOpen = text.find("{", pos);
            firstClose = text.find("}", pos);

            if (firstOpen == std::string::npos && firstClose == std::string::npos)//Mismatch
            {
                pos = std::string::npos;
                continue;
            }

            //npos is always larger
            if (firstOpen < firstClose)
            {
                bracesCount++;
                pos = firstOpen + 1;
            }
            else if (firstOpen > firstClose)
            {
                bracesCount--;
                if (bracesCount == 0)
                {
                    size_t bodySize = firstClose - bodyStart;
                    return text.substr(bodyStart, bodySize);
                }
                pos = firstClose + 1;
            }
            else
            {
                //Something went terribly wrong...
                pos = std::string::npos;
                continue;
            }

        } while (pos != std::string::npos);
    }
    while (pos != std::string::npos);
    return std::string();
}

int main()
{
    std::string text("void main(); int test(); void main(){ while(true){ //Some {random} code int a = 5; int b = a + 7; } } int test(){ return hello; } ");
    std::cout << getBody("void main()", text) << std::endl;
    std::cout << getBody("int test()", text) << std::endl;
}

Output:

 while(true){ //Some {random} code int a = 5; int b = a + 7; }
 return hello;

The code also handles newlines and skips function declarations. I tried to write it as clearly as possible.

If you still have questions, feel free to ask.
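
For comparison, the brace-counting step on its own can be sketched as a single pass over the characters. This is a simplified variant, not a replacement for getBody above: the helper name bodyFromOpenBrace is hypothetical, it assumes the caller has already located the opening "{" of the function, and like the code above it also counts braces that appear inside comments or string literals.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: given the index of an opening '{' in text, returns the
// text up to (not including) the matching '}', or an empty string if openPos
// does not point at '{' or the braces never balance.
std::string bodyFromOpenBrace(const std::string& text, std::size_t openPos)
{
    if (openPos >= text.size() || text[openPos] != '{')
        return std::string();

    std::size_t depth = 0; // number of currently open braces
    for (std::size_t i = openPos; i < text.size(); ++i)
    {
        if (text[i] == '{')
            ++depth;
        else if (text[i] == '}' && --depth == 0)
            return text.substr(openPos + 1, i - openPos - 1);
    }
    return std::string(); // ran out of input before the braces balanced
}
```

For the single-function case, calling it as bodyFromOpenBrace(text, text.find('{')) returns the body of the first braced block in text.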

Simon Kraemer